Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Freesound Loop Dataset
This dataset contains 9,455 loops from Freesound.org and the corresponding annotations. These loops have tempo, key, genre and instrumentation annotations.
Dataset Construction
To collect this dataset, the following steps were performed:
Freesound was queried with "loop" and "bpm" to collect loops that have a beats-per-minute (BPM) annotation.
The sounds were analysed with the AudioCommons extractor to obtain key information.
The textual metadata of each sound was analysed to obtain the BPM proposed by the user and to obtain genre information.
Annotators used a web interface to annotate around 3,000 loops.
Dataset Organisation
The dataset contains two folders and two files in the root directory:
'FSL10K' encloses the audio files and their metadata and analysis. The audio files are in the 'audio' folder and are named '
'annotations' holds the expert-provided annotations for the sounds in the dataset. The annotations are separated into a folder for each annotator, and each annotation is stored as a .json file, named 'sound-
Licenses
All the sounds have some kind of Creative Commons license. The license of each sound in the dataset can be obtained from the 'FSL10K/metadata.json' file.
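For example, a minimal Python sketch for looking up a sound's license (the exact structure of 'FSL10K/metadata.json' is not documented here, so the keys used below are assumptions):

import json

with open("FSL10K/metadata.json") as f:
    metadata = json.load(f)

# Assumed structure: a mapping from Freesound sound id to a dict with a 'license' field.
for sound_id, info in list(metadata.items())[:5]:
    print(sound_id, info.get("license"))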
Authors and Contact
This dataset was developed by António Ramires et al.
For any questions related to this dataset, please contact:
António Ramires
References
Please cite this paper if you use this dataset:
@inproceedings{ramires2020,
  author = "Antonio Ramires and Frederic Font and Dmitry Bogdanov and Jordan B. L. Smith and Yi-Hsuan Yang and Joann Ching and Bo-Yu Chen and Yueh-Kao Wu and Hsu Wei-Han and Xavier Serra",
  title = "The Freesound Loop Dataset and Annotation Tool",
  booktitle = "Proc. of the 21st International Society for Music Information Retrieval (ISMIR)",
  year = "2020"
}
Acknowledgements
This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068 (MIP-Frontiers).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Overview: Three new datasets available here represent normal household areas with common objects - lounge, kitchen and garden - with varying trajectories.
Description:
Lounge: The lounge dataset with common household objects.
Lounge_oc: The lounge dataset with object occlusions near the end of the trajectory.
Kitchen: The kitchen dataset with common household objects.
Kitchen_oc: The kitchen dataset with object occlusions near the end of the trajectory.
Garden: The garden dataset with common household objects.
Garden_oc: The garden dataset with object occlusions near the end of the trajectory.
convert.py: Python script to convert a video file into JPGs.
Paper: The datasets were used for the paper "SymbioLCD: Ensemble-Based Loop Closure Detection using CNN-Extracted Objects and Visual Bag-of-Words", accepted at the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems.
Abstract: Loop closure detection is an essential tool of Simultaneous Localization and Mapping (SLAM) to minimize drift in its localization. Many state-of-the-art loop closure detection (LCD) algorithms use visual Bag-of-Words (vBoW), which is robust against partial occlusions in a scene but cannot perceive the semantics or spatial relationships between feature points. CNN object extraction can address those issues by providing semantic labels and spatial relationships between objects in a scene. Previous work has mainly focused on replacing vBoW with CNN-derived features. In this paper we propose SymbioLCD, a novel ensemble-based LCD that utilizes both CNN-extracted objects and vBoW features for LCD candidate prediction. When used in tandem, the added elements of object semantics and spatial awareness create a more robust and symbiotic loop closure detection system. The proposed SymbioLCD uses scale-invariant spatial and semantic matching, Hausdorff distance with temporal constraints, and a Random Forest that utilizes combined information from both CNN-extracted objects and vBoW features for predicting accurate loop closure candidates. Evaluation of the proposed method shows it outperforms other Machine Learning (ML) algorithms - such as SVM, Decision Tree and Neural Network - and demonstrates that there is a strong symbiosis between CNN-extracted object information and vBoW features which assists accurate LCD candidate prediction. Furthermore, it is able to perceive loop closure candidates earlier than state-of-the-art SLAM algorithms, utilizing the added spatial and semantic information from CNN-extracted objects.
Citation: Please use the BibTeX below for citing the paper:
@inproceedings{kim2021symbiolcd,
  title = {SymbioLCD: Ensemble-Based Loop Closure Detection using CNN-Extracted Objects and Visual Bag-of-Words},
  author = {Jonathan Kim and Martin Urschler and Pat Riddle and J\"{o}rg Wicker},
  year = {2021},
  date = {2021-09-27},
  booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems},
  keywords = {},
  pubstate = {forthcoming},
  tppubtype = {inproceedings}
}
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains behavioral events and intracranial electrophysiology recordings from a delayed free recall task with closed-loop stimulation at encoding, using a classifier trained on encoding data. The experiment consists of participants studying a list of words, presented visually one at a time, completing simple arithmetic problems that function as a distractor, and then freely recalling the words from the just-presented list in any order. The data was collected at clinical sites across the country as part of a collaboration with the Computational Memory Lab at the University of Pennsylvania. This dataset is a closed-loop stimulation version of the FR1 and FR2 datasets.
This study contains closed-loop electrical stimulation of the brain during encoding. There is no stimulation during the distractor or retrieval phases. Stimulation is delivered to a single electrode at a time, and the stimulation parameters are included in the behavioral events tsv files, denoting the anode/cathode labels, amplitude, pulse frequency, pulse width, and pulse count.
The L2 logistic regression classifier is trained to predict whether an encoded item will be subsequently recalled based on the neural features during encoding, using data from a participant's FR1 sessions. The bipolar recordings during the 0-1366 ms interval after word presentation are filtered with a Butterworth band stop filter (58-62 Hz, 4th order) to remove 60 Hz line noise, and then a Morlet wavelet transformation (wavenumber = 5) is applied to the signal to estimate spectral power, using 8 log-spaced wavelets between 3-180 Hz (center frequencies 3.0, 5.4, 9.7, 17.4, 31.1, 55.9, 100.3, 180 Hz) and 1365 ms mirrored buffers. The powers are log-transformed prior to removal of the buffer, and then z-transformed based on the within-session mean and standard deviation across all encoding events. These z-transformed log power values represent the feature matrix, and the label vector is the recalled status of the encoded items. The penalty parameter is chosen based on the value that leads to the highest average AUC for all prior participants with at least two FR1 sessions, and is inversely weighted according to the class (i.e., recalled vs. not recalled) imbalance to ensure the best-fit values of the penalty parameter are comparable across different class distributions (recall rates). Class weights are computed as: (1/Na) / ((1/Na + 1/Nb) / 2) where Na and Nb are the number of events in each class.
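For concreteness, a minimal sketch of the class-weight formula above (the event counts are hypothetical, not code from the original pipeline):

def class_weights(n_recalled, n_not_recalled):
    # w_a = (1/Na) / ((1/Na + 1/Nb) / 2), and symmetrically for class b.
    na, nb = n_recalled, n_not_recalled
    mean_inverse = (1 / na + 1 / nb) / 2
    return {"recalled": (1 / na) / mean_inverse, "not_recalled": (1 / nb) / mean_inverse}

# Illustrative counts: 80 recalled vs. 220 not-recalled encoding events.
print(class_weights(80, 220))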
After at least 3 training sessions with a minimum of 15 lists, each participant's classifier is tested using leave-one-session-out (LOSO) cross validation, and the true AUC is compared to a 200-sample AUC distribution generated from classification of label-permuted data. p < 0.05 (one-sided) is used as the significance threshold for continuing to the closed-loop task.
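A minimal sketch of this label-permutation test, assuming scikit-learn and arrays of classifier outputs and recall labels (the variable names are illustrative):

import numpy as np
from sklearn.metrics import roc_auc_score

def permutation_auc_p(scores, labels, n_perm=200, seed=0):
    # One-sided p-value: fraction of label-permuted AUCs at or above the true AUC.
    rng = np.random.default_rng(seed)
    true_auc = roc_auc_score(labels, scores)
    null_aucs = [roc_auc_score(rng.permutation(labels), scores) for _ in range(n_perm)]
    return true_auc, float(np.mean([auc >= true_auc for auc in null_aucs]))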
Each session contains 26 lists (the first being a practice list) and there is no stimulation on the first 4 lists. The classifier output for each presented item on the first 4 lists is compared to the classifier output when tested on data from all previous sessions using a two-sample Kolmogorov-Smirnov test. The null hypothesis that the current session and the training data come from the same distribution must not be rejected (p > 0.05) for the closed-loop task to continue.
The remaining 22 lists are equally divided into stimulation and no stimulation lists, with conditions balanced in each half of the session. On stimulation lists, classifier output is evaluated during the 0-1366 ms interval following word presentation onset. The input values are normalized using the mean and standard deviation across encoding events on all prior no stimulation lists in the session. If the classifier output is below the median classifier output from the training sessions, stimulation occurs immediately following the 1366 ms decoding interval and lasts for 500 ms. With a 750-1000 ms inter-stimulus interval, there is enough time for stimulation artifacts to subside before the next word onset (next classifier decoding).
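A rough sketch of the per-item stimulation decision described above, assuming a scikit-learn-style classifier; the names and array shapes are assumptions, not the lab's control code:

import numpy as np

def should_stimulate(features, no_stim_mean, no_stim_std, clf, train_median):
    # Normalize the word's spectral-power features with statistics from prior
    # no-stimulation lists, then stimulate only if the predicted recall
    # probability falls below the median classifier output from training.
    z = (np.asarray(features) - no_stim_mean) / no_stim_std
    p_recall = clf.predict_proba(z.reshape(1, -1))[0, 1]
    return p_recall < train_median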
For questions or inquiries, please contact sas-kahana-sysadmin@sas.upenn.edu.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Features of the Poses Loop PPRI
This benchmark dataset contains 4,364 real-world Solidity smart contracts, manually labeled with ten types of vulnerabilities.
The address.delegatecall() function allows a smart contract to dynamically load external contracts from address at runtime. If the attacker can control the external contract and affect the current contract status, the contract is vulnerable to DC.
An arithmetic overflow or underflow, often called Integer Overflow or Underflow (IOU), occurs when an arithmetic operation attempts to create a numeric variable value that is larger than the maximum value or smaller than the minimum value of the variable type. If the arithmetic operation may pass a variable type’s maximum or minimum value and is performed without using SafeMath, the contract is vulnerable to IOU.
A function containing a loop has a high risk of exceeding its gas limit and causing an out-of-gas error. If the attacker can control the loop iteration and cause the out-of-gas error, the contract is vulnerable to NC.
A contract vulnerable to RE uses the call() function to transfer ether to an external contract. The external contract can re-enter the vulnerable contract via its fallback function. If the state variable change happens after the call() function, the reentrancy will cause status inconsistency.
The contract uses the timestamp as the deciding factor for critical operations, e.g., sending ether. If the attacker can get ether from the contract by manipulating the timestamp or affecting the critical operations, the contract is vulnerable to TD.
If the contract only uses tx.origin to verify the caller's identification for critical operations, it is vulnerable to TO.
The contract may send out ether differently according to different values of a global state variable or different balance values of the contract. If the attackers can get ether from the contract by manipulating the transaction sequences, the contract is vulnerable to TOD.
The contract uses the function call() or send() without result checking. If the send() or call() function fails and leads to status inconsistency, the contract is vulnerable to UcC.
If an attacker can self-destruct the contract by calling the selfdestruct(address) function, the contract is vulnerable to UpS.
If the contract can receive ether but cannot transfer it by itself, it is vulnerable to FE.
To protect the smart contracts, the dataset is available upon request.
This set of data files is one of the four test data sets acquired by the USDOT Data Capture and Management program. It contains the following data for the six months from May 1, 2011 to October 31, 2011:
- Raw and cleaned data for traffic detectors deployed by the Washington Department of Transportation (WSDOT) along I-5 in Seattle. Data includes 20-second raw reports.
- Incident response records from WSDOT's Washington Incident Tracking System (WITS).
- A record of all messages and travel times posted on WSDOT's Active Traffic Management signs and conventional variable message signs on I-5.
- Loop detector volume and occupancy data from arterials parallel to I-5, estimated travel times on arterials derived from Automatic License Plate Reader (ALPR) data, and arterial signal timing plans.
- Scheduled and actual bus arrival times from King County Metro buses and Sound Transit buses.
- Incidents on I-5 during the six-month period.
- Seattle weather data for the six-month period.
A dataset of GPS breadcrumb data from commercial trucks described in the documentation is not available to the public because of data ownership and privacy issues. This legacy dataset was created before data.transportation.gov and is only currently available via the attached file(s). Please contact the dataset owner if there is a need for users to work with this data using the data.transportation.gov analysis features (online viewing, API, graphing, etc.), and the USDOT will consider modifying the dataset to fully integrate it into data.transportation.gov. Note: All extras are attached in Seattle Freeway Travel Times https://data.transportation.gov/Automobiles/Seattle-Freeway-Travel-Times/9v5g-t8u8
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains a comprehensive dataset to assess cognitive states, workload, situational awareness, stress, and performance in human-in-the-loop process control rooms. The dataset includes objective and subjective measures from various data collection tools such as NASA-TLX, SART, eye tracking, EEG, a health monitoring watch, surveys, and think-aloud situational awareness assessments. It is based on an experimental study of a formaldehyde production plant, drawing on participants' interactions in a controlled control-room experimental setting.
The study compared three different setups of human-system interfaces in four human-in-the-loop (HITL) configurations, incorporating two alarm design formats (prioritised vs. non-prioritised) and three procedural guidance setups (one presenting paper procedures, one offering digitised screen-based procedures, and one providing an AI-based procedural guidance system).
The dataset provides an opportunity for various applications, including:
The dataset is instrumental for researchers, decision-makers, system engineers, human factor engineers, and teams developing guidelines and standards. It is also applicable for validating proposed solutions for the industry and for researchers in similar or close domains.
The concatenated Excel file for the dataset may include the following detailed data:
Demographic and Educational Background Data:
SPAM Metrics:
NASA-TLX Responses:
SART Data:
AI Decision Support System Feedback:
Performance Metrics:
This detailed breakdown provides a comprehensive view of the specific data elements that could be included in the concatenated Excel file, allowing for thorough analysis and exploration of the participants' experiences, cognitive states, workload, and decision-making processes in control room environments.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is designed to facilitate the task of detecting and recognizing characters on a water meter. It includes annotations for individual digits (0-9) as well as any other textual or numeric information present. The goal is to provide comprehensive labels for training object detection models to accurately identify and interpret the numeric readings.
The "Number" class includes any sequences or isolated numeric or alphabetic characters that are not part of the specific digit classes (0-9). These can be alphanumeric codes, labels, or any other text visible on the water meter that is not a single digit.
The digit "0" is represented as a single closed loop character, often circular or oval in shape, found on the water meter display.
The digit "1" typically appears as a single vertical line, occasionally with a small base or top serif, depending on the font style.
The digit "2" usually has a rounded top loop and a descending diagonal stroke ending in a horizontal base.
The digit "3" consists of two rounded loops stacked vertically with their centers aligned.
The digit "4" appears with a vertical line intersected by a diagonal line forming a triangle and a horizontal base.
The digit "5" features a top horizontal line, a curved back, and a flat base, resembling an incomplete circle with a flat top.
The digit "6" includes a closed loop at the bottom with an open top loop, appearing as a partially twisted circle.
The digit "7" has a flat top line connected to a diagonal descending line, often lacking additional embellishments.
The digit "8" consists of two equal-sized closed loops stacked vertically.
The digit "9" appears as a top loop with a straight or slightly curved descending tail, resembling an upside-down "6".
This dataset comprises the mean and variance of the surface velocity field of the Gulf of Mexico, obtained from a large set of historical surface drifter data from the Gulf of Mexico—3770 trajectories spanning 28 years and more than a dozen data sources—which were uniformly processed, quality controlled, and assimilated into a spatially and temporally gridded dataset. A gridded product, called GulfFlow, is created by averaging all available data from the GulfDrifters dataset within quarter-degree spatial bins, and within overlapping month-long temporal bins having a semimonthly spacing. The dataset spans monthly time bins centered on July 16, 1992 through July 1, 2020, for a total of 672 overlapping time slices. Odd-numbered slices correspond to calendar months, while even-numbered slices run from halfway through one month to halfway through the following month. A higher-spatial-resolution version, GulfFlow-1/12 degree, is created in the identical way but using 1/12-degree bins instead of quarter-degree bins.
In addition to the average velocities within each 3D bin, the count of sources contributing to each bin is also distributed, as is the subgridscale velocity variance. The count variable is a four-dimensional array of integers, the fourth dimension of which has length 45. This variable gives the number of hourly observations from each source dataset contributing to each three-dimensional bin. Values 1–15 are the count of velocity observations from drifters from each of the 15 experiments that are flagged as having retained their drogues, values 16–30 are for observations from drifters that are flagged as having lost their drogues, and values 31–45 are for observations from drifters of unknown drogue status.
In defining averaged quantities, we represent the velocity as a vector, \(\mathbf{u} = [u\ v]^T\), where the superscript \(T\) denotes the transpose. Let an overbar, \(\overline{\mathbf{u}}\), denote an average over a spatial bin and over all times, while angled brackets, \(\langle \mathbf{u} \rangle\), denote an average over a spatial bin and a particular temporal bin. Thus, \(\langle \mathbf{u} \rangle\) is a function of time while \(\overline{\mathbf{u}}\) is not. We refer to \(\langle \mathbf{u} \rangle\) as the local average, \(\overline{\mathbf{u}}\) as the global average, and \(\overline{\langle \mathbf{u} \rangle}\) as the double average. Given the inhomogeneity of the drifter data, it turns out that the global average is biased towards intensive but short-duration programs, hence the double average results in a much better representation of the true mean velocity field. The dataset includes the global average \(\overline{\langle \mathbf{u} \rangle}\), the local covariance defined as
\(\boldsymbol{\varepsilon} = \langle (\mathbf{u} - \langle \mathbf{u} \rangle)(\mathbf{u} - \langle \mathbf{u} \rangle)^T \rangle\)
and \(\epsilon^2\), which is the trace of \(\overline{\boldsymbol{\varepsilon}}\):
\(\epsilon^2 = \mathrm{tr}\{\overline{\boldsymbol{\varepsilon}}\}\)
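For intuition, a small numpy sketch of the difference between the global average and the double average in one spatial bin (the arrays below are illustrative, not the GulfFlow processing code):

import numpy as np

# Hypothetical hourly velocity samples in one spatial bin, grouped by temporal bin;
# the second bin stands in for an intensive but short-duration drifter program.
samples_by_time_bin = [
    np.array([[0.1, 0.0], [0.2, 0.1]]),
    np.array([[0.5, 0.3]] * 50),
]

# Global average: pool every sample, so heavily sampled periods dominate.
global_avg = np.concatenate(samples_by_time_bin).mean(axis=0)

# Double average: first form the local average <u> in each temporal bin,
# then average those local averages, weighting each temporal bin equally.
local_avgs = np.stack([s.mean(axis=0) for s in samples_by_time_bin])
double_avg = local_avgs.mean(axis=0)

print(global_avg, double_avg)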
The data is distributed in two separate netCDF files, one for each grid resolution.
The article describing this dataset:
Lilly, J. M. and P. Pérez-Brunius (2021). A gridded surface current product for the Gulf of Mexico from consolidated drifter measurements. Earth System Science Data, 13: 645–669. https://doi.org/10.5194/essd-13-645-2021.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The purpose of data mining analysis is always to find patterns in the data using certain kinds of techniques, such as classification or regression. It is not always feasible to apply classification algorithms directly to a dataset; before doing any work on the data, the data has to be pre-processed, and this process normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. In our project, after using clustering prior to classification, the performance did not improve much. The reason it did not improve could be that the features we selected to perform clustering on are not well suited for it. Because of the nature of the data, classification tasks are going to provide more information to work with in terms of improving knowledge and overall performance metrics.
From the dimensionality reduction perspective: this approach is different from Principal Component Analysis, which guarantees finding the best linear transformation that reduces the number of dimensions with a minimum loss of information. Using clusters as a technique for reducing the data dimension can lose a lot of information, since clustering techniques are based on a metric of 'distance', and at high dimensions Euclidean distance loses pretty much all meaning. Therefore, "reducing" dimensionality by mapping data points to cluster numbers is not always good, since you may lose almost all the information.
From the creating-new-features perspective: clustering analysis creates labels based on the patterns of the data, and it brings uncertainties into the data. When using clustering prior to classification, the choice of the number of clusters will strongly affect the performance of the clustering, and in turn affect the performance of classification. If the subset of features we apply clustering techniques to is well suited for it, it might increase the overall classification performance. For example, if the features we use k-means on are numerical and the dimension is small, the overall classification performance may be better.
We did not lock in the clustering outputs using a random_state, in an effort to see whether they were stable. Our assumption was that if the results vary highly from run to run, which they definitely did, maybe the data just does not cluster well with the methods selected at all. Basically, the ramification we saw was that our results were not much better than random when applying clustering in the data preprocessing.
Finally, it is important to ensure a feedback loop is in place to continuously collect the same data in the same format from which the models were created. This feedback loop can be used to measure the model's real-world effectiveness and also to continue to revise the models from time to time as things change.
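A minimal scikit-learn sketch of the pipeline discussed above, in which cluster assignments are appended as an extra feature before classification (the synthetic data and parameter choices are illustrative only):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the project's dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cluster the training features and append the cluster label as a new feature.
# No random_state is set for KMeans, mirroring the stability check described above.
kmeans = KMeans(n_clusters=8, n_init=10).fit(X_train)
X_train_aug = np.column_stack([X_train, kmeans.predict(X_train)])
X_test_aug = np.column_stack([X_test, kmeans.predict(X_test)])

clf = RandomForestClassifier().fit(X_train_aug, y_train)
print("with cluster feature:", clf.score(X_test_aug, y_test))

baseline = RandomForestClassifier().fit(X_train, y_train)
print("baseline:", baseline.score(X_test, y_test))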
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
0: The digit 0
1: The digit 1
2: The digit 2
3: The digit 3
4: The digit 4
5: The digit 5
6: The digit 6
7: The digit 7
8: The digit 8
9: The digit 9
This dataset contains images of water meter readings with the purpose of digitizing the numeric values. There are 10 classes representing the digits 0 through 9. Annotators will label the digits as they appear on the meters to facilitate accurate recognition.
The digit "0" is characterized by its oval or circular shape, often with a distinctive horizontal thickness.
The digit "1" typically appears as a straight vertical line, sometimes with a short horizontal base.
The digit "2" has a curved top and straight middle section, finishing with a horizontal or diagonal stroke at the base.
The digit "3" is identified by two stacked curved sections without intersecting lines.
The digit "4" often features intersecting horizontal and vertical lines with a triangle-like top section.
The digit "5" combines a prominent upper loop with a lower horizontal stroke and a straight vertical line.
The digit "6" features a closed top loop with an extended lower curve that continues downward.
The digit "7" is characterized by a horizontal top line connecting to a diagonal downward stroke.
The digit "8" resembles two stacked circles or loops, one above the other.
The digit "9" starts with a circular or elliptical loop at the top, leading into a straight downward stroke.
Problem Statement
Investors and buyers in the real estate market faced challenges in accurately assessing property values and market trends. Traditional valuation methods were time-consuming and lacked precision, making it difficult to make informed investment decisions. A real estate firm sought a predictive analytics solution to provide accurate property price forecasts and market insights.
Challenge
Developing a real estate price prediction system involved addressing the following challenges:
Collecting and processing vast amounts of data, including historical property prices, economic indicators, and location-specific factors.
Accounting for diverse variables such as neighborhood quality, proximity to amenities, and market demand.
Ensuring the model’s adaptability to changing market conditions and economic fluctuations.
Solution Provided
A real estate price prediction system was developed using machine learning regression models and big data analytics. The solution was designed to:
Analyze historical and real-time data to predict property prices accurately.
Provide actionable insights on market trends, enabling better investment strategies.
Identify undervalued properties and potential growth areas for investors.
Development Steps
Data Collection
Collected extensive datasets, including property listings, sales records, demographic data, and economic indicators.
Preprocessing
Cleaned and structured data, removing inconsistencies and normalizing variables such as location, property type, and size.
Model Development
Built regression models using techniques such as linear regression, decision trees, and gradient boosting to predict property prices. Integrated feature engineering to account for location-specific factors, amenities, and market trends.
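As an illustration of this step, a minimal scikit-learn sketch of a gradient-boosting price regressor with simple categorical encoding (the column names and listing data are hypothetical, not the firm's system):

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical listings with a few location and size features.
listings = pd.DataFrame({
    "neighborhood": ["A", "B", "A", "C", "B", "C"],
    "sqft": [900, 1400, 1100, 2000, 1250, 1750],
    "beds": [2, 3, 2, 4, 3, 3],
    "dist_to_transit_km": [0.5, 2.0, 1.1, 3.5, 0.8, 2.7],
    "price": [250000, 340000, 280000, 450000, 330000, 410000],
})

model = Pipeline([
    ("encode", ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), ["neighborhood"])],
        remainder="passthrough",
    )),
    ("gbr", GradientBoostingRegressor()),
])

# Cross-validation here mirrors the validation step described below.
features, target = listings.drop(columns="price"), listings["price"]
print(cross_val_score(model, features, target, cv=2, scoring="r2"))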
Validation
Tested the models using historical data and cross-validation to ensure high prediction accuracy and robustness.
Deployment
Implemented the prediction system as a web-based platform, allowing users to input property details and receive price estimates and market insights.
Continuous Monitoring & Improvement
Established a feedback loop to update models with new data and refine predictions as market conditions evolved.
Results
Increased Prediction Accuracy
The system delivered highly accurate property price forecasts, improving investor confidence and decision-making.
Informed Investment Decisions
Investors and buyers gained valuable insights into market trends and property values, enabling better strategies and reduced risks.
Enhanced Market Insights
The platform provided detailed analytics on neighborhood trends, demand patterns, and growth potential, helping users identify opportunities.
Scalable Solution
The system scaled seamlessly to include new locations, property types, and market dynamics.
Improved User Experience
The intuitive platform design made it easy for users to access predictions and insights, boosting engagement and satisfaction.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
The Giotto Radio Science Experiment data set consists of four tables. Each table contains a measurement value listed as a function of time. The measurements are: closed-loop receiver carrier signal amplitude, closed-loop receiver carrier frequency residual, open-loop receiver carrier signal amplitude, and open-loop receiver carrier frequency.
The Digital Geologic-GIS Map of The Loop and Druid Arch Quadrangles, Utah is composed of GIS data layers and GIS tables, and is available in the following GRI-supported GIS data formats: 1.) an ESRI file geodatabase (thdr_geology.gdb), 2.) an Open Geospatial Consortium (OGC) geopackage, and 3.) a 2.2 KMZ/KML file for use in Google Earth; however, this format version of the map is limited in data layers presented and in access to GRI ancillary table information. The file geodatabase format is supported with 1.) an ArcGIS Pro map file (.mapx) (thdr_geology.mapx) and individual Pro layer (.lyrx) files (for each GIS data layer). The OGC geopackage is supported with a QGIS project (.qgz) file. Upon request, the GIS data is also available in ESRI shapefile format. Contact Stephanie O'Meara (see contact information below) to acquire the GIS data in these GIS data formats. In addition to the GIS data and supporting GIS files, three additional files comprise a GRI digital geologic-GIS dataset or map: 1.) a readme file (cany_geology_gis_readme.pdf), 2.) the GRI ancillary map information document (.pdf) file (cany_geology.pdf), which contains geologic unit descriptions as well as other ancillary map information and graphics from the source map(s) used by the GRI in the production of the GRI digital geologic-GIS data for the park, and 3.) a user-friendly FAQ PDF version of the metadata (thdr_geology_metadata_faq.pdf). Please read the cany_geology_gis_readme.pdf for information pertaining to the proper extraction of the GIS data and other map files. Google Earth software is available for free at: https://www.google.com/earth/versions/. QGIS software is available for free at: https://www.qgis.org/en/site/. Users are encouraged to only use the Google Earth data for basic visualization, and to use the GIS data for any type of data analysis or investigation. The data were completed as a component of the Geologic Resources Inventory (GRI) program, a National Park Service (NPS) Inventory and Monitoring (I&M) Division funded program that is administered by the NPS Geologic Resources Division (GRD). For a complete listing of GRI products visit the GRI publications webpage: https://www.nps.gov/subjects/geology/geologic-resources-inventory-products.htm. For more information about the Geologic Resources Inventory Program visit the GRI webpage: https://www.nps.gov/subjects/geology/gri.htm. At the bottom of that webpage is a "Contact Us" link if you need additional information. You may also directly contact the program coordinator, Jason Kenworthy (jason_kenworthy@nps.gov). Source geologic maps and data used to complete this GRI digital dataset were provided by the following: U.S. Geological Survey. Detailed information concerning the sources used and their contribution to the GRI product are listed in the Source Citation section(s) of this metadata record (thdr_geology_metadata.txt or thdr_geology_metadata_faq.pdf). Users of this data are cautioned about the locational accuracy of features within this dataset. Based on the source map scale of 1:24,000 and United States National Map Accuracy Standards, features are within (horizontally) 12.2 meters or 40 feet of their actual location as presented by this dataset. Users of this data should thus not assume the location of features is exactly where they are portrayed in Google Earth, ArcGIS Pro, QGIS or other software used to display this dataset. All GIS and ancillary tables were produced as per the NPS GRI Geology-GIS Geodatabase Data Model v. 2.3 (available at: https://www.nps.gov/articles/gri-geodatabase-model.htm).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains surface current data collected from UGOS MASTR Drifters (Far Horizon Drifters) deployed in the Gulf. They were specifically positioned to capture dynamic features such as the Loop Current, Dry Tortugas Eddy, Florida Current, Gulf Stream, and Eddy Denali. The dataset includes quality-controlled, hourly measurements intended for model assimilation and validation. Data covers the temporal range from January to July 2024.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the accompanying dataset that was generated by the GitHub project: https://github.com/tonyreina/tdc-tcr-epitope-antibody-binding. In that repository I show how to create machine learning models for predicting whether a T-cell receptor (TCR) and a protein epitope will bind to each other.
A model that can predict how well a TCR binds to an epitope can lead to more effective treatments that use immunotherapy. For example, in anti-cancer therapies it is important for the T-cell receptor to bind to the protein marker in the cancer cell so that the T-cell (actually the T-cell's friends in the immune system) can kill the cancer cell.
HuggingFace provides a "one-stop shop" to train and deploy AI models. In this case, we use Facebook's open-source Evolutionary Scale Model (ESM-2). These embeddings turn the protein sequences into a vector of numbers that the computer can use in a mathematical model.
To load them into Python use the Pandas library:
import pandas as pd
train_data = pd.read_pickle("train_data.pkl")
validation_data = pd.read_pickle("validation_data.pkl")
test_data = pd.read_pickle("test_data.pkl")
The epitope_aa and the tcr_full columns are the protein (peptide) sequences for the epitope and the T-cell receptor, respectively. The letters correspond to the standard amino acid codes.
The epitope_smi column is the SMILES notation for the chemical structure of the epitope. We won't use this information. Instead, the ESM-1b embedder should be sufficient for the input to our binary classification model.
The tcr column is the hypervariable CDR3 loop. It's the part of the TCR that actually binds (assuming it binds) to the epitope.
The label column is whether the two proteins bind. 0 = No. 1 = Yes.
The tcr_vector and epitope_vector columns are the bio-embeddings of the TCR and epitope sequences generated by the Facebook ESM-1b model. These two vectors can be used to create a machine learning model that predicts whether the combination will produce a successful protein binding.
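For example, a minimal scikit-learn sketch of such a model, assuming the vector columns hold fixed-length numeric embeddings (the preprocessing and model choice here are illustrative, not the repository's exact pipeline):

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

train_data = pd.read_pickle("train_data.pkl")
validation_data = pd.read_pickle("validation_data.pkl")

def to_features(df):
    # Concatenate the TCR and epitope embeddings into one feature vector per pair.
    return np.hstack([np.stack(df["tcr_vector"].to_numpy()),
                      np.stack(df["epitope_vector"].to_numpy())])

clf = LogisticRegression(max_iter=1000)
clf.fit(to_features(train_data), train_data["label"])
print("validation accuracy:", clf.score(to_features(validation_data), validation_data["label"]))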
From the TDC website:
T-cells are an integral part of the adaptive immune system, whose survival, proliferation, activation and function are all governed by the interaction of their T-cell receptor (TCR) with immunogenic peptides (epitopes). A large repertoire of T-cell receptors with different specificity is needed to provide protection against a wide range of pathogens. This new task aims to predict the binding affinity given a pair of TCR sequence and epitope sequence.
Weber et al.
Dataset Description: The dataset is from Weber et al., who assembled a large and diverse dataset from the VDJ database and the ImmuneCODE project. It uses human TCR-beta chain sequences. Since this dataset is highly imbalanced, the authors exclude epitopes with fewer than 15 associated TCR sequences and downsample to a limit of 400 TCRs per epitope. The dataset contains amino acid sequences either for the entire TCR or only for the hypervariable CDR3 loop. Epitopes are available as amino acid sequences. Since Weber et al. proposed representing the peptides as SMILES strings (which reformulates the problem as protein-ligand binding prediction), the SMILES strings of the epitopes are also included. 50% negative samples were generated by shuffling the pairs, i.e., associating TCR sequences with epitopes they have not been shown to bind.
Task Description: Binary classification. Given the epitope (a peptide, either represented as amino acid sequence or as SMILES) and a T-cell receptor (amino acid sequence, either of the full protein complex or only of the hypervariable CDR3 loop), predict whether the epitope binds to the TCR.
Dataset Statistics: 47,182 TCR-Epitope pairs between 192 epitopes and 23,139 TCRs.
References:
Weber, Anna, Jannis Born, and María Rodriguez Martínez. “TITAN: T-cell receptor specificity prediction with bimodal attention networks.” Bioinformatics 37.Supplement_1 (2021): i237-i244.
Bagaev, Dmitry V., et al. “VDJdb in 2019: database extension, new analysis infrastructure and a T-cell receptor motif compendium.” Nucleic Acids Research 48.D1 (2020): D1057-D1062.
Dines, Jennifer N., et al. “The ImmuneRACE study: A prospective multicohort study of immune response action to COVID-19 events with the ImmuneCODE™ open access database.” medRxiv (2020).
Dataset License: CC BY 4.0.
Contributed by: Anna Weber and Jannis Born.
The Facebook ESM-2 model has the MIT license and was published in:
HuggingFace has several versions of the trained model.
Checkpoint name Number of layers Number of parameters
esm2_t48_15B_UR50D 48 15B
esm2_t36_3B_UR50D 36 3B
esm2_t33_650M_UR50D 33 650M
esm2_t30_150M_UR50D 30 150M
esm2_t12_35M_UR50D 12 35M
esm2_t6_8M_UR50D 6 8M
https://doi.org/10.5061/dryad.dr7sqvb5b
To gain genome-scale insight into the sly1∆loop allele’s loss of function, we used synthetic genetic array (SGA) analysis. SGA measures the synthetic sickness or rescue (suppression) of a query allele versus a genome-scale collection of loss-of-function alleles (Tong and Boone, 2005). The sly1∆loop allele was knocked into the genomic SLY1 locus. The SGA data from this analysis were then aligned with the BioGRID dataset.
sly1∆loop SGA contains the raw SGA dataset. The SGA score algorithm processes raw colony size data, normalizes them for a series of experimental systematic effects and calculates a quantitative genetic interaction score.
LogRatios indicates the log-transformed ratio of the growth of the indicated double mutant to the growth of the single mutant with the indicated quer...
Problem Statement
A pharmaceutical manufacturer faced significant challenges in ensuring consistent quality during the production of medications. Manual quality control processes were prone to errors and inefficiencies, leading to product recalls and compliance risks. The company needed an advanced solution to automate quality control, reduce production errors, and comply with stringent regulatory standards.
Challenge
Implementing automated quality control in pharmaceutical manufacturing posed several challenges:
Detecting microscopic defects, contamination, or irregularities in products and packaging.
Ensuring high-speed inspection without disrupting production workflows.
Meeting strict industry regulations for product quality and traceability.
Solution Provided
An AI-powered quality control system was developed using machine vision and advanced inspection algorithms. The solution was designed to:
Automatically inspect pharmaceutical products for defects, contamination, and compliance with production standards.
Analyze packaging integrity to detect labeling errors, seal defects, or missing components.
Provide real-time quality control insights to production teams for immediate corrective actions.
Development Steps
Data Collection
Captured high-resolution images and videos of pharmaceutical products during production, including tablets, capsules, and packaging components.
Preprocessing
Preprocessed visual data to enhance features such as shape, texture, and color, enabling accurate defect detection.
Model Training
Developed machine vision models to detect defects and anomalies at microscopic levels. Integrated AI algorithms to classify defects and provide actionable insights for process improvement.
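As a sketch of what such a model could look like, a small PyTorch example that trains an image classifier over hypothetical defect classes (the classes, data, and backbone choice are illustrative, not the deployed system):

import torch
import torch.nn as nn
from torchvision import models

# Hypothetical defect classes for tablet, capsule, and packaging inspection.
classes = ["ok", "chipped", "discolored", "seal_defect", "label_error"]

# A standard image backbone repurposed as a defect classifier; weights=None
# avoids downloading pretrained weights in this sketch.
model = models.resnet18(weights=None, num_classes=len(classes))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a random batch standing in for real images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(len(classes), (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))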
Validation
Tested the system on a variety of production scenarios to ensure high accuracy and reliability in defect detection.
Deployment
Installed AI-powered inspection systems on production lines, integrating them with existing manufacturing processes and quality control frameworks.
Continuous Monitoring & Improvement
Established a feedback loop to refine models based on new production data and evolving quality standards.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pulmonary function tests (PFTs) are usually interpreted by clinicians using rule-based strategies and pattern recognition. The interpretation, however, has variability due to patient and interpreter errors. Most PFTs have recognizable patterns that can be categorized into specific physiological defects. In this study, we developed a computerized algorithm using the Python package pdfplumber and validated it against clinicians’ interpretation. We downloaded PFT reports from the electronic medical record system that were in PDF format. We digitized the flow volume loop (FVL) and extracted numeric values from the reports. The algorithm used FEV1/FVC
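A minimal pdfplumber sketch of the kind of numeric extraction described above (the report file name, layout, and regular expression are assumptions for illustration):

import re
import pdfplumber

# Hypothetical PFT report; real reports vary by EMR vendor and layout.
with pdfplumber.open("pft_report.pdf") as pdf:
    text = "\n".join(page.extract_text() or "" for page in pdf.pages)

# Illustrative pattern for a line such as "FEV1/FVC  72" in the extracted text.
match = re.search(r"FEV1/FVC\s+(\d+(?:\.\d+)?)", text)
if match:
    print("FEV1/FVC:", float(match.group(1)))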
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
HystLab (Hysteresis Loop analysis box) is MATLAB-based software for the advanced processing and analysis of magnetic hysteresis data. Hysteresis loops are one of the most ubiquitous rock magnetic measurements, and with the growing need for high-resolution analyses of ever larger datasets, there is a need to rapidly, consistently, and accurately process and analyze these data. HystLab is an easy-to-use graphical interface that is compatible with a wide range of software platforms. The software can read a wide range of data formats and rapidly process the data. It includes functionality to re-center loops, correct for drift, and perform a range of slope saturation corrections.