Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Card for HQ-EDIT
HQ-Edit, a high-quality instruction-based image editing dataset with a total of 197,350 edits. Unlike prior approaches relying on attribute guidance or human feedback on building datasets, we devise a scalable data collection pipeline leveraging advanced foundation models, namely GPT-4V and DALL-E 3. HQ-Edit's high-resolution images, rich in detail and accompanied by comprehensive editing prompts, substantially enhance the capabilities of existing image editing… See the full description on the dataset page: https://huggingface.co/datasets/UCSC-VLAA/HQ-Edit-data-demo.
Comprehensive dataset of 1,690 video editing services in Brazil as of July 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21
Human post-edited test sentences for the WMT 2017 Automatic Post-Editing task. The set consists of 2,000 German sentences belonging to the IT domain, already tokenized. Source and target segments can be downloaded from: https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2133. All data is provided by the EU project QT21 (http://www.qt21.eu/).
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The PE²rr corpus contains source language texts from different domains along with their automatically generated translations into several morphologically rich languages, their post-edited versions, and error annotations of the performed post-edit operations. The main advantage of the corpus is the fusion of post-editing and error classification tasks, which have usually been seen as two independent tasks, although naturally they are not.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Python3 pickled dictionaries. Keys are target site names, values are pandas dataframes where each row is a unique editing outcome, and there is a column for each substrate nucleotide and a frequency column.
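The structure described above could be read roughly as follows. This is a minimal sketch, not the authors' loader: the file path, the helper names `load_outcomes` and `top_outcomes`, and the frequency column name `"Frequency"` are assumptions, since the description does not give the actual column label. Unpickling executes arbitrary code, so only load files from a source you trust.

```python
import pickle

import pandas as pd


def load_outcomes(path):
    """Load a pickled dict mapping target site name -> outcome DataFrame."""
    with open(path, "rb") as fh:
        data = pickle.load(fh)  # only unpickle files from a trusted source
    if not isinstance(data, dict):
        raise TypeError("expected a dictionary keyed by target site name")
    return data


def top_outcomes(df, freq_col="Frequency", n=3):
    """Return the n most frequent editing outcomes for one target site.

    freq_col is a hypothetical column name; check df.columns for the real one.
    """
    return df.sort_values(freq_col, ascending=False).head(n)
```

Calling `load_outcomes(path)` and then `top_outcomes(data[site_name])` for any key in the dictionary would list the most frequent outcomes for that target site.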
Comprehensive dataset of 25 video editing services in Rhode Island, United States as of July 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Machine learning (ML) developers frequently use interactive computational notebooks, such as Jupyter notebooks, to host code for data processing and model training. Notebooks provide a convenient tool for writing ML pipelines and interactively observing outputs. However, maintaining notebooks, e.g., to add new features or fix bugs, can be challenging due to the length and complexity of the ML pipeline code. Moreover, there is no existing benchmark related to developer edits on notebooks.
In this paper, we present early results of the first study on learning to edit ML pipeline code in notebooks using large language models (LLMs). We collect the first dataset of 48,398 notebook edits derived from 20,095 revisions of 792 ML-related GitHub repositories. Our dataset captures granular details of file-level and cell-level modifications, offering a foundation for understanding real-world maintenance patterns in ML pipelines. We observe that the edits on notebooks are highly localized. Although LLMs have been shown to be effective on general-purpose code generation and editing, our results reveal that the same LLMs, even after finetuning, have low accuracy on notebook editing, demonstrating the complexity of real-world ML pipeline maintenance tasks. Our findings emphasize the critical role of contextual information in improving model performance and point toward promising avenues for advancing LLMs' capabilities in engineering ML code.
Comprehensive dataset of 30 video editing services in New Mexico, United States as of July 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
This dataset tracks the updates made on the dataset "NCCI Procedure to Procedure Edits (PTP) Quarter Beginning 01/01/2018" as a repository for previous versions of the data and metadata.
Comprehensive dataset of 2 video editing services in Phitsanulok, Thailand as of June 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CSVs containing designed sgRNA-target sites and base editing outcomes across multiple replicates.
This dataset tracks the updates made on the dataset "NCCI Procedure to Procedure Edits (PTP) Quarter Beginning 04/01/2020" as a repository for previous versions of the data and metadata.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Institutional Cost Report (ICR) is a uniform report completed by New York hospitals to report income, expenses, assets, liabilities, and statistics to the Department of Health (DOH). Under DOH regulations, (Part 86-1.2), Article 28 hospitals are required to file financial and statistical data with DOH annually. The data filed is part of the ICR and is received electronically through a secured network. This data is used to develop Medicaid rates, assist in the formulation of reimbursement methodologies, and analyze trends. This dataset includes the print image of the edited data. The ICR is a comprehensive compilation of exhibits that have been modified over time that users should consider when using the ICR dataset. It is possible that data is updated subsequent to posting on this website; therefore the data could become obsolete. To get the details related to the exhibits and data elements, please refer to the blank ICR form, the ICR Table of Contents, the ICR Instructions and the Glossary of Terms, Acronyms, and Abbreviations which are in the Supporting Information section of this site. The data posted as edited contains desk edit adjustments by DOH personnel. In 2009, this information was not audited; however effective with the 2010 ICR, all ICRs will be audited by a Certified Public Accounting Firm annually.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is part of the dataset we curated based on VCTK to study partial speech deepfake detection in the era of neural speech editing. For more details, please refer to our Interspeech 2025 paper: "PartialEdit: Identifying Partial Deepfakes in the Era of Neural Speech Editing".
In the paper, we curated four subsets: E1: VoiceCraft, E2: SSR-Speech, E3: Audiobox-Speech, and E4: Audiobox. Adhering to Audiobox's license, we cannot release the E3 and E4 subsets.
The folder structure is as follows:
PartialEdit/
├── PartialEdit_E1E2.csv
├── E1/
│   ├── p225/
│   │   ├── p225_001_edited_partial_16k.wav
│   │   ├── p225_002_edited_partial_16k.wav
│   │   └── ...
│   ├── p231/
│   │   ├── p231_001_edited_partial_16k.wav
│   │   ├── p231_002_edited_partial_16k.wav
│   │   └── ...
│   └── ...
├── E1-Codec/
│   └── (same structure as E1)
├── E2/
│   └── (same structure as E1)
├── E2-Codec/
│   └── (same structure as E1)
└── modified_txt/
    ├── p225/
    │   ├── p225_001_modified.txt
    │   ├── p225_002_modified.txt
    │   ├── p225_003_modified.txt
    │   └── ...
    ├── p231/
    │   ├── p231_001_modified.txt
    │   ├── p231_002_modified.txt
    │   └── ...
    └── ...
This is version 1.0, and we will include links to the paper and demo page soon.
The `PartialEdit_E1E2.csv` file contains information about the edited regions in each audio file. Each row contains the following columns:
- `filename`: The name of the audio file.
- `start of the edited region (s)`: The starting time (in seconds) of the first edited region.
- `end of the edited region (s)`: The ending time (in seconds) of the first edited region.
- `total duration (s)`: The total duration (in seconds) of the audio file.
If there are two edited regions within a file, the row format expands to include:
- `filename`: The name of the audio file.
- `start of the edited region (s)`: The starting time (in seconds) of the first edited region.
- `end of the edited region (s)`: The ending time (in seconds) of the first edited region.
- `start of the second edited region (s)`: The starting time (in seconds) of the second edited region.
- `end of the second edited region (s)`: The ending time (in seconds) of the second edited region.
- `total duration (s)`: The total duration (in seconds) of the audio file.
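Because rows carry either four or six values, a reader has to handle variable row lengths. The following is a minimal sketch under stated assumptions: fields are comma-separated in the order listed above, and whether the file starts with a header row is not stated here, so `has_header` is a hypothetical switch. The names `EditedFile`, `parse_row`, and `read_edits` are illustrative only.

```python
import csv
import io
from typing import NamedTuple


class EditedFile(NamedTuple):
    filename: str
    regions: list  # list of (start_s, end_s) tuples, one per edited region
    duration_s: float


def parse_row(fields):
    """Parse one row: filename, one or more (start, end) pairs, total duration."""
    if len(fields) < 4 or len(fields) % 2 != 0:
        raise ValueError(f"unexpected number of columns: {len(fields)}")
    times = [float(x) for x in fields[1:-1]]
    regions = list(zip(times[0::2], times[1::2]))
    return EditedFile(fields[0], regions, float(fields[-1]))


def read_edits(text_stream, has_header=False):
    """Read all rows from an open text stream (assumed comma-separated)."""
    reader = csv.reader(text_stream)
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    return [parse_row(row) for row in reader if row]
```

For the real file, something like `with open("PartialEdit_E1E2.csv", newline="") as fh: rows = read_edits(fh)` would apply, after checking the first line by eye to decide on `has_header`.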
To make sure the download is complete, you can check the MD5 checksums with the following command:
md5sum *
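Where `md5sum` is not available, the same check can be sketched in Python with the standard `hashlib` module. The helper names `md5_of_file` and `verify` are illustrative, and the expected checksum value has to come from the dataset's own release notes; none is given here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large audio archives do not fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Compare a file's MD5 digest with a published checksum string."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

The digest returned by `md5_of_file` is the same hex string that `md5sum` prints in its first column.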
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
OmniEdit
In this paper, we present OMNI-EDIT, an omnipotent editor that handles seven different image editing tasks with any aspect ratio seamlessly. Our contributions are fourfold: (1) OMNI-EDIT is trained by utilizing the supervision from seven different specialist models to ensure task coverage. (2) We utilize importance sampling based on the scores provided by large multimodal models (like GPT-4o) instead of CLIP-score to improve the data quality. … See the full description on the dataset page: https://huggingface.co/datasets/TIGER-Lab/OmniEdit-Filtered-1.2M.
https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21
Human post-edited test sentences for the WMT 2017 Automatic Post-Editing task. The set consists of 2,000 English sentences belonging to the IT domain, already tokenized. Source and target segments can be downloaded from: https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2132. All data is provided by the EU project QT21 (http://www.qt21.eu/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Epigenetic modifications have gained attention because they can potentially be changed by environmental stimuli and can be associated with adverse health outcomes. The epitranscriptome field has also begun to attract attention, since RNA modifications have been linked with critical biological processes and implicated in diseases. Several RNA modifications have been identified as reversible, indicating that they are dynamic and can be altered by environmental cues. More than 150 RNA modifications are currently known, occurring in different organisms and on different bases, which are modified by various chemical groups. RNA editing, one of these RNA modifications, occurs after transcription and results in an RNA sequence that differs from its corresponding DNA sequence. Emerging evidence reveals the functions of RNA editing as well as the association between RNA editing and diseases. However, the RNA editing field is still young and needs more empirical evidence with regard to disease and toxicology. Thus, this review aims to summarize current evidence-based studies on RNA editing modifying genes in genotoxicity and cancer. The review presents the association between exposure to environmental xenobiotics and RNA editing modifying genes, and focuses on the association between the expression of RNA editing modifying genes and cancer. Furthermore, we discuss future directions for scientific studies in the area of RNA modifications, especially in the RNA editing field, and provide a knowledge-based framework for further studies.
Supplementary File 1: A text file in FASTA format with the constructed squid coding sequences (EisenbergSI_Data1.txt).
Supplementary File 2: A spreadsheet with all the A-to-G modification sites detected in the coding regions of the squid, along with their number of supporting reads in all the tissues studied (EisenbergSI_Table1.xlsx).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Filenames are formatted as CellType_Repeat_Coding.h5ad.
Each h5ad file contains:
- event frequency, stored in adata.X
- event description, saved in adata.var
- metadata for each Guide-Target pair, saved in adata.obs
To download and unzip the file:
- Copy the link address.
- In a terminal, run: wget --no-check-certificate LINK
- In a terminal, run: mv 2 2.zip, then unzip 2.zip
Comprehensive dataset of 327 video editing services in Illinois, United States as of July 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.