Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
VIVID-10M
[project page] | [Paper] | [arXiv] VIVID-10M is the first large-scale hybrid image-video local editing dataset aimed at reducing data construction and model training costs, comprising 9.7M samples that encompass a wide range of video editing tasks.
Data Index
The data index is split across four .csv files: vivid-image-change.csv, vivid-image-remove.csv, vivid-video-change.csv, and vivid-video-remove.csv.
The VIVID-Video splits contain the columns: local_caption, #… See the full description on the dataset page: https://huggingface.co/datasets/KlingTeam/VIVID-10M.
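The four index file names appear to follow a vivid-<modality>-<task>.csv pattern. The sketch below groups them on that basis. The pattern is only inferred from the names listed above, the "modality" and "task" labels are an assumed reading of them, and the helper names are hypothetical; nothing here is part of the dataset's own tooling.

```python
from pathlib import PurePosixPath

# File names exactly as listed in the data index description.
INDEX_FILES = [
    "vivid-image-change.csv",
    "vivid-image-remove.csv",
    "vivid-video-change.csv",
    "vivid-video-remove.csv",
]


def parse_index_name(filename):
    """Split 'vivid-<modality>-<task>.csv' into (modality, task).

    Assumption: every index file follows this three-part naming pattern.
    """
    stem = PurePosixPath(filename).stem  # drops the '.csv' suffix
    parts = stem.split("-")
    if len(parts) != 3 or parts[0] != "vivid":
        raise ValueError(f"unexpected index file name: {filename!r}")
    return parts[1], parts[2]


def group_by_modality(filenames):
    """Build {modality: {task: filename}} from a list of index file names."""
    groups = {}
    for name in filenames:
        modality, task = parse_index_name(name)
        groups.setdefault(modality, {})[task] = name
    return groups


if __name__ == "__main__":
    for modality, tasks in group_by_modality(INDEX_FILES).items():
        print(modality, sorted(tasks))
```

Grouping by name keeps the sketch independent of the column layout, which is truncated in the description above.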
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
MGen Dataset: Millions of Generics in Context
This repository hosts the MGen dataset: a collection of millions of naturally occurring generics and quantified sentences in context. This dataset is designed to be a resource for the empirical study of generic sentences. Find the details of the construction in the SCiL 2025 paper or the blog post.
Data Structure
The dataset is released in several .csv files.
sentences.csv This is the main file, containing only the… See the full description on the dataset page: https://huggingface.co/datasets/ilyocoris/MGen.
This data release contains digital data generated by the U.S. Geological Survey under cooperative agreements with the Sonoma County Water Agency and the California State Water Resources Control Board to characterize the three-dimensional hydrogeology and water quality of the Russian River Watershed, located in the northern part of the California Coast Ranges section of the Pacific Border province. This dataset contains borehole lithologic and hydrologic data used to support development of the framework model. The borehole dataset is released as a series of .csv ASCII files including (1) individual borehole location and construction information, (2) downhole lithologic interval data derived from well driller’s lithology logs and parsed to a series of textural descriptors, and (3) calculated specific capacity from driller’s pumping tests. In addition to the borehole data, a folder of driller’s logs that were used in the dataset but are not in the California Department of Water Resources database of scanned driller’s logs is included in the data release for reference.
This data release contains digital data generated by the U.S. Geological Survey under cooperative agreements with the Sonoma County Water Agency and the California State Water Resources Control Board to characterize the three-dimensional hydrogeology and water quality of the Russian River Watershed, located in the northern part of the California Coast Ranges section of the Pacific Border province. This dataset contains borehole lithologic and hydrologic data, geospatial data of a three-dimensional hydrogeologic framework model (3D HFM), and gravity data used to support development of the framework model. The borehole dataset is released as a series of .csv ASCII files including (1) individual borehole location and construction information, (2) downhole lithologic interval data derived from well driller’s lithology logs and parsed to a series of textural descriptors, and (3) calculated specific capacity from driller’s pumping tests. In addition to the borehole data, a folder of driller’s logs that were used in the dataset but are not in the California Department of Water Resources database of scanned driller’s logs is included in the data release for reference. The geospatial database contains a polygon feature class that is a 2-dimensional representation of the 3D HFM. The polygon feature class is a cellular array in which each model cell has multiple attributes, including XY location and the interpolated elevations and thicknesses of hydrogeologic units. The 3D HFM was constructed using methods from previously published reports. Gravity data collected in support of 3D framework construction are present as a “child” item of the main data release. Sources of geologic data, 3D HFM construction methods, and additional gravity data can be found in the metadata.