31 datasets found
  1. FSDKaggle2018

    • zenodo.org
    • opendatalab.com
    • +2 more
    zip
    Updated Jan 24, 2020
    + more versions
    Cite
    Eduardo Fonseca; Xavier Favory; Jordi Pons; Frederic Font; Manoj Plakal; Daniel P. W. Ellis; Xavier Serra (2020). FSDKaggle2018 [Dataset]. http://doi.org/10.5281/zenodo.2552860
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Eduardo Fonseca; Xavier Favory; Jordi Pons; Frederic Font; Manoj Plakal; Daniel P. W. Ellis; Xavier Serra
    Description

    FSDKaggle2018 is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology. FSDKaggle2018 has been used for the DCASE Challenge 2018 Task 2, which was run as a Kaggle competition titled Freesound General-Purpose Audio Tagging Challenge.

    Citation

    If you use the FSDKaggle2018 dataset or part of it, please cite our DCASE 2018 paper:

    Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra. "General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline". Proceedings of the DCASE 2018 Workshop (2018)

    You can also consider citing our ISMIR 2017 paper, which describes how we gathered the manual annotations included in FSDKaggle2018.

    Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, "Freesound Datasets: A Platform for the Creation of Open Audio Datasets", In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017

    Contact

    You are welcome to contact Eduardo Fonseca at eduardo.fonseca@upf.edu should you have any questions.

    About this dataset

    Freesound Dataset Kaggle 2018 (or FSDKaggle2018 for short) is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology [1]. FSDKaggle2018 has been used for Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2018. Please visit the DCASE2018 Challenge Task 2 website for more information. The task was hosted on the Kaggle platform as a competition titled Freesound General-Purpose Audio Tagging Challenge. It was organized by researchers from the Music Technology Group of Universitat Pompeu Fabra and from Google Research’s Machine Perception Team.

    The goal of this competition was to build an audio tagging system that can categorize an audio clip as belonging to one of a set of 41 diverse categories drawn from the AudioSet Ontology.

    All audio samples in this dataset are gathered from Freesound [2] and are provided here as uncompressed PCM 16 bit, 44.1 kHz, mono audio files. Note that because Freesound content is collaboratively contributed, recording quality and techniques can vary widely.

    The ground truth data provided in this dataset has been obtained after a data labeling process which is described below in the Data labeling process section. FSDKaggle2018 clips are unequally distributed in the following 41 categories of the AudioSet Ontology:

    "Acoustic_guitar", "Applause", "Bark", "Bass_drum", "Burping_or_eructation", "Bus", "Cello", "Chime", "Clarinet", "Computer_keyboard", "Cough", "Cowbell", "Double_bass", "Drawer_open_or_close", "Electric_piano", "Fart", "Finger_snapping", "Fireworks", "Flute", "Glockenspiel", "Gong", "Gunshot_or_gunfire", "Harmonica", "Hi-hat", "Keys_jangling", "Knock", "Laughter", "Meow", "Microwave_oven", "Oboe", "Saxophone", "Scissors", "Shatter", "Snare_drum", "Squeak", "Tambourine", "Tearing", "Telephone", "Trumpet", "Violin_or_fiddle", "Writing".

    Some other relevant characteristics of FSDKaggle2018:

    • The dataset is split into a train set and a test set.

    • The train set is intended for system development and includes ~9.5k samples unequally distributed among the 41 categories. The minimum number of audio samples per category in the train set is 94, and the maximum is 300. The duration of the audio samples ranges from 300 ms to 30 s due to the diversity of the sound categories and the preferences of Freesound users when recording sounds. The total duration of the train set is roughly 18 h.

    • Out of the ~9.5k samples in the train set, ~3.7k have manually verified ground truth annotations and ~5.8k have non-verified annotations. The non-verified annotations of the train set have a quality estimate of at least 65-70% in each category. Check out the Data labeling process section below for more information about this aspect.

    • Non-verified annotations in the train set are properly flagged in train.csv so that participants can opt to use this information during the development of their systems.

    • The test set is composed of 1.6k samples with manually verified annotations and a category distribution similar to that of the train set. The total duration of the test set is roughly 2 h.

    • All audio samples in this dataset have a single label (i.e. they are annotated with only one label). Check out the Data labeling process section below for more information about this aspect. A single label should be predicted for each file in the test set.

    Data labeling process

    The data labeling process started from a manual mapping between Freesound tags and AudioSet Ontology categories (or labels), which was carried out by researchers at the Music Technology Group, Universitat Pompeu Fabra, Barcelona. Using this mapping, a number of Freesound audio samples were automatically annotated with labels from the AudioSet Ontology. These annotations can be understood as weak labels since they express the presence of a sound category in an audio sample.

    Then, a data validation process was carried out in which a number of participants listened to the annotated sounds and manually assessed the presence or absence of the automatically assigned sound category, according to the AudioSet category description.

    Audio samples in FSDKaggle2018 are only annotated with a single ground truth label (see train.csv). A total of 3,710 annotations included in the train set of FSDKaggle2018 have been manually validated as present and predominant (some with inter-annotator agreement, but not all of them). This means that in most cases there is no additional acoustic material other than the labeled category. In a few cases there may be some additional sound events, but these additional events won't belong to any of the 41 categories of FSDKaggle2018.

    The rest of the annotations have not been manually validated and therefore some of them could be inaccurate. Nonetheless, we have estimated that at least 65-70% of the non-verified annotations per category in the train set are indeed correct. Some of these non-verified audio samples may contain several sound sources even though only one label is provided as ground truth. These additional sources are typically outside the set of 41 categories, but in a few cases they could fall within it.

    More details about the data labeling process can be found in [3].

    License

    FSDKaggle2018 has licenses at two different levels, as explained next.

    All sounds in Freesound are released under Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound. For attribution purposes and to facilitate attribution of these files to third parties, we include a list of the audio clips included in FSDKaggle2018 and their corresponding licenses. The licenses are specified in the files train_post_competition.csv and test_post_competition_scoring_clips.csv.

    In addition, FSDKaggle2018 as a whole is the result of a curation process and it has an additional license. FSDKaggle2018 is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSDKaggle2018.doc zip file.

    Files

    FSDKaggle2018 can be downloaded as a series of zip files with the following directory structure:

    root
    │
    └───FSDKaggle2018.audio_train/                      Audio clips in the train set
    │
    └───FSDKaggle2018.audio_test/                       Audio clips in the test set
    │
    └───FSDKaggle2018.meta/                             Files for evaluation setup
    │   │
    │   └───train_post_competition.csv                  Data split and ground truth for the train set
    │   │
    │   └───test_post_competition_scoring_clips.csv     Ground truth for the test set
    │
    └───FSDKaggle2018.doc/
        │
        └───README.md                                   The dataset description file you are reading
        │
        └───LICENSE-DATASET

  2. RECOD.ai events dataset

    • redu.unicamp.br
    Updated Mar 21, 2025
    + more versions
    Cite
    Repositório de Dados de Pesquisa da Unicamp (2025). RECOD.ai events dataset [Dataset]. http://doi.org/10.25824/redu/BLIYYR
    Explore at:
    Dataset updated
    Mar 21, 2025
    Dataset provided by
    Repositório de Dados de Pesquisa da Unicamp
    Dataset funded by
    Fundação de Amparo à Pesquisa do Estado de São Paulo
    Description

    Overview

    This data set consists of links to social network items for 34 different forensic events that took place between August 14th, 2018 and January 6th, 2021. The majority of the text and images are from Twitter (a minor part is from Flickr, Facebook and Google+), and every video is from YouTube.

    Data Collection

    We used Social Tracker, along with the social media APIs, to gather most of the collections. For a minor part, we used Twint. In both cases, we provided keywords related to the event to receive the data. It is important to mention that, in procedures like this one, usually only a small fraction of the collected data is in fact related to the event and useful for a further forensic analysis.

    Content

    We have data from 34 events, and for each of them we provide the following files:

    items_full.csv: Contains links to every social media post that was collected.
    images.csv: Lists the images collected. In some files there is a field called "ItemUrl", which refers to the social network post (e.g., a tweet) that mentions that media.
    video.csv: URLs of YouTube videos that were gathered about the event.
    video_tweet.csv: Contains IDs of tweets and IDs of YouTube videos. A tweet whose ID is in this file has a video in its content. In turn, the link of a YouTube video whose ID is in this file was mentioned by at least one collected tweet. Only two collections have this file.
    description.txt: Contains some standard information about the event, and possibly some comments about any specific issue related to it.

    In fact, most of the collections do not have all the files above. This is due to changes in our collection procedure throughout the time of this work.

    Events

    We divided the events into six groups:

    Fire: Devastating fire is the main issue of the event, therefore most of the informative pictures show flames or burned constructions. 14 events.
    Collapse: Most of the relevant images depict collapsed buildings, bridges, etc. (not caused by fire). 5 events.
    Shooting: Likely images of guns and police officers. Few or no destruction of the environment. 5 events.
    Demonstration: A plethora of people on the streets. Possibly some problem took place there, but in most cases the demonstration is the actual event. 7 events.
    Collision: Traffic collision. Pictures of damaged vehicles on an urban landscape. Possibly there are images with victims on the street. 1 event.
    Flood: Events that range from fierce rain to a tsunami. Many pictures depict water. 2 events.

    Media Content

    Due to the terms of use of the social networks, we do not make the collected texts, images and videos publicly available. However, we can provide some extra pieces of media content related to one (or more) events; please contact the authors.

  3. SOTorrent 2018-12-09

    • kaggle.com
    zip
    Updated Dec 18, 2018
    Cite
    SOTorrent (2018). SOTorrent 2018-12-09 [Dataset]. https://www.kaggle.com/datasets/sotorrent/2018-12-09
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Dec 18, 2018
    Dataset authored and provided by
    SOTorrent
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Please note

    Tables TitleVersion and Votes are not yet visible on the Data preview page, but they are accessible in Kernels.

    Context

    Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of code snippets and free-form text on a wide variety of topics. Like other software artifacts, questions and answers on SO evolve over time, for example when bugs in code snippets are fixed, code is updated to work with a more recent library version, or text surrounding a code snippet is edited for clarity. To be able to analyze how content on SO evolves, we built SOTorrent, an open dataset based on the official SO data dump.

    Content

    SOTorrent provides access to the version history of SO content at the level of whole posts and individual text or code blocks. It connects SO posts to other platforms by aggregating URLs from text blocks and comments, and by collecting references from GitHub files to SO posts. Our vision is that researchers will use SOTorrent to investigate and understand the evolution of SO posts and their relation to other platforms such as GitHub. If you use this dataset in your work, please cite our MSR 2018 paper or our MSR 2019 mining challenge proposal.

    This version is based on the official Stack Overflow data dump released 2018-12-02 and the Google BigQuery GitHub data set queried 2018-12-09.

    Inspiration

    The goal of the MSR 2019 mining challenge is to study the origin, evolution, and usage of Stack Overflow code snippets. Questions that are, to the best of our knowledge, not sufficiently answered yet include:

    • How are code snippets on Stack Overflow maintained?
    • How many clones of code snippets exist inside Stack Overflow?
    • How can we detect buggy versions of Stack Overflow code snippets and find them in GitHub projects?
    • How frequently are code snippets copied from external sources into Stack Overflow, and how do they co-evolve there?
    • How do snippets copied from Stack Overflow to GitHub co-evolve?
    • Does the evolution of Stack Overflow code snippets follow patterns?
    • Do these patterns differ between programming languages?
    • Are the licenses of external sources compatible with Stack Overflow’s license (CC BY-SA 3.0)?
    • How many code blocks on Stack Overflow do not contain source code (and are only used for markup)?
    • Can we reliably predict bug-fixing edits to code on Stack Overflow?
    • Can we reliably predict popularity of Stack Overflow code snippets on GitHub?

    These are just some of the questions that could be answered using SOTorrent. We encourage challenge participants to adapt the above questions or formulate their own research questions about the origin, evolution, and usage of content on Stack Overflow.

  4. Medicare and Medicaid Services

    • kaggle.com
    zip
    Updated Apr 22, 2020
    + more versions
    Cite
    Google BigQuery (2020). Medicare and Medicaid Services [Dataset]. https://www.kaggle.com/datasets/bigquery/sdoh-hrsa-shortage-areas
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Apr 22, 2020
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    Description

    Context

    This public dataset was created by the Centers for Medicare & Medicaid Services. The data summarize counts of enrollees who are dually eligible for both the Medicare and Medicaid programs, including those in Medicare Savings Programs. “Duals” represent 20 percent of all Medicare beneficiaries, yet they account for 34 percent of all spending by the program, according to the Commonwealth Fund. As a representation of this high-needs, high-cost population, these data offer a view of regions ripe for more intensive care coordination that can address complex social and clinical needs. In addition to the high cost-savings opportunity of delivering upstream clinical interventions, this population represents the county-by-county volume of patients who are eligible for both state-level (Medicaid) and federal-level (Medicare) reimbursements and potential funding streams to address unmet social needs across various programs, waivers, and other projects. The dataset includes eligibility type and enrollment by quarter, at both the state and county level. These data represent monthly snapshots submitted by states to CMS, which are inherently lower than ever-enrolled counts (which include persons enrolled at any time during a calendar year). More information on dually eligible beneficiaries is available from CMS.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.sdoh_cms_dual_eligible_enrollment.
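
    A minimal sketch of using that client library from Python; the dataset path comes from the paragraph above, and the column names mirror those used in the sample query below.

    from google.cloud import bigquery

    client = bigquery.Client()  # needs GCP credentials (provided automatically inside a Kaggle Kernel)

    # Column names (County_Name, Date, Public_Total, State_Abbr) mirror the sample query below.
    query = """
        SELECT County_Name, Date, Public_Total
        FROM `bigquery-public-data.sdoh_cms_dual_eligible_enrollment.dual_eligible_enrollment_by_county_and_program`
        WHERE State_Abbr = "MI"
        ORDER BY Date
        LIMIT 10
    """

    # Run the query and materialize the results as a pandas DataFrame.
    df = client.query(query).to_dataframe()
    print(df)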

    Sample Query

    In what counties in Michigan has the number of dual-eligible individuals increased the most from 2015 to 2018? Find the counties in Michigan which have experienced the largest increase in dual-enrollment households:

    WITH duals_Jan_2018 AS (
      -- Reconstructed CTE (mirrors duals_Jan_2015 below); the 2018 snapshot date is assumed.
      SELECT Public_Total AS duals_2018, County_Name, FIPS
      FROM `bigquery-public-data.sdoh_cms_dual_eligible_enrollment.dual_eligible_enrollment_by_county_and_program`
      WHERE State_Abbr = "MI" AND Date = '2018-12-01'
    ),

    duals_Jan_2015 AS (
      SELECT Public_Total AS duals_2015, County_Name, FIPS
      FROM `bigquery-public-data.sdoh_cms_dual_eligible_enrollment.dual_eligible_enrollment_by_county_and_program`
      WHERE State_Abbr = "MI" AND Date = '2015-12-01'
    ),

    duals_increase AS (
      SELECT d18.FIPS, d18.County_Name, d15.duals_2015, d18.duals_2018,
             (d18.duals_2018 - d15.duals_2015) AS total_duals_diff
      FROM duals_Jan_2018 d18
      JOIN duals_Jan_2015 d15 ON d18.FIPS = d15.FIPS
    )

    SELECT * FROM duals_increase
    WHERE total_duals_diff IS NOT NULL
    ORDER BY total_duals_diff DESC

  5. Travel time to cities and ports in the year 2015

    • figshare.com
    tiff
    Updated May 30, 2023
    Cite
    Andy Nelson (2023). Travel time to cities and ports in the year 2015 [Dataset]. http://doi.org/10.6084/m9.figshare.7638134.v4
    Explore at:
    Available download formats: tiff
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Andy Nelson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset and the validation are fully described in a Nature Scientific Data Descriptor https://www.nature.com/articles/s41597-019-0265-5

    If you want to use this dataset in an interactive environment, then use this link https://mybinder.org/v2/gh/GeographerAtLarge/TravelTime/HEAD

    The following text is a summary of the information in the above Data Descriptor.

    The dataset is a suite of global travel-time accessibility indicators for the year 2015, at approximately one-kilometre spatial resolution for the entire globe. The indicators show an estimated (and validated) land-based travel time to the nearest city and nearest port for a range of city and port sizes.

    The datasets are in GeoTIFF format and are suitable for use in Geographic Information Systems and statistical packages for mapping access to cities and ports and for spatial and statistical analysis of the inequalities in access by different segments of the population.

    These maps represent a unique global representation of physical access to essential services offered by cities and ports.

    The datasets are as follows.

    travel_time_to_cities_x.tif (x has values from 1 to 12): the value of each pixel is the estimated travel time in minutes to the nearest urban area in 2015. There are 12 data layers based on different sets of urban areas, defined by their population in year 2015 (see PDF report).

    travel_time_to_ports_x.tif (x ranges from 1 to 5): the value of each pixel is the estimated travel time to the nearest port in 2015. There are 5 data layers based on different port sizes.

    Format: Raster Dataset, GeoTIFF, LZW compressed

    Unit: Minutes

    Data type: 16 bit Unsigned Integer

    No data value: 65535

    Flags: None

    Spatial resolution: 30 arc seconds

    Spatial extent: Upper left -180, 85; Lower left -180, -60; Upper right 180, 85; Lower right 180, -60

    Spatial Reference System (SRS): EPSG:4326 - WGS84 - Geographic Coordinate System (lat/long)

    Temporal resolution: 2015

    Temporal extent: Updates may follow for future years, but these are dependent on the availability of updated inputs on travel times and city locations and populations.

    Methodology

    Travel time to the nearest city or port was estimated using an accumulated cost function (accCost) in the gdistance R package (van Etten, 2018). This function requires two input datasets: (i) a set of locations to estimate travel time to and (ii) a transition matrix that represents the cost or time to travel across a surface.

    The set of locations were based on populated urban areas in the 2016 version of the Joint Research Centre’s Global Human Settlement Layers (GHSL) datasets (Pesaresi and Freire, 2016) that represent low density (LDC) urban clusters and high density (HDC) urban areas (https://ghsl.jrc.ec.europa.eu/datasets.php). These urban areas were represented by points, spaced at 1km distance around the perimeter of each urban area.

    Marine ports were extracted from the 26th edition of the World Port Index (NGA, 2017), which contains the location and physical characteristics of approximately 3,700 major ports and terminals. Ports are represented as single points.

    The transition matrix was based on the friction surface (https://map.ox.ac.uk/research-project/accessibility_to_cities) from the 2015 global accessibility map (Weiss et al, 2018).

    Code

    The R code used to generate the 12 travel time maps is included in the zip file that can be downloaded with these data layers. The processing zones are also available.

    Validation

    The underlying friction surface was validated by comparing travel times between 47,893 pairs of locations against journey times from a Google API. Our estimated journey times were generally shorter than those from the Google API. Across the tiles, the median journey time from our estimates was 88 minutes within an interquartile range of 48 to 143 minutes while the median journey time estimated by the Google API was 106 minutes within an interquartile range of 61 to 167 minutes. Across all tiles, the differences were skewed to the left and our travel time estimates were shorter than those reported by the Google API in 72% of the tiles. The median difference was −13.7 minutes within an interquartile range of −35.5 to 2.0 minutes while the absolute difference was 30 minutes or less for 60% of the tiles and 60 minutes or less for 80% of the tiles. The median percentage difference was −16.9% within an interquartile range of −30.6% to 2.7% while the absolute percentage difference was 20% or less in 43% of the tiles and 40% or less in 80% of the tiles.

    This process and results are included in the validation zip file.

    Usage Notes

    The accessibility layers can be visualised and analysed in many Geographic Information Systems or remote sensing software such as QGIS, GRASS, ENVI, ERDAS or ArcMap, and also by statistical and modelling packages such as R or MATLAB. They can also be used in cloud-based tools for geospatial analysis such as Google Earth Engine.

    The nine layers represent travel times to human settlements of different population ranges. Two or more layers can be combined into one layer by recording the minimum pixel value across the layers. For example, a map of travel time to the nearest settlement of 5,000 to 50,000 people could be generated by taking the minimum of the three layers that represent the travel time to settlements with populations between 5,000 and 10,000, 10,000 and 20,000 and, 20,000 and 50,000 people.
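
    A minimal sketch of that minimum-combination step in Python with rasterio and numpy; the input file names are hypothetical placeholders for whichever travel_time_to_cities_x.tif layers correspond to the population classes of interest.

    import numpy as np
    import rasterio

    # Hypothetical layer names; substitute the travel_time_to_cities_x.tif files that
    # correspond to the 5,000-10,000, 10,000-20,000 and 20,000-50,000 population classes.
    layer_paths = ["travel_time_to_cities_10.tif",
                   "travel_time_to_cities_11.tif",
                   "travel_time_to_cities_12.tif"]

    with rasterio.open(layer_paths[0]) as src:
        profile = src.profile        # reuse georeferencing, data type and no-data value
        combined = src.read(1)

    for path in layer_paths[1:]:
        with rasterio.open(path) as src:
            # Pixel-wise minimum keeps the shortest travel time across the layers.
            combined = np.minimum(combined, src.read(1))

    # 65535 is the documented no-data value; it only survives where every layer is no-data.
    with rasterio.open("travel_time_to_cities_5k_to_50k.tif", "w", **profile) as dst:
        dst.write(combined, 1)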

    The accessibility layers also permit user-defined hierarchies that go beyond computing the minimum pixel value across layers. A user-defined complete hierarchy can be generated when the union of all categories adds up to the global population, and the intersection of any two categories is empty. Everything else is up to the user in terms of logical consistency with the problem at hand.

    The accessibility layers are relative measures of the ease of access from a given location to the nearest target. While the validation demonstrates that they do correspond to typical journey times, they cannot be taken to represent actual travel times. Errors in the friction surface will be accumulated as part of the accumulative cost function, and it is likely that locations that are further away from targets will have a greater divergence from a plausible travel time than those that are closer to the targets. Care should be taken when referring to travel time to the larger cities when the locations of interest are extremely remote, although they will still be plausible representations of relative accessibility. Furthermore, a key assumption of the model is that all journeys will use the fastest mode of transport and take the shortest path.

  6. Chicago Taxi Trips

    • kaggle.com
    • data.cityofchicago.org
    zip
    Updated Apr 18, 2018
    Cite
    City of Chicago (2018). Chicago Taxi Trips [Dataset]. https://www.kaggle.com/datasets/chicago/chicago-taxi-trips-bq
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Apr 18, 2018
    Dataset authored and provided by
    City of Chicago
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Chicago
    Description

    Context

    Taxicabs in Chicago, Illinois, are operated by private companies and licensed by the city. There are about seven thousand licensed cabs operating within the city limits. Licenses are obtained through the purchase or lease of a taxi medallion which is then affixed to the top right hood of the car. Source: https://en.wikipedia.org/wiki/Taxicabs_of_the_United_States#Chicago

    Content

    This dataset includes taxi trips from 2013 to the present, reported to the City of Chicago in its role as a regulatory agency. To protect privacy but allow for aggregate analyses, the Taxi ID is consistent for any given taxi medallion number but does not show the number, Census Tracts are suppressed in some cases, and times are rounded to the nearest 15 minutes. Due to the data reporting process, not all trips are reported but the City believes that most are. See http://digital.cityofchicago.org/index.php/chicago-taxi-data-released for more information about this dataset and how it was created.

    Fork this kernel to get started.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:chicago_taxi_trips

    https://cloud.google.com/bigquery/public-data/chicago-taxi

    Dataset Source: City of Chicago

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source —https://data.cityofchicago.org — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by Ferdinand Stohr from Unsplash.

    Inspiration

    What are the maximum, minimum and average fares for rides lasting 10 minutes or more? Which drop-off areas have the highest average tip? How does trip duration affect fare rates for trips lasting less than 90 minutes?
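
    A minimal sketch of the first of these questions with the BigQuery Python client; the table path and column names (trip_seconds, fare) are assumptions about the public Chicago taxi tables, so verify them against the actual schema.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Table path and column names are assumptions; check the dataset schema before relying on them.
    query = """
        SELECT
          MAX(fare) AS max_fare,
          MIN(fare) AS min_fare,
          AVG(fare) AS avg_fare
        FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
        WHERE trip_seconds >= 600   -- rides lasting 10 minutes or more
    """

    print(client.query(query).to_dataframe())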

    Figure: Chicago taxi fares by duration (https://cloud.google.com/bigquery/images/chicago-taxi-fares-by-duration.png)

  7. Area Deprivation Index (ADI)

    • console.cloud.google.com
    Updated May 27, 2023
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BroadStreet&hl=ko&inv=1&invt=Ab4C6A (2023). Area Deprivation Index (ADI) [Dataset]. https://console.cloud.google.com/marketplace/product/broadstreet-public-data/adi?hl=ko
    Explore at:
    Dataset updated
    May 27, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    The Area Deprivation Index (ADI) can show where areas of deprivation and affluence exist within a community. The ADI is calculated with 17 indicators from the American Community Survey (ACS); it has been well studied in the peer-reviewed literature since 2003 and used for 20 years by the Health Resources and Services Administration (HRSA). High levels of deprivation have been linked to health outcomes such as 30-day hospital readmission rates, cardiovascular disease deaths, cervical cancer incidence, cancer deaths, and all-cause mortality. The 17 indicators of the ADI encompass income, education, employment, and housing conditions at the Census Block Group level. The ADI is available on BigQuery for release years 2018-2020 and is reported as a percentile from 0-100%, with 50% indicating a "middle of the nation" percentile. Data is provided at the county, ZIP, and Census Block Group levels. Neighborhood and racial disparities occur when some neighborhoods have high ADI scores and others have low scores. A low ADI score indicates affluence or prosperity; a high ADI score is indicative of high levels of deprivation. Raw ADI scores and additional statistics and dataviz can be seen in the ADI story with a free BroadStreet account. Much of the ADI research and popularity would not be possible without the excellent work of Dr. Amy Kind and colleagues at HIPxChange and at the University of Wisconsin-Madison. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery. Learn more.
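
    A minimal sketch of pulling county-level ADI percentiles with the BigQuery Python client; the table and column names below are hypothetical placeholders (the description does not list them), so look up the actual BroadStreet ADI table schema in the BigQuery console first.

    from google.cloud import bigquery

    client = bigquery.Client()

    # NOTE: dataset, table and column names here are hypothetical placeholders;
    # replace them with the real BroadStreet ADI table identifiers from BigQuery.
    query = """
        SELECT county_name, state, adi_percentile
        FROM `broadstreet-public-data.adi.area_deprivation_index_by_county`
        WHERE year = 2020
        ORDER BY adi_percentile DESC
        LIMIT 20
    """

    print(client.query(query).to_dataframe())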

  8. Dataset for: "A Dual-Frequency Radar Retrieval of Snowfall Properties Using a Neural Network"

    • aws-databank-alb.library.illinois.edu
    • databank.illinois.edu
    Updated Nov 18, 2020
    Cite
    Randy Chase (2020). Dataset for: "A Dual-Frequency Radar Retrieval of Snowfall Properties Using a Neural Network" [Dataset]. http://doi.org/10.13012/B2IDB-0791318_V2
    Explore at:
    Dataset updated
    Nov 18, 2020
    Authors
    Randy Chase
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    U.S. National Aeronautics and Space Administration (NASA)
    Description

    This is the dataset that accompanies the paper titled "A Dual-Frequency Radar Retrieval of Snowfall Properties Using a Neural Network", submitted for peer review in August 2020. Please see the GitHub repository for the most up-to-date data after the revision process: https://github.com/dopplerchase/Chase_et_al_2021_NN

    Authors: Randy J. Chase, Stephen W. Nesbitt and Greg M. McFarquhar
    Corresponding author: Randy J. Chase (randyjc2@illinois.edu)

    Here we have the data used in the manuscript. Please email me if you have specific questions about units etc.

    1) DDA/GMM database of scattering properties: base_df_DDA.csv. This is the combined dataset from the following papers: Leinonen & Moisseev, 2015; Leinonen & Szyrmer, 2015; Lu et al., 2016; Kuo et al., 2016; Eriksson et al., 2018. The column names are D: maximum dimension in meters, M: particle mass in grams kg, sigma_ku: backscatter cross-section at Ku in m^2, sigma_ka: backscatter cross-section at Ka in m^2, sigma_w: backscatter cross-section at W in m^2. The first column is just an index column.

    2) Synthetic data used to train and test the neural network: Unrimed_simulation_wholespecturm_train_V2.nc, Unrimed_simulation_wholespecturm_test_V2.nc. This was the result of combining the PSDs and DDA/GMM particles randomly to build the training and test dataset.

    3) Notebook for training the network using the synthetic database and Google Colab (TensorFlow): Train_Neural_Network_Chase2020.ipynb. This is the notebook used to train the neural network.

    4) Trained TensorFlow neural network: NN_6by8.h5. This is the HDF5 TensorFlow model that resulted from the training. You will need this to run the retrieval.

    5) Scalers needed to apply the neural network: scaler_X_V2.pkl, scaler_y_V2.pkl. These are the sklearn scalers used in training the neural network. You will need these to scale your data if you wish to run the retrieval.

    6) New in this version: example notebook showing how to run the trained neural network on Ku- and Ka-band observations, demonstrated with the third case in the paper: Run_Chase2021_NN.ipynb

    7) New in this version: APR data used to show how to run the neural network retrieval: Chase_2021_NN_APR03Dec2015.nc

    The data for the analysis on the observations are not provided here because of the size of the radar data. Please see the GHRC website (https://ghrc.nsstc.nasa.gov/home/) if you wish to download the radar and in-situ data, or contact me; we can coordinate transferring the exact data files used. The GPM-DPR data are available here: http://dx.doi.org/10.5067/GPM/DPR/GPM/2A/05
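
    A minimal sketch, under the assumption that the network takes the scaled dual-frequency radar inputs and returns scaled retrieval outputs (the exact input layout is defined in Run_Chase2021_NN.ipynb), of how the distributed scalers and HDF5 model could be applied:

    import joblib                      # or pickle, depending on how the .pkl files were written
    import numpy as np
    import tensorflow as tf

    # Files distributed with the dataset (names taken from the description above).
    scaler_X = joblib.load("scaler_X_V2.pkl")    # sklearn scaler for the network inputs
    scaler_y = joblib.load("scaler_y_V2.pkl")    # sklearn scaler for the network outputs
    model = tf.keras.models.load_model("NN_6by8.h5")

    # Placeholder input row; replace with real Ku-/Ka-band observations arranged as
    # expected by the training notebook (see Run_Chase2021_NN.ipynb for the columns).
    X = np.zeros((1, scaler_X.n_features_in_))

    X_scaled = scaler_X.transform(X)
    y_scaled = model.predict(X_scaled)
    y = scaler_y.inverse_transform(y_scaled)     # back to physical units
    print(y)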

  9. Bitcoin Blockchain Historical Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Cite
    Google BigQuery (2019). Bitcoin Blockchain Historical Data [Dataset]. https://www.kaggle.com/datasets/bigquery/bitcoin-blockchain
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Blockchain technology, first implemented by Satoshi Nakamoto in 2009 as a core component of Bitcoin, is a distributed, public ledger recording transactions. Its usage allows secure peer-to-peer communication by linking blocks containing hash pointers to a previous block, a timestamp, and transaction data. Bitcoin is a decentralized digital currency (cryptocurrency) which leverages the Blockchain to store transactions in a distributed manner in order to mitigate against flaws in the financial industry.

    Nearly ten years after its inception, Bitcoin and other cryptocurrencies experienced an explosion in popular awareness. The value of Bitcoin, on the other hand, has experienced more volatility. Meanwhile, as use cases of Bitcoin and Blockchain grow, mature, and expand, hype and controversy have swirled.

    Content

    In this dataset, you will have access to information about blockchain blocks and transactions. All historical data are in the bigquery-public-data:crypto_bitcoin dataset. It is updated every 10 minutes. The data can be joined with historical prices in kernels. See available similar datasets here: https://www.kaggle.com/datasets?search=bitcoin.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_bitcoin.[TABLENAME]. Fork this kernel to get started.
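
    A minimal sketch of the first Inspiration question below (total BTC output per day); the column names block_timestamp and output_value, and the satoshi unit, are assumptions about the crypto_bitcoin schema, so check the transactions table schema in BigQuery.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Column names and units are assumptions about the public crypto_bitcoin schema;
    # verify them against the transactions table schema before relying on the numbers.
    query = """
        SELECT
          DATE(block_timestamp) AS day,
          SUM(output_value) / 1e8 AS btc_sent   -- output_value assumed to be in satoshis
        FROM `bigquery-public-data.crypto_bitcoin.transactions`
        GROUP BY day
        ORDER BY day DESC
        LIMIT 30
    """

    print(client.query(query).to_dataframe())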

    Method & Acknowledgements

    Allen Day (Twitter | Medium), Google Cloud Developer Advocate, and Colin Bookman, Google Cloud Customer Engineer, retrieve data from the Bitcoin network using a custom client available on GitHub that they built with the bitcoinj Java library. Historical data from the origin block to 2018-01-31 were loaded in bulk into two BigQuery tables, blocks_raw and transactions. These tables contain fresh data, as they are now appended when new blocks are broadcast to the Bitcoin network. For additional information, visit the Google Cloud Big Data and Machine Learning Blog post "Bitcoin in BigQuery: Blockchain analytics on public data".

    Photo by Andre Francois on Unsplash.

    Inspiration

    • How many bitcoins are sent each day?
    • How many addresses receive bitcoin each day?
    • Compare transaction volume to historical prices by joining with other available data sources
  10. Cewe fotokniha brand search 2018

    • kaggle.com
    Updated Aug 31, 2018
    Cite
    Cewe Slovakia (2018). Cewe fotokniha brand search 2018 [Dataset]. https://www.kaggle.com/datasets/cewe-slovakia/cewe-fotokniha-2018/discussion
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 31, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Cewe Slovakia
    Description

    Context

    A quick overview of how the brand performed in Slovakia during 2018.

    Content

    Our first experience with Google Trends led us to do quick brand-recognition research with this tool. We looked for brand searches connected with the "cewe fotokniha" topic. Google Trends provided us with broad insight into search behavior across web, image, and YouTube searches.

    Acknowledgements

    These data are public to enable quick research into how any brand is performing within a certain area or region.

  11. Pinterest Fashion Compatibility

    • cseweb.ucsd.edu
    • beta.data.urbandatacentre.ca
    json
    + more versions
    Cite
    UCSD CSE Research Project, Pinterest Fashion Compatibility [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Explore at:
    Available download formats: json
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    This dataset contains images (scenes) containing fashion products, which are labeled with bounding boxes and links to the corresponding products.

    Metadata includes

    • product IDs

    • bounding boxes

    Basic Statistics:

    • Scenes: 47,739

    • Products: 38,111

    • Scene-Product Pairs: 93,274

  12. speech_commands

    • tensorflow.org
    • datasets.activeloop.ai
    • +1 more
    Updated Jan 13, 2023
    Cite
    (2023). speech_commands [Dataset]. http://identifiers.org/arxiv:1804.03209
    Explore at:
    Dataset updated
    Jan 13, 2023
    Description

    An audio dataset of spoken words designed to help train and evaluate keyword spotting systems. Its primary goal is to provide a way to build and test small models that detect when a single word is spoken, from a set of ten target words, with as few false positives as possible from background noise or unrelated speech. Note that in the train and validation sets, the label "unknown" is much more prevalent than the labels of the target words or background noise. One difference from the release version is the handling of silent segments. While in the test set the silence segments are regular 1-second files, in the training set they are provided as long segments under the "background_noise" folder. Here we split this background noise into 1-second clips, and also keep one of the files for the validation set.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('speech_commands', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  13. ‘Travel Review Rating Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Travel Review Rating Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-travel-review-rating-dataset-d315/6c6ad6b1/?iid=003-929&v=presentation
    Explore at:
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Travel Review Rating Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/wirachleelakiatiwong/travel-review-rating-dataset on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    This data set has been sourced from the Machine Learning Repository of the University of California, Irvine (UC Irvine): Travel Review Ratings Data Set. This data set is populated by capturing user ratings from Google reviews. Reviews on attractions from 24 categories across Europe are considered. Google user ratings range from 1 to 5, and the average user rating per category is calculated.

    Content

    Attribute 1: Unique user id
    Attribute 2: Average ratings on churches
    Attribute 3: Average ratings on resorts
    Attribute 4: Average ratings on beaches
    Attribute 5: Average ratings on parks
    Attribute 6: Average ratings on theatres
    Attribute 7: Average ratings on museums
    Attribute 8: Average ratings on malls
    Attribute 9: Average ratings on zoo
    Attribute 10: Average ratings on restaurants
    Attribute 11: Average ratings on pubs/bars
    Attribute 12: Average ratings on local services
    Attribute 13: Average ratings on burger/pizza shops
    Attribute 14: Average ratings on hotels/other lodgings
    Attribute 15: Average ratings on juice bars
    Attribute 16: Average ratings on art galleries
    Attribute 17: Average ratings on dance clubs
    Attribute 18: Average ratings on swimming pools
    Attribute 19: Average ratings on gyms
    Attribute 20: Average ratings on bakeries
    Attribute 21: Average ratings on beauty & spas
    Attribute 22: Average ratings on cafes
    Attribute 23: Average ratings on view points
    Attribute 24: Average ratings on monuments
    Attribute 25: Average ratings on gardens

    Acknowledgements

    This data set has been sourced from the Machine Learning Repository of University of California, Irvine (UC Irvine) : Travel Review Ratings Data Set

    The UCI page mentions the following publication as the original source of the data set: Renjith, Shini, A. Sreekumar, and M. Jathavedan. 2018. Evaluation of Partitioning Clustering Algorithms for Processing Social Media Data in Tourism Domain. In 2018 IEEE Recent Advances in Intelligent Computational Systems (RAICS), 12731. IEEE

    Inspiration

    I'm the kind of person who loves traveling, but I sometimes have problems like: where should I visit? Are there interesting places that match my lifestyle? I often spent hours searching for an interesting place to go out. Such a waste of time.

    What if we could build a recommender system that recommends several interesting venues based on your preferences? Using information from Google reviews, I'll try to divide Google review users into clusters of similar interests as groundwork for building a recommender system based on their preferences.

    --- Original source retains full ownership of the source dataset ---
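
    A minimal clustering sketch along the lines described in the Inspiration section above, assuming the 25 attributes are loaded from the source CSV (the file name and column layout here are hypothetical):

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Hypothetical file and column layout: first column is the user id (Attribute 1),
    # the remaining 24 columns are the average category ratings (Attributes 2-25).
    df = pd.read_csv("google_review_ratings.csv")
    ratings = df.iloc[:, 1:25].apply(pd.to_numeric, errors="coerce").fillna(0)

    # Standardize the ratings, then group users into clusters of similar interests.
    X = StandardScaler().fit_transform(ratings)
    kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
    df["cluster"] = kmeans.fit_predict(X)

    print(df.groupby("cluster").size())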

  14. Repository Analytics and Metrics Portal (RAMP) 2017 data

    • data.niaid.nih.gov
    • datadryad.org
    • +1 more
    zip
    Updated Jul 27, 2021
    Cite
    Jonathan Wheeler; Kenning Arlitsch (2021). Repository Analytics and Metrics Portal (RAMP) 2017 data [Dataset]. http://doi.org/10.5061/dryad.r7sqv9scf
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 27, 2021
    Dataset provided by
    Montana State University
    University of New Mexico
    Authors
    Jonathan Wheeler; Kenning Arlitsch
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data of institutional repositories. The data are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2017. For a description of the data collection, processing, and output methods, please see the "Methods" section below.

    Methods

    RAMP Data Documentation – January 1, 2017 through August 18, 2018

    Data Collection

    RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).

    Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    country: The country from which the corresponding search originated.
    device: The device used for the search.
    date: The date of the search.
    

    Following the data processing described below, on ingest into RAMP an additional field, citableContent, is added to the page-level data.

    Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.

    More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en

    Data Processing

    Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
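
    A minimal sketch of the kind of extension-based check described above; the extension list is illustrative, not RAMP's actual rule set.

    from urllib.parse import urlparse

    # Illustrative extension list only; RAMP's actual classification rules may differ.
    CONTENT_EXTENSIONS = {".pdf", ".csv", ".doc", ".docx", ".xls", ".xlsx", ".zip", ".txt"}

    def citable_content(url: str) -> str:
        """Return "Yes" if the URL appears to point to a content file rather than an HTML page."""
        path = urlparse(url).path.lower()
        return "Yes" if any(path.endswith(ext) for ext in CONTENT_EXTENSIONS) else "No"

    print(citable_content("https://repository.example.edu/bitstream/123/thesis.pdf"))  # Yes
    print(citable_content("https://repository.example.edu/handle/123"))                # No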

    Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.

    About Citable Content Downloads

    Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.

    CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).

    For any specified date range, the steps to calculate CCD are:

    Filter data to only include rows where "citableContent" is set to "Yes."
    Sum the value of the "clicks" field on these rows.
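
    A minimal pandas sketch of these two steps, run against one of the monthly CSV exports described below:

    import pandas as pd

    # Load one monthly export (file naming follows the 2017-01_RAMP_all.csv pattern).
    ramp = pd.read_csv("2017-01_RAMP_all.csv")

    # Step 1: keep only rows flagged as citable content.
    citable = ramp[ramp["citableContent"] == "Yes"]

    # Step 2: sum clicks to obtain citable content downloads (CCD) for the period.
    ccd = citable["clicks"].sum()
    print(f"CCD for this period: {ccd}")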
    

    Output to CSV

    Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.

    The data in these CSV files include the following fields:

    url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
    impressions: The number of times the URL appears within the SERP.
    clicks: The number of clicks on a URL which took users to a page outside of the SERP.
    clickThrough: Calculated as the number of clicks divided by the number of impressions.
    position: The position of the URL within the SERP.
    country: The country from which the corresponding search originated.
    device: The device used for the search.
    date: The date of the search.
    citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
    index: The Elasticsearch index corresponding to page click data for a single IR.
    repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
    

    Filenames for files containing these data follow the format 2017-01_RAMP_all.csv. Using this example, the file 2017-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2017.

    References

    Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.

  15. Data from: A dataset of spatiotemporally sampled MODIS Leaf Area Index with corresponding Landsat surface reflectance over the contiguous US

    • agdatacommons.nal.usda.gov
    application/csv
    Updated May 1, 2025
    Cite
    Yanghui Kang; Mutlu Ozdogan; Feng Gao; Martha C. Anderson; William A. White; Yun Yang; Yang Yang; Tyler A. Erickson (2025). A dataset of spatiotemporally sampled MODIS Leaf Area Index with corresponding Landsat surface reflectance over the contiguous US [Dataset]. http://doi.org/10.15482/USDA.ADC/1521097
    Explore at:
    Available download formats: application/csv
    Dataset updated
    May 1, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Yanghui Kang; Mutlu Ozdogan; Feng Gao; Martha C. Anderson; William A. White; Yun Yang; Yang Yang; Tyler A. Erickson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Contiguous United States, United States
    Description

    Leaf Area Index (LAI) is a fundamental vegetation structural variable that drives energy and mass exchanges between the plant and the atmosphere. Moderate-resolution (300m – 7km) global LAI data products have been widely applied to track global vegetation changes, drive Earth system models, monitor crop growth and productivity, etc. Yet, cutting-edge applications in climate adaptation, hydrology, and sustainable agriculture require LAI information at higher spatial resolution (< 100m) to model and understand heterogeneous landscapes. This dataset was built to assist a machine-learning-based approach for mapping LAI from 30m-resolution Landsat images across the contiguous US (CONUS). The data was derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) Version 6 LAI/FPAR, Landsat Collection 1 surface reflectance, and NLCD Land Cover datasets over 2006 – 2018 using Google Earth Engine. Each record/sample/row includes a MODIS LAI value, corresponding Landsat surface reflectance in green, red, NIR, SWIR1 bands, a land cover (biome) type, geographic location, and other auxiliary information. Each sample represents a MODIS LAI pixel (500m) within which a single biome type dominates 90% of the area. The spatial homogeneity of the samples was further controlled by a screening process based on the coefficient of variation of the Landsat surface reflectance. In total, there are approximately 1.6 million samples, stratified by biome, Landsat sensor, and saturation status from the MODIS LAI algorithm. This dataset can be used to train machine learning models and generate LAI maps for Landsat 5, 7, 8 surface reflectance images within CONUS. Detailed information on the sample generation and quality control can be found in the related journal article.

    Resources in this dataset:

    Resource Title: README. File Name: LAI_train_samples_CONUS_README.txt. Resource Description: Description and metadata of the main dataset. Resource Software Recommended: Notepad, url: https://www.microsoft.com/en-us/p/windows-notepad/9msmlrh6lzf3?activetab=pivot:overviewtab

    Resource Title: LAI_training_samples_CONUS. File Name: LAI_train_samples_CONUS_v0.1.1.csv. Resource Description: This CSV file consists of the training samples for estimating Leaf Area Index based on Landsat surface reflectance images (Collection 1 Tier 1). Each sample has a MODIS LAI value and corresponding surface reflectance derived from Landsat pixels within the MODIS pixel. Contact: Yanghui Kang (kangyanghui@gmail.com)
    Column description

    UID: Unique identifier. Format: LATITUDE_LONGITUDE_SENSOR_PATHROW_DATE
    Landsat_ID: Landsat image ID
    Date: Landsat image date in "YYYYMMDD"
    Latitude: Latitude (WGS84) of the MODIS LAI pixel center
    Longitude: Longitude (WGS84) of the MODIS LAI pixel center
    MODIS_LAI: MODIS LAI value in "m2/m2"
    MODIS_LAI_std: MODIS LAI standard deviation in "m2/m2"
    MODIS_LAI_sat: 0 - MODIS Main (RT) method used, no saturation; 1 - MODIS Main (RT) method with saturation
    NLCD_class: Majority class code from the National Land Cover Dataset (NLCD)
    NLCD_frequency: Percentage of the area covered by the majority class from NLCD
    Biome: Biome type code mapped from NLCD (see below for more information)
    Blue: Landsat surface reflectance in the blue band
    Green: Landsat surface reflectance in the green band
    Red: Landsat surface reflectance in the red band
    Nir: Landsat surface reflectance in the near infrared band
    Swir1: Landsat surface reflectance in the shortwave infrared 1 band
    Swir2: Landsat surface reflectance in the shortwave infrared 2 band
    Sun_zenith: Solar zenith angle from the Landsat image metadata (scene-level value)
    Sun_azimuth: Solar azimuth angle from the Landsat image metadata (scene-level value)
    NDVI: Normalized Difference Vegetation Index computed from Landsat surface reflectance
    EVI: Enhanced Vegetation Index computed from Landsat surface reflectance
    NDWI: Normalized Difference Water Index computed from Landsat surface reflectance
    GCI: Green Chlorophyll Index = Nir/Green - 1

    Biome code

    1 - Deciduous Forest
    2 - Evergreen Forest
    3 - Mixed Forest
    4 - Shrubland
    5 - Grassland/Pasture
    6 - Cropland
    7 - Woody Wetland
    8 - Herbaceous Wetland

    Reference datasets (all data were accessed through Google Earth Engine):

    Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment.

    MODIS Version 6 Leaf Area Index/FPAR 4-day Global 500m: Myneni, R., Y. Knyazikhin, T. Park. MOD15A2H MODIS/Terra Leaf Area Index/FPAR 8-Day L4 Global 500m SIN Grid V006. 2015, distributed by NASA EOSDIS Land Processes DAAC, https://doi.org/10.5067/MODIS/MOD15A2H.006

    Landsat 5/7/8 Collection 1 Surface Reflectance: Landsat Level-2 Surface Reflectance Science Product courtesy of the U.S. Geological Survey. Masek, J.G., Vermote, E.F., Saleous, N.E., Wolfe, R., Hall, F.G., Huemmrich, K.F., Gao, F., Kutler, J., and Lim, T-K. (2006). A Landsat surface reflectance dataset for North America, 1990–2000. IEEE Geoscience and Remote Sensing Letters 3(1):68-72. http://dx.doi.org/10.1109/LGRS.2005.857030. Vermote, E., Justice, C., Claverie, M., & Franch, B. (2016). Preliminary analysis of the performance of the Landsat 8/OLI land surface reflectance product. Remote Sensing of Environment. http://dx.doi.org/10.1016/j.rse.2016.04.008.

    National Land Cover Dataset (NLCD): Yang, Limin, Jin, Suming, Danielson, Patrick, Homer, Collin G., Gass, L., Bender, S.M., Case, Adam, Costello, C., Dewitz, Jon A., Fry, Joyce A., Funk, M., Granneman, Brian J., Liknes, G.C., Rigge, Matthew B., Xian, George. A new generation of the United States National Land Cover Database—Requirements, research priorities, design, and implementation strategies. ISPRS Journal of Photogrammetry and Remote Sensing, v. 146, p. 108–123. https://doi.org/10.1016/j.isprsjprs.2018.09.006

    Resource Software Recommended: Microsoft Excel, https://www.microsoft.com/en-us/microsoft-365/excel

  16. rlu_atari_checkpoints_ordered

    • tensorflow.org
    Updated Dec 9, 2021
    + more versions
    Cite
    (2021). rlu_atari_checkpoints_ordered [Dataset]. https://www.tensorflow.org/datasets/catalog/rlu_atari_checkpoints_ordered
    Explore at:
    Dataset updated
    Dec 9, 2021
    Description

    RL Unplugged is a suite of benchmarks for offline reinforcement learning. RL Unplugged is designed around the following considerations: to facilitate ease of use, we provide the datasets with a unified API, which makes it easy for the practitioner to work with all data in the suite once a general pipeline has been established.

    The datasets follow the RLDS format to represent steps and episodes.

    We are releasing a large and diverse dataset of gameplay following the protocol described by Agarwal et al., 2020, which can be used to evaluate several discrete offline RL algorithms. The dataset is generated by running an online DQN agent and recording transitions from its replay during training with sticky actions (Machado et al., 2018). As stated in Agarwal et al., 2020, for each game we use data from five runs with 50 million transitions each. We release datasets for 46 Atari games. For details on how the dataset was generated, please refer to the paper. Please see this note about the ROM versions used to generate the datasets.

    Atari is a standard RL benchmark. We recommend trying offline RL methods on Atari if you are interested in comparing your approach to other state-of-the-art offline RL methods with discrete actions.

    The reward of each step is clipped to [-1, 1], and each episode includes the sum of its clipped rewards.

    Each of the configurations is broken into splits. Splits correspond to checkpoints of 1M steps (note that the number of episodes may differ). Checkpoints are ordered in time (so checkpoint 0 ran before checkpoint 1).

    Episodes within each split are ordered. Check https://www.tensorflow.org/datasets/determinism if you want to ensure that you read episodes in order.
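
    As a hedged sketch (not from the catalog page), a single ordered checkpoint of a single game could be read as follows; the config name ('Breakout_run_1') and the 'checkpoint_XX' split naming are assumptions that should be checked against the catalog entry:

    import tensorflow_datasets as tfds

    # Assumed identifiers: one game/run config and one ordered checkpoint split.
    ds = tfds.load('rlu_atari_checkpoints_ordered/Breakout_run_1',
                   split='checkpoint_00')

    # Each element is an RLDS episode whose 'steps' field is a nested dataset of steps.
    for episode in ds.take(1):
        clipped_return = sum(step['reward'].numpy() for step in episode['steps'])
        print('clipped return of the first episode:', clipped_return)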

    This dataset corresponds to the one used in the DQN replay paper. https://research.google/tools/datasets/dqn-replay/

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('rlu_atari_checkpoints_ordered', split='train')
    for ex in ds.take(4):
      print(ex)
    

    See the guide for more information on tensorflow_datasets.

  17. open_images_v4

    • tensorflow.org
    • opendatalab.com
    Updated Jun 1, 2024
    Cite
    (2024). open_images_v4 [Dataset]. https://www.tensorflow.org/datasets/catalog/open_images_v4
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    Open Images is a dataset of ~9M images that have been annotated with image-level labels and object bounding boxes.

    The training set of V4 contains 14.6M bounding boxes for 600 object classes on 1.74M images, making it the largest existing dataset with object location annotations. The boxes have been largely manually drawn by professional annotators to ensure accuracy and consistency. The images are very diverse and often contain complex scenes with several objects (8.4 per image on average). Moreover, the dataset is annotated with image-level labels spanning thousands of classes.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('open_images_v4', split='train')
    for ex in ds.take(4):
      print(ex)
    

    See the guide for more information on tensorflow_datasets.
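
    Since each example bundles image-level labels and bounding-box annotations, it can help to inspect the feature spec before writing a parsing loop. A minimal, hedged sketch (not part of the catalog snippet above):

    import tensorflow_datasets as tfds

    # with_info=True also returns the DatasetInfo object, which documents the
    # feature structure (image, image-level labels, bounding boxes, ...).
    ds, info = tfds.load('open_images_v4', split='train', with_info=True)
    print(info.features)     # named feature spec from the dataset builder
    print(ds.element_spec)   # per-example tensor structure seen by tf.data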

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/open_images_v4-original-2.0.0.png

  18. MLP-based Learnable Window Size Dataset for Bitcoin Market Price

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    + more versions
    Cite
    Rajabi, Shahab (2023). MLP-based Learnable Window Size Dataset for Bitcoin Market Price [Dataset]. http://doi.org/10.7910/DVN/5YBLKV
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Rajabi, Shahab
    Description

    The dataset for this paper was collected from Google, Blockchain, and the Bitcoin market. There are 26 features in total; however, any feature whose correlation between its variations and the price variations is lower than 0.3 was eliminated. Hence, 21 practical features were selected, including Market capitalization, Trade-volume, Transaction-fees USD, Average confirmation time, Difficulty, High price, Low price, Total hash rate, Block-size, Miners-revenue, N-transactions-total, Google searches, Open price, N-payments-per Block, Total circulating Bitcoin, Cost-per-transaction percent, Fees-USD-per transaction, N-unique-addresses, N-transactions-per block, and Output-volume. In addition to the values of these features, a supportive feature is created for each one, containing the difference between the previous day and the day before the previous day. In total, 1,275 training samples, collected from 12 Nov 2018 to 4 Jun 2021, were used in the proposed model to extract patterns of the Bitcoin price.
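
    A hedged sketch of the supportive "difference" features described above; the file name and column names here are illustrative, not taken from the dataset itself:

    import pandas as pd

    # Hypothetical file/column names; the real layout may differ.
    df = pd.read_csv("bitcoin_features.csv", parse_dates=["Date"]).sort_values("Date")

    for col in ["High price", "Low price", "Trade-volume"]:   # illustrative subset
        # Difference between the previous day and the day before the previous day,
        # aligned to the current row.
        df[f"{col}_diff"] = df[col].shift(1) - df[col].shift(2)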

  19. Tweets used to explore the potential role of social media data in responding...

    • b2find.eudat.eu
    Updated Oct 22, 2023
    Cite
    (2023). Tweets used to explore the potential role of social media data in responding to new and emerging forms of food fraud 2018 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/a7d0b792-8f3b-59e7-939b-f3351ef2506e
    Explore at:
    Dataset updated
    Oct 22, 2023
    Description

    Data collected from the Twitter social media platform (6 May 2018 - 16 May 2018) to explore the potential role of social media data in responding to new and emerging forms of food fraud reported on social media, from posts originating in the UK. The dataset contains Tweet IDs and the keywords used to search for Tweets via programmatic access to the public Twitter API. Keywords used in this search were generated with a machine learning tool and consisted of combinations of keywords describing terms related to food and outrage.

    Social media and other forms of online content have enormous potential as a way to understand people's opinions and attitudes, and as a means to observe emerging phenomena such as disease outbreaks. How might policy makers use such new forms of data to better assess existing policies and help formulate new ones? This one-year demonstrator project is a partnership between computer science academics at the University of Aberdeen and officers from Food Standards Scotland which aims to answer this question. Food Standards Scotland is the public-sector food body for Scotland created by the Food (Scotland) Act 2015. It regularly provides policy guidance to ministers in areas such as food hygiene monitoring and reporting, food-related health risks, and food fraud. The project will develop a software tool (the Food Sentiment Observatory) that will be used to explore the role of data from sources such as Twitter, Facebook, and TripAdvisor in three policy areas selected by Food Standards Scotland: attitudes to the differing food hygiene information systems used in Scotland and the other UK nations; study of a historical E. coli outbreak to understand the effectiveness of monitoring and decision-making protocols; and understanding the potential role of social media data in responding to new and emerging forms of food fraud. The Observatory will integrate a number of existing software tools (developed in our recent research) to allow us to mine large volumes of data to identify important textual signals, extract opinions held by individuals or groups, and, crucially, to document these data processing operations to aid transparency of policy decision-making. Given the amount of noise appearing in user-generated online content (such as fake restaurant reviews), it is our intention to investigate methods to extract meaningful and reliable knowledge, to better support policy making.

    The search for relevant content was performed using a custom-built data collection module within the Observatory platform (https://sites.google.com/view/foobs/the-observatory). A public API provided by Twitter was used to gather all social media messages (Tweets) matching a specific set of keywords. Each line in the food-keywords.txt file (group 1) and in the outrage-keywords.txt file (group 2) contains a search keyword/phrase. A list of search keywords was then created from all possible combinations of individual keywords/phrases from group 1 and group 2. A matching Tweet returned by the search had to include at least one combination of such search keywords/phrases. Therefore, the search string used by the API was constructed as follows: ( ) OR ( ) OR ... *Note: the space between represents a logical AND in terms of the Twitter API service.

    The Twitter API allows historical searches to be restricted to Tweets associated with a specific location; however, this can only be specified as a radius from a given latitude and longitude geo-point. We used Twitter's geo-restricted search by defining a Lat/Long point and radius (in kilometres). In order to cover major areas in the UK we used the following four geo-restrictions: Latitude = 57.334942, Longitude = -4.395858, Radius = 253 km; Latitude = 55.288000, Longitude = -2.374374, Radius = 282 km; Latitude = 52.250808, Longitude = -0.660507, Radius = 198 km; Latitude = 51.953880, Longitude = -2.989608, Radius = 198 km.
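
    A hedged sketch of the search-string construction described above, pairing every group 1 keyword with every group 2 keyword (the file names follow the description; the actual Twitter API call is omitted):

    from itertools import product

    def read_keywords(path):
        with open(path) as f:
            return [line.strip() for line in f if line.strip()]

    food_terms = read_keywords("food-keywords.txt")        # group 1
    outrage_terms = read_keywords("outrage-keywords.txt")  # group 2

    # A space inside the parentheses acts as a logical AND for the Twitter search API;
    # the pairs are then combined with OR.
    pairs = [f"({food} {outrage})" for food, outrage in product(food_terms, outrage_terms)]
    query = " OR ".join(pairs)
    print(query[:200])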

  20. CCIHP dataset

    • kaggle.com
    zip
    Updated Oct 29, 2022
    Cite
    Angelique Loesch (2022). CCIHP dataset [Dataset]. https://www.kaggle.com/angeliqueloesch/ccihp-characterized-crowd-instancelevel-hp
    Explore at:
    zip(24041 bytes)Available download formats
    Dataset updated
    Oct 29, 2022
    Authors
    Angelique Loesch
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Characterized Crowd Instance-level Human Parsing (CCIHP) dataset

    CCIHP dataset is devoted to the fine-grained description of people in the wild with localized & characterized semantic attributes. It contains 20 attribute classes and 20 characteristic classes split into 3 categories (size, pattern and color). The annotations were made with Pixano, an open-source, smart annotation tool for computer vision applications: https://pixano.cea.fr/

    CCIHP dataset provides pixelwise image annotations for:

    • human segmentation,
    • semantic attribute segmentation and
    • semantic attribute characterization.

    Dataset description

    • Images:
      The image data are the same as in the CIHP dataset (see the Related work section) proposed for the LIP (Look Into Person) challenge. They are available on Google Drive and Baidu Drive (the Baidu link does not require access rights).

    • Annotations:
      Please download and unzip the CCIHP_icip.zip file. The CCIHP annotations can be found in the Training and Validation sub-folders of the CCIHP_icip2021/dataset/ folder. They correspond to, respectively, 28,280 training images and 5,000 validation images. Annotations consist of:

      • Human_ids: person instance labels
      • Instance_ids: attribute instance labels
      • Category_ids: attribute category labels
      • Size_ids: size category labels
      • Pattern_ids: pattern category labels
      • Color_ids: color category labels

      Label meaning for semantic attribute/body parts:

      1. Hat: Hat, helmet, cap, hood, veil, headscarf, part covering the skull and hair of a hood/balaclava, crown...
      2. Hair
      3. Glove
      4. Sunglasses/Glasses: Sunglasses, eyewear, protective glasses...
      5. UpperClothes: T-shirt, shirt, tank top, sweater under a coat, top of a dress...
      6. Face Mask: Protective mask, surgical mask, carnival mask, facial part of a balaclava, visor of a helmet...
      7. Coat: Coat, jacket worn without anything on it, vest with nothing on it, a sweater with nothing on it...
      8. Socks
      9. Pants: Pants, shorts, tights, leggings, swimsuit bottoms... (clothing with 2 legs)
      10. Torso-skin
      11. Scarf: Scarf, bow tie, tie...
      12. Skirt: Skirt, kilt, bottom of a dress...
      13. Face
      14. Left-arm (naked part)
      15. Right-arm (naked part)
      16. Left-leg (naked part)
      17. Right-leg (naked part)
      18. Left-shoe
      19. Right-shoe
      20. Bag: Backpack, shoulder bag, fanny pack... (bag carried on oneself)
      21. Others: Jewelry, tags, bibs, belts, ribbons, pins, head decorations, headphones...

      Label meaning for size characterization:

      1. Short: Small, short, narrow
      2. Long: Long, large, big
      3. Undetermined: If the attribute is partially hidden
      4. Sparse/bald: For hair attribute only

      Label meaning for pattern characterization:

      1. Solid
      2. Geometrical: Stripes, Checks, Dots...
      3. Fancy: Flowers, Military...
      4. Letters: Letters, numbers, symbols...

      Label meaning for color characterization:

      1. Dark: No dominant color, includes black, navy blue
      2. Medium: No dominant color, including gray
      3. Light: No dominant color, including white
      4. Brown
      5. Red
      6. Pink
      7. Yellow
      8. Orange
      9. Green
      10. Blue
      11. Purple
      12. Multicolor: When there is a pattern with several colors
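
      As a hedged illustration only, the per-pixel id maps can be decoded with the code tables above; the PNG storage format, the folder layout, and the assumption that id 0 is background are guesses about how the unzipped annotations are organized:

      import numpy as np
      from PIL import Image

      # Color characterization codes listed above.
      COLOR_NAMES = {
          1: "Dark", 2: "Medium", 3: "Light", 4: "Brown", 5: "Red", 6: "Pink",
          7: "Yellow", 8: "Orange", 9: "Green", 10: "Blue", 11: "Purple",
          12: "Multicolor",
      }

      # Hypothetical path inside the unzipped CCIHP_icip.zip; adjust to the real layout.
      color_ids = np.array(Image.open("CCIHP_icip2021/dataset/Training/Color_ids/0000001.png"))

      # Assume id 0 is background; report the color characterizations present in the image.
      present = {COLOR_NAMES.get(i, f"id {i}") for i in np.unique(color_ids) if i != 0}
      print("color characterizations present:", present)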

    Related work

    Our work is based on CIHP image dataset from: Ke Gong, Xiaodan Liang, Yicheng Li, Yimin Chen, Ming Yang and Liang Lin, "Instance-level Human Parsing via Part Grouping Network", ECCV 2018.

    Evaluation

    To evaluate the predictions given by a Human Parsing with Characteristics model, you can run the Python scripts in the CCIHP_icip2021/evaluation/ folder.

    Requirements

    • python==3.6+
    • opencv-python
    • pillow

    Evaluation steps

    1. Run generate_characteristic_instance_part_ccihp.py
    2. Run eval_test_characteristic_inst_part_ap_ccihp.py for mean Average Precision based on characterized regions (AP^(cr)_(vol)). It evaluates the predicted characteristic (class & score) for each instance-level, characterized attribute mask, independently of the attribute class prediction.
    3. Run metric_ccihp_miou_evaluation.py for a mIoU performance evaluation of semantic predictions (attribute or characteristics).

    License

    Data annotations are under Creative Commons Attribution Non Commercial 4.0 license (see LICENSE file).

    Evaluation codes are under MIT license.

    Citation

    A. Loesch and R. Audigier, "Describe Me If You Can! Characterized Instance-Level Human Parsing," 2021 IEEE International Conference on Image Processing (ICIP), 2021, pp. 2528-2532, doi: 10.1109/ICIP42928.2021.9506509.

    @INPROCEEDINGS{ccihp_dataset_2021,
      author={Loesch, Angelique and Audigier, Romaric},
      booktitle={2021 IEEE International Conference on Image Processing (ICIP)},
      title={Describe Me If You Can! Characterized Instance-Level Human Parsing},
      year={2021},
      volume={},
      number={},
      pages={2528-2532},
      doi={10.1109/ICIP42928.2021.9506509}
    }
    

    Contact

    If you have any question about this dataset, you can contact us by email at: ccihp-dataset@cea.fr

Cite
Eduardo Fonseca; Eduardo Fonseca; Xavier Favory; Jordi Pons; Frederic Font; Frederic Font; Manoj Plakal; Daniel P. W. Ellis; Daniel P. W. Ellis; Xavier Serra; Xavier Serra; Xavier Favory; Jordi Pons; Manoj Plakal (2020). FSDKaggle2018 [Dataset]. http://doi.org/10.5281/zenodo.2552860

FSDKaggle2018

Explore at:
44 scholarly articles cite this dataset (View in Google Scholar)
zipAvailable download formats
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Eduardo Fonseca; Eduardo Fonseca; Xavier Favory; Jordi Pons; Frederic Font; Frederic Font; Manoj Plakal; Daniel P. W. Ellis; Daniel P. W. Ellis; Xavier Serra; Xavier Serra; Xavier Favory; Jordi Pons; Manoj Plakal
Description

FSDKaggle2018 is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology. FSDKaggle2018 has been used for the DCASE Challenge 2018 Task 2, which was run as a Kaggle competition titled Freesound General-Purpose Audio Tagging Challenge.

Citation

If you use the FSDKaggle2018 dataset or part of it, please cite our DCASE 2018 paper:

Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra. "General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline". Proceedings of the DCASE 2018 Workshop (2018)

You can also consider citing our ISMIR 2017 paper, which describes how we gathered the manual annotations included in FSDKaggle2018.

Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font, Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra, "Freesound Datasets: A Platform for the Creation of Open Audio Datasets", In Proceedings of the 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017

Contact

You are welcome to contact Eduardo Fonseca should you have any questions at eduardo.fonseca@upf.edu.

About this dataset

Freesound Dataset Kaggle 2018 (or FSDKaggle2018 for short) is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology [1]. FSDKaggle2018 has been used for the Task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2018. Please visit the DCASE2018 Challenge Task 2 website for more information. This Task was hosted on the Kaggle platform as a competition titled Freesound General-Purpose Audio Tagging Challenge. It was organized by researchers from the Music Technology Group of Universitat Pompeu Fabra, and from Google Research’s Machine Perception Team.

The goal of this competition was to build an audio tagging system that can categorize an audio clip as belonging to one of a set of 41 diverse categories drawn from the AudioSet Ontology.

All audio samples in this dataset are gathered from Freesound [2] and are provided here as uncompressed PCM 16 bit, 44.1 kHz, mono audio files. Note that because Freesound content is collaboratively contributed, recording quality and techniques can vary widely.

The ground truth data provided in this dataset has been obtained after a data labeling process which is described below in the Data labeling process section. FSDKaggle2018 clips are unequally distributed in the following 41 categories of the AudioSet Ontology:

"Acoustic_guitar", "Applause", "Bark", "Bass_drum", "Burping_or_eructation", "Bus", "Cello", "Chime", "Clarinet", "Computer_keyboard", "Cough", "Cowbell", "Double_bass", "Drawer_open_or_close", "Electric_piano", "Fart", "Finger_snapping", "Fireworks", "Flute", "Glockenspiel", "Gong", "Gunshot_or_gunfire", "Harmonica", "Hi-hat", "Keys_jangling", "Knock", "Laughter", "Meow", "Microwave_oven", "Oboe", "Saxophone", "Scissors", "Shatter", "Snare_drum", "Squeak", "Tambourine", "Tearing", "Telephone", "Trumpet", "Violin_or_fiddle", "Writing".

Some other relevant characteristics of FSDKaggle2018:

  • The dataset is split into a train set and a test set.

  • The train set is meant to be for system development and includes ~9.5k samples unequally distributed among 41 categories. The minimum number of audio samples per category in the train set is 94, and the maximum 300. The duration of the audio samples ranges from 300ms to 30s due to the diversity of the sound categories and the preferences of Freesound users when recording sounds. The total duration of the train set is roughly 18h.

  • Out of the ~9.5k samples from the train set, ~3.7k have manually-verified ground truth annotations and ~5.8k have non-verified annotations. The non-verified annotations of the train set have a quality estimate of at least 65-70% in each category. Check out the Data labeling process section below for more information about this aspect.

  • Non-verified annotations in the train set are properly flagged in train.csv so that participants can opt to use this information during the development of their systems (see the sketch after this list).

  • The test set is composed of 1.6k samples with manually-verified annotations and with a category distribution similar to that of the train set. The total duration of the test set is roughly 2h.

  • All audio samples in this dataset have a single label (i.e. are only annotated with one label). Check out the Data labeling process section below for more information about this aspect. A single label should be predicted for each file in the test set.
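
As a hedged sketch (not part of the official baseline), the verification flag can be used to split the train metadata; the column names ('label', 'manually_verified') are assumptions to be checked against the README:

import pandas as pd

# Metadata shipped in FSDKaggle2018.meta/ (see the Files section below).
train = pd.read_csv("FSDKaggle2018.meta/train_post_competition.csv")

# Assumed column names; verify against the actual CSV header.
verified = train[train["manually_verified"] == 1]
unverified = train[train["manually_verified"] == 0]
print(len(verified), "manually verified /", len(unverified), "non-verified clips")
print(verified["label"].value_counts().head())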

Data labeling process

The data labeling process started from a manual mapping between Freesound tags and AudioSet Ontology categories (or labels), which was carried out by researchers at the Music Technology Group, Universitat Pompeu Fabra, Barcelona. Using this mapping, a number of Freesound audio samples were automatically annotated with labels from the AudioSet Ontology. These annotations can be understood as weak labels since they express the presence of a sound category in an audio sample.

Then, a data validation process was carried out in which a number of participants listened to the annotated sounds and manually assessed the presence/absence of an automatically assigned sound category, according to the AudioSet category description.

Audio samples in FSDKaggle2018 are only annotated with a single ground truth label (see train.csv). A total of 3,710 annotations included in the train set of FSDKaggle2018 have been manually validated as present and predominant (some with inter-annotator agreement, but not all of them). This means that in most cases there is no additional acoustic material other than the labeled category. In a few cases there may be some additional sound events, but these additional events do not belong to any of the 41 categories of FSDKaggle2018.

The rest of the annotations have not been manually validated and therefore some of them could be inaccurate. Nonetheless, we have estimated that at least 65-70% of the non-verified annotations per category in the train set are indeed correct. Some of these non-verified audio samples may contain several sound sources even though only one label is provided as ground truth. These additional sources are typically outside the set of 41 categories, but in a few cases they could be within it.

More details about the data labeling process can be found in [3].

License

FSDKaggle2018 has licenses at two different levels, as explained next.

All sounds in Freesound are released under Creative Commons (CC) licenses, and each audio clip has its own license as defined by the audio clip uploader in Freesound. For attribution purposes and to facilitate attribution of these files to third parties, we include a list of the audio clips included in FSDKaggle2018 and their corresponding licenses. The licenses are specified in the files train_post_competition.csv and test_post_competition_scoring_clips.csv.

In addition, FSDKaggle2018 as a whole is the result of a curation process and it has an additional license. FSDKaggle2018 is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSDKaggle2018.doc zip file.

Files

FSDKaggle2018 can be downloaded as a series of zip files with the following directory structure:

root
│
└───FSDKaggle2018.audio_train/                    Audio clips in the train set
│
└───FSDKaggle2018.audio_test/                     Audio clips in the test set
│
└───FSDKaggle2018.meta/                           Files for evaluation setup
│   │
│   └───train_post_competition.csv                Data split and ground truth for the train set
│   │
│   └───test_post_competition_scoring_clips.csv   Ground truth for the test set
│
└───FSDKaggle2018.doc/
    │
    └───README.md                                 The dataset description file you are reading
    │
    └───LICENSE-DATASET
