100+ datasets found
  1. Data generation volume worldwide 2010-2029

    • statista.com
    Updated Nov 19, 2025
    Cite
    Statista (2025). Data generation volume worldwide 2010-2029 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly. While it was estimated at ***** zettabytes in 2025, the forecast for 2029 stands at ***** zettabytes. Thus, global data generation will triple between 2025 and 2029. Data creation has been expanding continuously over the past decade. In 2020, the growth was higher than previously expected, caused by the increased demand due to the coronavirus (COVID-19) pandemic, as more people worked and learned from home and used home entertainment options more often.

  2. Rental Bikes Volume Prediction

    • kaggle.com
    zip
    Updated Jul 25, 2024
    Cite
    Gaurav Dutta (2024). Rental Bikes Volume Prediction [Dataset]. https://www.kaggle.com/datasets/gauravduttakiit/rental-bikes-volume-prediction
    Explore at:
    zip (379,234 bytes). Available download formats
    Dataset updated
    Jul 25, 2024
    Authors
    Gaurav Dutta
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This week presents an exciting new data science challenge: creating a model to predict the volume of rented bikes at an hour-level granularity for the given instances within the test data. This time series analysis problem opens up numerous possibilities for innovative and effective solutions as applicable within the mobility space and services. We eagerly anticipate your creative approaches and outcomes.

    Challenge Details Your task is to develop a time series regression model capable of accurately predicting the volume of rented bikes at an hour-level granularity for the given dates in test data. This problem requires participants to apply their time series analysis and machine learning knowledge.

  3. Execution times for processing 2 hours of audio for different distributed system configurations

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 2, 2023
    Cite
    Alexander Brown; Saurabh Garg; James Montgomery (2023). Execution times for processing 2 hours of audio for different distributed system configurations. [Dataset]. http://doi.org/10.1371/journal.pone.0201542.t008
    Explore at:
    xls. Available download formats
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alexander Brown; Saurabh Garg; James Montgomery
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The master and each slave have 4 cores, so a 16-core configuration uses 4 virtual machines. The improvement rate is given by (execution time on 1 core) / (execution time on x cores).
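    The improvement-rate formula above can be sketched in a few lines; the timing values here are hypothetical placeholders, not taken from the dataset:

```python
# Improvement rate as described: execution time on 1 core divided by
# execution time on x cores (a plain speedup ratio).
def improvement_rate(time_1_core: float, time_x_cores: float) -> float:
    """Speedup of an x-core run relative to the single-core baseline."""
    return time_1_core / time_x_cores

# Hypothetical execution times (seconds) for 1, 4, and 16 cores.
timings = {1: 7200.0, 4: 2000.0, 16: 600.0}
rates = {cores: improvement_rate(timings[1], t) for cores, t in timings.items()}
```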

  4. Child and Family Services Reviews Update Volume 17, Issue 2, March 2023

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 6, 2025
    Cite
    Administration for Children and Families (2025). Child and Family Services Reviews Update Volume 17, Issue 2, March 2023 [Dataset]. https://catalog.data.gov/dataset/child-and-family-services-reviews-update-volume-17-issue-2-march-2023
    Explore at:
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    This issue of Child and Family Services Reviews Update contains the following sections: Round 4 Year 1 States are Gearing Up, New Mock Case Course Released on E-Learning Academy, Spanish Translations Added to Portal, New Round 4 FAQs Posted, and Child Welfare Reviews Project Upcoming Presentations. Metadata-only record linking to the original dataset. Open original dataset below.

  5. Sound and Audio Data in Vietnam

    • kaggle.com
    zip
    Updated Apr 3, 2025
    Cite
    Techsalerator (2025). Sound and Audio Data in Vietnam [Dataset]. https://www.kaggle.com/datasets/techsalerator/sound-and-audio-data-in-vietnam
    Explore at:
    zip (12,171,329 bytes). Available download formats
    Dataset updated
    Apr 3, 2025
    Authors
    Techsalerator
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Area covered
    Vietnam
    Description

    Techsalerator’s Location Sentiment Data for Vietnam

    Techsalerator’s Location Sentiment Data for Vietnam provides a robust dataset tailored for businesses, researchers, and developers. This collection offers valuable insights into how individuals perceive different locations across Vietnam, with a focus on sentiment analysis derived from social media, news, and location-based data.

    For access to the full dataset, contact us at info@techsalerator.com or visit Techsalerator Contact Us.


    Techsalerator’s Location Sentiment Data for Vietnam delivers a detailed breakdown of sentiment trends in various regions of the country. This dataset is crucial for market research, urban development, tourism, and political analysis, providing valuable insights into public perception across urban, suburban, and rural areas.

    Top 5 Key Data Fields

    • Location Sentiment Score – Provides a numerical representation of sentiment for different geographic areas, including positive, neutral, and negative sentiments.
    • Geographic Region – Identifies the specific location within Vietnam where the sentiment was captured, from major cities to rural areas, supporting granular analysis.
    • Sentiment Type – Categorizes sentiment into positive, negative, or neutral, helping to assess public opinion about a location.
    • Time of Sentiment Capture – Records the timestamp for when the sentiment data was captured, allowing for trend analysis over different periods.
    • Source of Sentiment Data – Highlights the origin of the sentiment (social media platforms, news outlets, forums, etc.), ensuring data reliability and accuracy.

    Top 5 Sentiment Trends in Vietnam

    • Urban Growth Perception – Increasingly positive sentiment in urban areas like Ho Chi Minh City and Hanoi due to economic development, leading to shifts in real estate and business strategies.
    • Tourism Sentiment – Positive shifts in tourist destination areas, such as Da Nang and Phu Quoc, as Vietnam becomes a popular travel hub, influencing the hospitality industry.
    • Environmental Concerns – Growing negative sentiment related to environmental issues in coastal regions, due to pollution and climate change awareness, impacting policy and advocacy.
    • Tech and Innovation Sentiment – Highly positive sentiment in tech hubs such as Hanoi, where innovation and startups are thriving, influencing venture capital investments.
    • Political Sentiment – Public sentiment analysis around government policies and political developments, with notable variations between urban and rural perceptions.

    Top 5 Applications of Location Sentiment Data in Vietnam

    • Market Research – Brands can use sentiment data to tailor marketing campaigns to specific regions, adjusting their strategies based on local perceptions.
    • Urban Planning – Governments and urban developers use sentiment data to understand public opinion on infrastructure projects, aiding in better decision-making.
    • Tourism Promotion – Tourism authorities can use sentiment analysis to identify high-interest locations and optimize marketing for visitor attractions.
    • Public Policy Analysis – Political organizations use sentiment data to gauge public opinion on policies and campaigns, ensuring alignment with voter concerns.
    • Corporate Strategy – Businesses use sentiment data to understand the emotional connection customers have with different regions, guiding product development and expansion decisions.

    Accessing Techsalerator’s Location Sentiment Data

    To obtain Techsalerator’s Location Sentiment Data for Vietnam, contact info@techsalerator.com with your specific requirements. Techsalerator provides customized datasets based on requested fields, with delivery available within 24 hours. Ongoing access options can also be discussed.

    Included Data Fields

    • Location Sentiment Score
    • Geographic Region
    • Sentiment Type (Positive, Neutral, Negative)
    • Time of Sentiment Capture
    • Source of Sentiment Data
    • Sentiment Breakdown by Demographics
    • Trending Topics in Locations
    • Event-Based Sentiment Shifts
    • Influencer Impact on Sentiment
    • Contact Information

    For comprehensive insights into public sentiment across various locations in Vietnam, Techsalerator’s dataset is a valuable resource for researchers, marketers, urban planners, and policymakers.

  6. Patent AT-E401630-T1: [Translated] VOLUME MEASUREMENTS IN THREE-DIMENSIONAL DATA SETS

    • data.virginia.gov
    • healthdata.gov
    • +1 more
    html
    Updated Sep 6, 2025
    Cite
    National Center for Biotechnology Information (NCBI) (2025). Patent AT-E401630-T1: [Translated] VOLUME MEASUREMENTS IN THREE-DIMENSIONAL DATA SETS [Dataset]. https://data.virginia.gov/dataset/patent-at-e401630-t1-translated-volume-measurements-in-three-dimensional-data-sets
    Explore at:
    html. Available download formats
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    National Center for Biotechnology Information (NCBI)
    Description

    Volume measurement of, for example, a tumor in a 3D image dataset is an important and frequently performed task. The problem is to segment the tumor out of this volume in order to measure its dimensions, which is complicated by the fact that tumors are often connected to vessels and other organs. According to the present invention, an automated method and a corresponding device and computer software are provided, which analyze a volume of interest around a singled-out tumor and which, by virtue of a 3D distance transform and a region drawing scheme, advantageously allow a tumor to be automatically segmented out of a given volume.

  7. Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI Training...

    • datarade.ai
    Updated Dec 10, 2023
    Cite
    Nexdata (2023). Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI Training Data| AI Datasets [Dataset]. https://datarade.ai/data-products/nexdata-multilingual-speech-synthesis-data-400-hours-a-nexdata
    Explore at:
    .bin, .json, .xml, .csv, .xls, .sql, .txt. Available download formats
    Dataset updated
    Dec 10, 2023
    Dataset authored and provided by
    Nexdata
    Area covered
    Canada, Austria, Colombia, Malaysia, Belgium, Sweden, Singapore, China, Philippines, Hong Kong
    Description
    1. Specifications

    Format: 44.1 kHz/48 kHz, 16 bit/24 bit, uncompressed WAV, mono channel.

    Recording environment: professional recording studio.

    Recording content: general narrative sentences, interrogative sentences, etc.

    Speaker: native speakers.

    Annotation features: word transcription, part-of-speech, phoneme boundary, four-level accents, four-level prosodic boundary.

    Device: microphone.

    Language: American English, British English, Japanese, French, Dutch, Cantonese, Canadian French, Australian English, Italian, New Zealand English, Spanish, Mexican Spanish.

    Application scenarios: speech synthesis.

    Accuracy rate:

    • Word transcription: sentence accuracy rate of not less than 99%.
    • Part-of-speech annotation: sentence accuracy rate of not less than 98%.
    • Phoneme annotation: sentence accuracy rate of not less than 98% (errors on voiced and swallowed phonemes are excluded, as their labelling is more subjective).
    • Accent annotation: word accuracy rate of not less than 95%.
    • Prosodic boundary annotation: sentence accuracy rate of not less than 97%.
    • Phoneme boundary annotation: phoneme accuracy rate of not less than 95% (the boundary error range is within 5%).

    2. About Nexdata

    Nexdata owns off-the-shelf PB-level Large Language Model (LLM) data, 3 million hours of audio data, and 800 TB of annotated imagery data. These ready-to-go AI & ML training data support instant delivery and quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/tts?source=Datarade

  8. Child and Family Services Reviews Update Volume 14, Issue 4, September 2020

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 30, 2025
    + more versions
    Cite
    Administration for Children and Families (2025). Child and Family Services Reviews Update Volume 14, Issue 4, September 2020 [Dataset]. https://catalog.data.gov/dataset/child-and-family-services-reviews-update-volume-14-issue-4-september-2020
    Explore at:
    Dataset updated
    Sep 30, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    This issue of Child and Family Services Reviews Update contains the following sections: Technical Bulletin #12 Issued, Systematic Factors Report for Round 3 Released, Statewide Data Indicators Updated, and CFSR Information Portal Refresh: New Look and Feel! Metadata-only record linking to the original dataset. Open original dataset below.

  9. Load balance distribution across different numbers of machines.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Alexander Brown; Saurabh Garg; James Montgomery (2023). Load balance distribution across different numbers of machines. [Dataset]. http://doi.org/10.1371/journal.pone.0201542.t009
    Explore at:
    xls. Available download formats
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Alexander Brown; Saurabh Garg; James Montgomery
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The maximum and minimum load refer to the average load of the VMs with the largest and smallest loads (in terms of percentage of files processed) over five trials. The p-value is derived from a single-factor ANOVA test; p ≤ 0.05 indicates, at the 5% significance level, that the processing loads are not equal.
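    For illustration, the single-factor (one-way) ANOVA behind that p-value reduces to an F statistic, the ratio of between-group to within-group variance. The per-VM load figures below are hypothetical, and turning F into a p-value would additionally require an F-distribution CDF (e.g. from scipy.stats), so this sketch stops at the statistic:

```python
# One-way ANOVA F statistic, computed from first principles.
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_vals = [v for g in groups for v in g]
    grand_mean = sum(all_vals) / len(all_vals)
    k = len(groups)        # number of groups (here: VMs)
    n = len(all_vals)      # total observations (here: VMs x trials)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical per-VM loads (% of files processed) over five trials.
vm_loads = [[25.1, 24.8, 25.3, 24.9, 25.0],
            [24.7, 25.2, 24.9, 25.1, 24.8],
            [25.4, 24.6, 25.0, 25.2, 24.9]]
f_stat = one_way_anova_f(vm_loads)
```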

  10. Divide and Remaster (DnR)

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Mar 22, 2023
    Cite
    Darius Petermann; Gordon Wichern; Zhong-Qiu Wang; Jonathan Le Roux; Darius Petermann; Gordon Wichern; Zhong-Qiu Wang; Jonathan Le Roux (2023). Divide and Remaster (DnR) [Dataset]. http://doi.org/10.5281/zenodo.5574713
    Explore at:
    application/gzip. Available download formats
    Dataset updated
    Mar 22, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Darius Petermann; Gordon Wichern; Zhong-Qiu Wang; Jonathan Le Roux; Darius Petermann; Gordon Wichern; Zhong-Qiu Wang; Jonathan Le Roux
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Introduction:

    Divide and Remaster (DnR) is a source separation dataset for training and testing algorithms that separate a monaural audio signal into speech, music, and sound effects/background stems. The dataset is composed of artificial mixtures using audio from LibriSpeech, the Free Music Archive (FMA), and the Freesound Dataset 50k (FSD50K). We introduce it as part of the Cocktail Fork Problem paper.

    At a Glance:

    • The size of the unzipped dataset is ~174 GB
    • Each mixture is 60 seconds long, and sources are not fully overlapped
    • Audio is encoded as 16-bit .wav files at a sampling rate of 44.1 kHz
    • The data is split into training tr (3,295 mixtures), validation cv (440 mixtures), and testing tt (652 mixtures) subsets
    • The directory for each mixture contains four .wav files (mix.wav, music.wav, speech.wav, sfx.wav) and annots.csv, which contains the metadata for the original audio used to compose the mixture (transcriptions for speech, sound classes for sfx, and genre labels for music)

    Other Resources:

    Demo examples and additional information are available at: https://cocktail-fork.github.io/

    For more details about the data generation process, the code used to generate our dataset can be found at the following: https://github.com/darius522/dnr-utils

    Contact and Support:

    Have an issue, concern, or question about DnR? If so, please open an issue here.

    For any other inquiries, feel free to send an email to firstname.lastname@gmail.com; my name is Darius Petermann ;)

    Citation:

    If you use DnR, please cite our paper (https://arxiv.org/abs/2110.09958), in which we introduce the dataset as part of the Cocktail Fork Problem:

    @article{Petermann2021cocktail,
      title={The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks}, 
      author={Darius Petermann and Gordon Wichern and Zhong-Qiu Wang and Jonathan {Le Roux}},
      year={2021},
      journal={arXiv preprint arXiv:2110.09958},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
    }

  11. Child and Family Services Reviews Update Volume 14, Issue 2, April 2020

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 6, 2025
    + more versions
    Cite
    Administration for Children and Families (2025). Child and Family Services Reviews Update Volume 14, Issue 2, April 2020 [Dataset]. https://catalog.data.gov/dataset/child-and-family-services-reviews-update-volume-14-issue-2-april-2020
    Explore at:
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    This issue of Child and Family Services Reviews Update contains the following sections: CB Response to COVID-19 Pandemic, CFSR Technical Bulletin #11, Enhancements to the Data Indicator Visualizations, State Data Profiles Released, ACYF-CB-PI-20-02— State Guidance on FY 2021 APSR, Status of Approved PIPs, and PIP Evaluation During Non-Overlapping Data Periods. Metadata-only record linking to the original dataset. Open original dataset below.

  12. FSD50K

    • data.niaid.nih.gov
    • opendatalab.com
    • +2 more
    Updated Apr 24, 2022
    + more versions
    Cite
    Eduardo Fonseca; Xavier Favory; Jordi Pons; Frederic Font; Xavier Serra (2022). FSD50K [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4060431
    Explore at:
    Dataset updated
    Apr 24, 2022
    Dataset provided by
    Music Technology Group (https://www.upf.edu/web/mtg)
    Authors
    Eduardo Fonseca; Xavier Favory; Jordi Pons; Frederic Font; Xavier Serra
    Description

    FSD50K is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra.

    Citation

    If you use the FSD50K dataset, or part of it, please cite our TASLP paper (available from [arXiv] [TASLP]):

    @article{fonseca2022FSD50K,
      title={{FSD50K}: an open dataset of human-labeled sound events},
      author={Fonseca, Eduardo and Favory, Xavier and Pons, Jordi and Font, Frederic and Serra, Xavier},
      journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
      volume={30},
      pages={829--852},
      year={2022},
      publisher={IEEE}
    }

    Paper update: This paper has been published in TASLP at the beginning of 2022. The accepted camera-ready version includes a number of improvements with respect to the initial submission. The main updates include: estimation of the amount of label noise in FSD50K, SNR comparison between FSD50K and AudioSet, improved description of evaluation metrics including equations, clarification of experimental methodology and some results, some content moved to Appendix for readability. The TASLP-accepted camera-ready version is available from arXiv (in particular, it is v2 in arXiv, displayed by default).

    Data curators

    Eduardo Fonseca, Xavier Favory, Jordi Pons, Mercedes Collado, Ceren Can, Rachit Gupta, Javier Arredondo, Gary Avendano and Sara Fernandez

    Contact

    You are welcome to contact Eduardo Fonseca should you have any questions, at efonseca@google.com.

    ABOUT FSD50K

    Freesound Dataset 50k (or FSD50K for short) is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology [1]. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra.

    What follows is a brief summary of FSD50K's most important characteristics. Please have a look at our paper (especially Section 4) to extend the basic information provided here with relevant details for its usage, as well as discussion, limitations, applications and more.

    Basic characteristics:

    FSD50K contains 51,197 audio clips from Freesound, totalling 108.3 hours of multi-labeled audio

    The dataset encompasses 200 sound classes (144 leaf nodes and 56 intermediate nodes) hierarchically organized with a subset of the AudioSet Ontology.

    The audio content is composed mainly of sound events produced by physical sound sources and production mechanisms, including human sounds, sounds of things, animals, natural sounds, musical instruments and more. The vocabulary can be inspected in vocabulary.csv (see Files section below).

    The acoustic material has been manually labeled by humans following a data labeling process using the Freesound Annotator platform [2].

    Clips are of variable length from 0.3 to 30s, due to the diversity of the sound classes and the preferences of Freesound users when recording sounds.

    All clips are provided as uncompressed PCM 16 bit 44.1 kHz mono audio files.

    Ground truth labels are provided at the clip-level (i.e., weak labels).

    The dataset poses mainly a large-vocabulary multi-label sound event classification problem, but also allows development and evaluation of a variety of machine listening approaches (see Sec. 4D in our paper).

    In addition to audio clips and ground truth, additional metadata is made available (including raw annotations, sound predominance ratings, Freesound metadata, and more), allowing a variety of analyses and sound event research tasks (see Files section below).

    The audio clips are grouped into a development (dev) set and an evaluation (eval) set such that they do not have clips from the same Freesound uploader.

    Dev set:

    40,966 audio clips totalling 80.4 hours of audio

    Avg duration/clip: 7.1s

    114,271 smeared labels (i.e., labels propagated in the upwards direction to the root of the ontology)

    Labels are correct but could be occasionally incomplete

    A train/validation split is provided (Sec. 3H). If a different split is used, it should be specified for reproducibility and fair comparability of results (see Sec. 5C of our paper)

    Eval set:

    10,231 audio clips totalling 27.9 hours of audio

    Avg duration/clip: 9.8s

    38,596 smeared labels

    Eval set is labeled exhaustively (labels are correct and complete for the considered vocabulary)

    Note: All classes in FSD50K are represented in AudioSet, except Crash cymbal, Human group actions, Human voice, Respiratory sounds, and Domestic sounds, home sounds.

    LICENSE

    All audio clips in FSD50K are released under Creative Commons (CC) licenses. Each clip has its own license as defined by the clip uploader in Freesound, some of them requiring attribution to their original authors and some forbidding further commercial reuse. Specifically:

    The development set consists of 40,966 clips with the following licenses:

    CC0: 14,959

    CC-BY: 20,017

    CC-BY-NC: 4,616

    CC Sampling+: 1,374

    The evaluation set consists of 10,231 clips with the following licenses:

    CC0: 4,914

    CC-BY: 3,489

    CC-BY-NC: 1,425

    CC Sampling+: 403

    For attribution purposes and to facilitate attribution of these files to third parties, we include a mapping from the audio clips to their corresponding licenses. The licenses are specified in the files dev_clips_info_FSD50K.json and eval_clips_info_FSD50K.json.

    In addition, FSD50K as a whole is the result of a curation process and it has an additional license: FSD50K is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSD50K.doc zip file. We note that the choice of one license for the dataset as a whole is not straightforward as it comprises items with different licenses (such as audio clips, annotations, or data split). The choice of a global license in these cases may warrant further investigation (e.g., by someone with a background in copyright law).

    Usage of FSD50K for commercial purposes:

    If you'd like to use FSD50K for commercial purposes, please contact Eduardo Fonseca and Frederic Font at efonseca@google.com and frederic.font@upf.edu.

    Also, if you are interested in using FSD50K for machine learning competitions, please contact Eduardo Fonseca and Frederic Font at efonseca@google.com and frederic.font@upf.edu.

    FILES

    FSD50K can be downloaded as a series of zip files with the following directory structure:

    root
    └─── FSD50K.dev_audio/                   Audio clips in the dev set
    └─── FSD50K.eval_audio/                  Audio clips in the eval set
    └─── FSD50K.ground_truth/                Files for FSD50K's ground truth
    │    └─── dev.csv                        Ground truth for the dev set
    │    └─── eval.csv                       Ground truth for the eval set
    │    └─── vocabulary.csv                 List of 200 sound classes in FSD50K
    └─── FSD50K.metadata/                    Files for additional metadata
    │    └─── class_info_FSD50K.json         Metadata about the sound classes
    │    └─── dev_clips_info_FSD50K.json     Metadata about the dev clips
    │    └─── eval_clips_info_FSD50K.json    Metadata about the eval clips
    │    └─── pp_pnp_ratings_FSD50K.json     PP/PNP ratings
    │    └─── collection/                    Files for the sound collection format
    └─── FSD50K.doc/
         └─── README.md                      The dataset description file that you are reading
         └─── LICENSE-DATASET                License of the FSD50K dataset as an entity

    Each row (i.e. audio clip) of dev.csv contains the following information:

    fname: the file name without the .wav extension, e.g., the fname 64760 corresponds to the file 64760.wav on disk. This number is the Freesound id. We always use Freesound ids as filenames.

    labels: the class labels (i.e., the ground truth). Note these class labels are smeared, i.e., the labels have been propagated in the upwards direction to the root of the ontology. More details about the label smearing process can be found in Appendix D of our paper.

    mids: the Freebase identifiers corresponding to the class labels, as defined in the AudioSet Ontology specification

    split: whether the clip belongs to train or val (see paper for details on the proposed split)

    Rows in eval.csv follow the same format, except that there is no split column.
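    A minimal sketch of reading rows in this format with Python's csv module. The inline sample row is hypothetical (point the reader at the real FSD50K.ground_truth/dev.csv instead), and it assumes the labels and mids fields are comma-separated inside one quoted field, following the column descriptions above:

```python
import csv
import io

# Hypothetical sample standing in for FSD50K.ground_truth/dev.csv.
sample = ('fname,labels,mids,split\n'
          '64760,"Electric_guitar,Guitar,Music","/m/02sgy,/m/0342h,/m/04rlf",train\n')

rows = []
for row in csv.DictReader(io.StringIO(sample)):
    rows.append({
        "fname": row["fname"] + ".wav",      # Freesound id -> filename on disk
        "labels": row["labels"].split(","),  # smeared class labels
        "mids": row["mids"].split(","),      # AudioSet Freebase identifiers
        "split": row["split"],               # train or val (absent in eval.csv)
    })
```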

    Note: We use a slightly different format than AudioSet for the naming of class labels in order to avoid potential problems with spaces, commas, etc. Example: we use Accelerating_and_revving_and_vroom instead of the original Accelerating, revving, vroom. You can go back to the original AudioSet naming using the information provided in vocabulary.csv (class label and mid for the 200 classes of FSD50K) and the AudioSet Ontology specification.
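    The renaming convention above can be inverted mechanically, though the authoritative mapping remains vocabulary.csv plus the AudioSet Ontology specification; this helper is an illustrative assumption and may misfire on class names that contain a literal "and":

```python
# Naive inverse of FSD50K's label naming: "_and_" stood in for ", " and
# "_" for spaces in the original AudioSet display names.
def to_audioset_name(fsd50k_label: str) -> str:
    return fsd50k_label.replace("_and_", ", ").replace("_", " ")
```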

    Files with additional metadata (FSD50K.metadata/)

    To allow a variety of analysis and approaches with FSD50K, we provide the following metadata:

    class_info_FSD50K.json: Python dictionary where each entry corresponds to one sound class and contains: FAQs utilized during the annotation of the class, examples (representative audio clips), and verification_examples (audio clips presented to raters during annotation as a quality control mechanism). Audio clips are described by the Freesound id. Note: It may be that some of these examples are not included in the FSD50K release.

    dev_clips_info_FSD50K.json: Python dictionary where each entry corresponds to one dev clip and contains: title,

  13. Child and Family Services Reviews Update Volume 15, Issue 4, September 2021

    • data.virginia.gov
    • catalog.data.gov
    html
    Updated Sep 6, 2025
    + more versions
    Cite
    Administration for Children and Families (2025). Child and Family Services Reviews Update Volume 15, Issue 4, September 2021 [Dataset]. https://data.virginia.gov/dataset/child-and-family-services-reviews-update-volume-15-issue-4-september-2021
    Explore at:
    html. Available download formats
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    This issue of Child and Family Services Reviews Update contains the following sections: National Call on Round 4 Emphasizes Inclusion, Statewide Data Indicator Profiles Disseminated, Multi-Item Data Analysis Tool Launched, and New Training Content Coming Soon on ELA.

    Metadata-only record linking to the original dataset. Open original dataset below.

  14. Forecast revenue big data market worldwide 2011-2027

    • statista.com
    Updated Mar 15, 2018
    Cite
    Statista (2018). Forecast revenue big data market worldwide 2011-2027 [Dataset]. https://www.statista.com/statistics/254266/global-big-data-market-forecast/
    Explore at:
    Dataset updated
    Mar 15, 2018
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The global big data market is forecast to grow to 103 billion U.S. dollars by 2027, more than double its expected market size in 2018. With a share of 45 percent, the software segment would become the largest big data market segment by 2027.

    What is big data? Big data is a term for data sets that are too large or too complex for traditional data processing applications. It is defined as having one or more of the following characteristics: high volume, high velocity, or high variety. Fast-growing mobile data traffic and cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT), all contribute to the increasing volume and complexity of data sets.

    Big data analytics: Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate new business insights. The global big data and business analytics market was valued at 169 billion U.S. dollars in 2018 and is expected to grow to 274 billion U.S. dollars in 2022. As of November 2018, 45 percent of professionals in the market research industry reportedly used big data analytics as a research method.

  15. Ukraine Corporate Bonds: Annual: Volume of Issue

    • ceicdata.com
    Updated Sep 15, 2025
    Cite
    CEICdata.com (2025). Ukraine Corporate Bonds: Annual: Volume of Issue [Dataset]. https://www.ceicdata.com/en/ukraine/corporate-bonds/corporate-bonds-annual-volume-of-issue
    Explore at:
    Dataset updated
    Sep 15, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2006 - Dec 1, 2017
    Area covered
    Ukraine
    Variables measured
    Securities Issuance
    Description

    Ukraine Corporate Bonds: Annual: Volume of Issue data was reported at 8,350.300 UAH mn in 2017, an increase from 6,760.490 UAH mn in 2016. The data is updated yearly, with a median of 8,922.080 UAH mn across 22 observations from Dec 1996 to 2017. It reached an all-time high of 51,386.610 UAH mn in 2012 and a record low of 8.190 UAH mn in 1998. The series remains active in CEIC and is reported by the National Securities and Stock Market Commission. The data is categorized under the Global Database’s Ukraine – Table UA.Z004: Corporate Bonds.

  16. SoundDesc: Cleaned and Group-Filtered Splits

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Aug 26, 2023
    Cite
    Benno Weck; Benno Weck; Xavier Serra; Xavier Serra (2023). SoundDesc: Cleaned and Group-Filtered Splits [Dataset]. http://doi.org/10.5281/zenodo.7665917
    Explore at:
    zipAvailable download formats
    Dataset updated
    Aug 26, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Benno Weck; Benno Weck; Xavier Serra; Xavier Serra
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This upload contains dataset splits of SoundDesc [1] and other supporting material for our paper:

    Data leakage in cross-modal retrieval training: A case study [arXiv] [ieeexplore]

    In our paper, we demonstrated that a data leakage problem in the previously published splits of SoundDesc leads to overly optimistic retrieval results.
    Using an off-the-shelf audio fingerprinting software, we identified that the data leakage stems from duplicates in the dataset.
    We define two new splits for the dataset: a cleaned split to remove the leakage and a group-filtered split to avoid other kinds of weak contamination of the test data.

    SoundDesc is a dataset that was automatically sourced from the BBC Sound Effects web page [2]. The results from our paper can be reproduced using clean_split01 and group_filtered_split01.
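    The group-filtered idea described above can be sketched in a few lines: every recording belonging to the same group (e.g. a set of near-duplicates) is assigned wholly to one side of the split, so no group straddles train and test. This is a minimal, stdlib-only illustration of the principle, not the code used to build the published splits; the file names, grouping rule, and ratio below are hypothetical.

    ```python
    import random

    def group_filtered_split(items, group_of, test_ratio=0.2, seed=0):
        """Split items so that all members of a group land on the same side."""
        groups = sorted({group_of(x) for x in items})
        rng = random.Random(seed)
        rng.shuffle(groups)
        n_test = max(1, int(len(groups) * test_ratio))
        test_groups = set(groups[:n_test])
        train = [x for x in items if group_of(x) not in test_groups]
        test = [x for x in items if group_of(x) in test_groups]
        return train, test

    # Hypothetical example: files whose names share a prefix are near-duplicates.
    files = ["rain_01", "rain_02", "wind_01", "birds_01", "birds_02", "crowd_01"]
    train, test = group_filtered_split(files, group_of=lambda f: f.split("_")[0])
    # Every prefix group ends up wholly in train or wholly in test.
    ```

    Splitting over group identifiers rather than individual files is what prevents a duplicate pair from leaking across the train/test boundary.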

    If you use the splits, please cite our work:

    Benno Weck, Xavier Serra, "Data Leakage in Cross-Modal Retrieval Training: A Case Study," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10094617.

    @INPROCEEDINGS{10094617,
     author={Weck, Benno and Serra, Xavier},
     booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
     title={Data Leakage in Cross-Modal Retrieval Training: A Case Study}, 
     year={2023},
     volume={},
     number={},
     pages={1-5},
     doi={10.1109/ICASSP49357.2023.10094617}}
    

    References:

    [1] A. S. Koepke, A. -M. Oncescu, J. Henriques, Z. Akata and S. Albanie, "Audio Retrieval with Natural Language Queries: A Benchmark Study," in IEEE Transactions on Multimedia, doi: 10.1109/TMM.2022.3149712.

    [2] https://sound-effects.bbcrewind.co.uk/

  17. Table_1_Lesion Induced Error on Automated Measures of Brain Volume: Data...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Nov 30, 2020
    Cite
    Beare, Richard; Novak, Jan; Anderson, Vicki A.; Wood, Amanda G.; King, Daniel J.; Shephard, Adam J. (2020). Table_1_Lesion Induced Error on Automated Measures of Brain Volume: Data From a Pediatric Traumatic Brain Injury Cohort.DOCX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000528925
    Explore at:
    Dataset updated
    Nov 30, 2020
    Authors
    Beare, Richard; Novak, Jan; Anderson, Vicki A.; Wood, Amanda G.; King, Daniel J.; Shephard, Adam J.
    Description

    Structural segmentation of T1-weighted (T1w) MRI has shown morphometric differences, both compared to controls and longitudinally, following a traumatic brain injury (TBI). While many patients with TBI present with abnormalities on structural MRI images, most neuroimaging software packages have not been systematically evaluated for accuracy in the presence of these pathology-related MRI abnormalities. The current study aimed to assess whether acute MRI lesions (MRI acquired 7–71 days post-injury) cause error in the estimates of brain volume produced by the semi-automated segmentation tool, Freesurfer. More specifically, to investigate whether this error was global, the presence of lesion-induced error in the contralesional hemisphere, where no abnormal signal was present, was measured. A dataset of 176 simulated lesion cases was generated using actual lesions from 16 pediatric TBI (pTBI) cases recruited from the emergency department and 11 typically-developing controls. Simulated lesion cases were compared to the “ground truth” of the non-lesion control-case T1w images. Using linear mixed-effects models, results showed that hemispheric measures of cortex volume were significantly lower in the contralesional-hemisphere compared to the ground truth. Interestingly, however, cortex volume (and cerebral white matter volume) were not significantly different in the lesioned hemisphere. However, percent volume difference (PVD) between the simulated lesion and ground truth showed that the magnitude of difference of cortex volume in the contralesional-hemisphere (mean PVD = 0.37%) was significantly smaller than that in the lesioned hemisphere (mean PVD = 0.47%), suggesting a small, but systematic lesion-induced error. Lesion characteristics that could explain variance in the PVD for each hemisphere were investigated. Taken together, these results suggest that the lesion-induced error caused by simulated lesions was not focal, but globally distributed. 
    Previous post-processing approaches to adjust for lesions in structural analyses address the focal region where the lesion was located; however, our results suggest that focal correction approaches are insufficient for the global error in morphometric measures of the injured brain.
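    The percent volume difference (PVD) metric used above can be made concrete. A minimal sketch, assuming PVD is the absolute difference between the simulated-lesion and ground-truth volume estimates expressed as a percentage of the ground truth (the study's exact formula is an assumption here, and the volumes below are hypothetical):

    ```python
    def percent_volume_difference(v_simulated: float, v_ground_truth: float) -> float:
        """Absolute volume difference as a percentage of the ground-truth volume."""
        return 100.0 * abs(v_simulated - v_ground_truth) / v_ground_truth

    # Hypothetical cortex volumes (cm^3): a 0.47% deviation in the lesioned
    # hemisphere and 0.37% contralesionally, matching the mean PVDs quoted above.
    pvd_lesioned = percent_volume_difference(497.65, 500.0)  # 0.47
    pvd_contra = percent_volume_difference(498.15, 500.0)    # 0.37
    ```

    On this definition a nonzero contralesional PVD, where no abnormal signal is present, is exactly what signals a globally distributed rather than focal error.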

  18. Audio problems dataset

    • data.niaid.nih.gov
    Updated Mar 29, 2022
    Cite
    Alex Albas (2022). Audio problems dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6388055
    Explore at:
    Dataset updated
    Mar 29, 2022
    Dataset provided by
    Universitat Pompeu Fabra
    Authors
    Alex Albas
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Audio problems dataset for academic purposes.

  19. Child and Family Services Reviews Update Volume 17, Issue 3, June 2023

    • data.virginia.gov
    • catalog.data.gov
    html
    Updated Sep 6, 2025
    Cite
    Administration for Children and Families (2025). Child and Family Services Reviews Update Volume 17, Issue 3, June 2023 [Dataset]. https://data.virginia.gov/dataset/child-and-family-services-reviews-update-volume-17-issue-3-june-2023
    Explore at:
    htmlAvailable download formats
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    This issue of Child and Family Services Reviews Update contains the following sections: CFSR Round 4 Is Underway!, APSR Process Integrated With CFSP, New OSRI Course Released on E-Learning Academy, New CFSR Overview Video Released, Spanish Translations Added to Portal, and New Round 4 FAQs Posted.

    Metadata-only record linking to the original dataset. Open original dataset below.

  20. Sound and Audio Data in Philippines

    • kaggle.com
    zip
    Updated Apr 1, 2025
    Cite
    Techsalerator (2025). Sound and Audio Data in Philippines [Dataset]. https://www.kaggle.com/datasets/techsalerator/sound-and-audio-data-in-philippines
    Explore at:
    zip(12171329 bytes)Available download formats
    Dataset updated
    Apr 1, 2025
    Authors
    Techsalerator
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Philippines
    Description

    Techsalerator’s Location Sentiment Data for the Philippines

    Techsalerator’s Location Sentiment Data for the Philippines offers a detailed and structured analysis of sentiment trends across different geographic areas. This dataset is crucial for businesses, researchers, and policymakers aiming to understand public opinions, emotions, and attitudes in various locations.

    For access to the full dataset, contact us at info@techsalerator.com or visit Techsalerator Contact Us.

    Top 5 Key Data Fields

    • Geographic Location – Captures sentiment data across provinces, cities, and rural areas, allowing for localized sentiment analysis.
    • Sentiment Score (Positive, Neutral, Negative) – Quantifies public sentiment across different locations, providing insights into regional emotional trends.
    • Topic Categorization – Groups sentiments based on themes such as politics, economy, tourism, public services, and social issues.
    • Source of Sentiment Data – Aggregates insights from social media, news articles, online reviews, and surveys to ensure data diversity.
    • Time of Data Capture – Tracks sentiment trends over time, enabling seasonal and event-based analysis.

    Top 5 Location Sentiment Trends in the Philippines

    • Urban vs. Rural Sentiment Differences – Metro Manila exhibits more dynamic sentiment changes compared to rural areas due to higher digital engagement.
    • Tourism Sentiment Fluctuations – Tourist hotspots like Boracay and Palawan experience seasonal sentiment shifts based on travel experiences and environmental issues.
    • Economic and Business Perception – Sentiment in business hubs such as Makati and Cebu is influenced by job opportunities, inflation rates, and investment trends.
    • Political and Social Issues – Regional sentiment varies widely on governance, policies, and social movements, particularly during election periods.
    • Natural Disasters and Crisis Sentiment – Locations affected by typhoons and earthquakes show rapid sentiment shifts based on disaster response and recovery efforts.

    Top 5 Applications of Location Sentiment Data in the Philippines

    • Brand and Market Research – Helps businesses tailor marketing strategies based on regional consumer sentiment.
    • Government and Policy Making – Aids in crafting policies that align with public opinion and regional concerns.
    • Crisis Management – Assists disaster response teams in identifying areas with high distress and need for aid.
    • Tourism Development – Enables tourism boards to track and enhance visitor experiences in key destinations.
    • Investment and Economic Planning – Supports investors in understanding market sentiment before making location-based business decisions.

    Accessing Techsalerator’s Location Sentiment Data

    To obtain Techsalerator’s Location Sentiment Data for the Philippines, contact info@techsalerator.com with your specific requirements. Techsalerator provides customized datasets based on requested fields, with delivery available within 24 hours. Ongoing access options can also be discussed.

    Included Data Fields

    • Geographic Location
    • Sentiment Score (Positive, Neutral, Negative)
    • Topic Categorization
    • Source of Sentiment Data
    • Time of Data Capture
    • Event-Based Sentiment Trends
    • Social Media and News Analysis
    • Consumer and Business Sentiment
    • Crisis and Disaster Sentiment
    • Contact Information

    For in-depth insights into location-based sentiment trends in the Philippines, Techsalerator’s dataset is an invaluable resource for businesses, analysts, and government agencies.
