The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly. While it was estimated at ***** zettabytes in 2025, the forecast for 2029 stands at ***** zettabytes; global data generation is thus expected to triple between 2025 and 2029. Data creation has been expanding continuously over the past decade. In 2020, growth was higher than previously expected, driven by increased demand during the coronavirus (COVID-19) pandemic, as more people worked and learned from home and used home entertainment options more often.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This week presents an exciting new data science challenge: build a model that predicts the volume of rented bikes at hourly granularity for the instances in the test data. This time series problem opens up numerous possibilities for innovative and effective solutions in the mobility and services space. We eagerly anticipate your creative approaches and outcomes.
Challenge Details: Your task is to develop a time series regression model that accurately predicts the volume of rented bikes at hourly granularity for the dates given in the test data. The problem requires participants to apply their knowledge of time series analysis and machine learning.
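For illustration only (not part of the official challenge materials), a minimal baseline might derive calendar features from the timestamp and fit a standard regressor. The file names and the datetime/count column names below are assumptions about the data layout:

```python
# Hypothetical baseline sketch; assumes a train.csv/test.csv with a
# "datetime" timestamp column and a "count" target (hourly rentals).
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

train = pd.read_csv("train.csv", parse_dates=["datetime"])
test = pd.read_csv("test.csv", parse_dates=["datetime"])

def add_time_features(df):
    # Calendar features capture the strong daily/weekly seasonality
    # typical of bike-sharing demand.
    out = df.copy()
    out["hour"] = out["datetime"].dt.hour
    out["dayofweek"] = out["datetime"].dt.dayofweek
    out["month"] = out["datetime"].dt.month
    return out

features = ["hour", "dayofweek", "month"]
model = GradientBoostingRegressor(random_state=0)
model.fit(add_time_features(train)[features], train["count"])
predictions = model.predict(add_time_features(test)[features])
```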
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The master and each slave have 4 cores, so the 16-core configuration uses 4 virtual machines. The improvement rate is given by (execution time on 1 core) / (execution time on x cores).
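As a quick worked example of this ratio (the timings below are invented for illustration):

```python
# Illustrative "improvement rate" (speedup) calculation with
# hypothetical execution times in seconds.
time_1_core = 120.0
times = {4: 35.0, 8: 19.0, 16: 11.0}  # cores -> execution time

for cores, t in times.items():
    print(f"{cores:>2} cores: improvement rate = {time_1_core / t:.2f}x")
```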
This issue of Child and Family Services Reviews Update contains the following sections: Round 4 Year 1 States are Gearing Up, New Mock Case Course Released on E-Learning Academy, Spanish Translations Added to Portal, New Round 4 FAQs Posted, and Child Welfare Reviews Project Upcoming Presentations. Metadata-only record linking to the original dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Techsalerator’s Location Sentiment Data for Vietnam
Techsalerator’s Location Sentiment Data for Vietnam provides a robust dataset tailored for businesses, researchers, and developers. This collection offers valuable insights into how individuals perceive different locations across Vietnam, with a focus on sentiment analysis derived from social media, news, and location-based data.
For access to the full dataset, contact us at info@techsalerator.com or visit Techsalerator Contact Us.
Techsalerator’s Location Sentiment Data for Vietnam delivers a detailed breakdown of sentiment trends in various regions of the country. This dataset is crucial for market research, urban development, tourism, and political analysis, providing valuable insights into public perception across urban, suburban, and rural areas.
To obtain Techsalerator’s Location Sentiment Data for Vietnam, contact info@techsalerator.com with your specific requirements. Techsalerator provides customized datasets based on requested fields, with delivery available within 24 hours. Ongoing access options can also be discussed.
For comprehensive insights into public sentiment across various locations in Vietnam, Techsalerator’s dataset is a valuable resource for researchers, marketers, urban planners, and policymakers.
Volume measurement of, for example, a tumor in a 3D image dataset is an important and frequently performed task. The problem is to segment the tumor out of this volume in order to measure its dimensions. This problem is complicated by the fact that tumors are often connected to vessels and other organs. According to the present invention, an automated method and corresponding device and computer software are provided, which analyze a volume of interest around a singled-out tumor and which, by virtue of a 3D distance transform and a region drawing scheme, advantageously allow a tumor to be segmented automatically out of a given volume.
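The patent text includes no code; purely as a sketch of the ingredients it names (a 3D distance transform plus a region-selection step), something along these lines could be assembled with SciPy. The threshold values and seed point are invented for the example and are not from the invention:

```python
# Illustrative only: 3D distance transform + simple region selection,
# loosely in the spirit of the method described above.
import numpy as np
from scipy import ndimage

def segment_around_seed(volume, seed, intensity_threshold):
    """Segment a bright structure around a seed voxel in a 3D volume."""
    # Binary mask of candidate (bright) voxels.
    mask = volume > intensity_threshold
    # 3D distance transform: distance of each foreground voxel
    # to the nearest background voxel.
    dist = ndimage.distance_transform_edt(mask)
    # Keep only the connected component containing the seed
    # (a crude stand-in for the region drawing scheme).
    labels, _ = ndimage.label(mask)
    region = labels == labels[seed]
    # Suppress thin connections to vessels/organs by requiring
    # a minimum distance to the background.
    return region & (dist > 1.0)

# Usage with a synthetic volume and an assumed seed location.
vol = np.zeros((64, 64, 64))
vol[20:40, 20:40, 20:40] = 1.0
seg = segment_around_seed(vol, seed=(30, 30, 30), intensity_threshold=0.5)
print("segmented voxels:", int(seg.sum()))
```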
Recording environment : professional recording studio.
Recording content : general narrative sentences, interrogative sentences, etc.
Speaker : native speaker
Annotation Feature : word transcription, part-of-speech, phoneme boundary, four-level accents, four-level prosodic boundary.
Device : Microphone
Language : American English, British English, Japanese, French, Dutch, Cantonese, Canadian French, Australian English, Italian, New Zealand English, Spanish, Mexican Spanish
Application scenarios : speech synthesis
Accuracy rate :
Word transcription : sentence accuracy rate of at least 99%.
Part-of-speech annotation : sentence accuracy rate of at least 98%.
Phoneme annotation : sentence accuracy rate of at least 98% (errors on voiced and swallowed phonemes are excluded, as their labelling is more subjective).
Accent annotation : word accuracy rate of at least 95%.
Prosodic boundary annotation : sentence accuracy rate of at least 97%.
Phoneme boundary annotation : phoneme accuracy rate of at least 95% (boundary error range within 5%).
This issue of Child and Family Services Reviews Update contains the following sections: Technical Bulletin #12 Issued, Systemic Factors Report for Round 3 Released, Statewide Data Indicators Updated, and CFSR Information Portal Refresh: New Look and Feel! Metadata-only record linking to the original dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The maximum and minimum load refer to the average load of the VMs with the largest and smallest loads (in terms of percentage of files processed) over five trials. The p-value is derived from a single-factor ANOVA test; p ≤ 0.05 indicates, at the 95% confidence level, that the processing loads are not all equal.
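For illustration, this kind of single-factor ANOVA can be run with scipy.stats.f_oneway; the per-trial VM loads below are invented numbers, not the study's data:

```python
# Hypothetical single-factor ANOVA over per-trial VM loads
# (percentage of files processed); values are made up.
from scipy import stats

vm1 = [24.8, 25.1, 25.3, 24.9, 25.0]
vm2 = [25.2, 24.7, 25.0, 25.1, 24.9]
vm3 = [24.6, 25.4, 24.8, 25.2, 25.0]
vm4 = [25.4, 24.8, 24.9, 24.8, 25.1]

f_stat, p_value = stats.f_oneway(vm1, vm2, vm3, vm4)
# p <= 0.05 would reject the null hypothesis of equal mean loads.
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```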
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction:
Divide and Remaster (DnR) is a source separation dataset for training and testing algorithms that separate a monaural audio signal into speech, music, and sound effects/background stems. The dataset is composed of artificial mixtures using audio from LibriSpeech, the Free Music Archive (FMA), and the Freesound Dataset 50k (FSD50K). We introduce it as part of the Cocktail Fork Problem paper.
At a Glance:
.wav files at a sampling rate of 44.1 kHz
training tr (3,295 mixtures), validation cv (440 mixtures), and testing tt (652 mixtures) subsets
four .wav files per mixture (mix.wav, music.wav, speech.wav, and sfx.wav), plus annots.csv, which contains the metadata for the original audio used to compose the mixture (transcriptions for speech, sound classes for sfx, and genre labels for music)
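A minimal loading sketch based on the layout above (the soundfile library and the example folder path are assumptions, not official DnR tooling):

```python
# Sketch: load one DnR mixture and its three stems with soundfile;
# the directory path is a hypothetical example.
import soundfile as sf

mixture_dir = "dnr/tr/1002"  # hypothetical example folder
mix, sr = sf.read(f"{mixture_dir}/mix.wav")
stems = {name: sf.read(f"{mixture_dir}/{name}.wav")[0]
         for name in ("speech", "music", "sfx")}

assert sr == 44100  # per the dataset description
print(mix.shape, list(stems))
```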
Other Resources:
Demo examples and additional information are available at: https://cocktail-fork.github.io/
For more details about the data generation process, the code used to generate our dataset can be found at the following: https://github.com/darius522/dnr-utils
Contact and Support:
Have an issue, concern, or question about DnR? If so, please open an issue here.
For any other inquiries, feel free to send an email to firstname.lastname@gmail.com; my name is Darius Petermann ;)
Citation:
If you use DnR please cite [our paper](https://arxiv.org/abs/2110.09958) in which we introduce the dataset as part of the Cocktail Fork Problem:
@article{Petermann2021cocktail,
title={The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks},
author={Darius Petermann and Gordon Wichern and Zhong-Qiu Wang and Jonathan {Le Roux}},
year={2021},
journal={arXiv preprint arXiv:2110.09958},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
This issue of Child and Family Services Reviews Update contains the following sections: CB Response to COVID-19 Pandemic, CFSR Technical Bulletin #11, Enhancements to the Data Indicator Visualizations, State Data Profiles Released, ACYF-CB-PI-20-02— State Guidance on FY 2021 APSR, Status of Approved PIPs, and PIP Evaluation During Non-Overlapping Data Periods. Metadata-only record linking to the original dataset.
FSD50K is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra.
Citation
If you use the FSD50K dataset, or part of it, please cite our TASLP paper (available from [arXiv] [TASLP]):
@article{fonseca2022FSD50K,
title={{FSD50K}: an open dataset of human-labeled sound events},
author={Fonseca, Eduardo and Favory, Xavier and Pons, Jordi and Font, Frederic and Serra, Xavier},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
volume={30},
pages={829--852},
year={2022},
publisher={IEEE}
}
Paper update: This paper has been published in TASLP at the beginning of 2022. The accepted camera-ready version includes a number of improvements with respect to the initial submission. The main updates include: estimation of the amount of label noise in FSD50K, SNR comparison between FSD50K and AudioSet, improved description of evaluation metrics including equations, clarification of experimental methodology and some results, some content moved to Appendix for readability. The TASLP-accepted camera-ready version is available from arXiv (in particular, it is v2 in arXiv, displayed by default).
Data curators
Eduardo Fonseca, Xavier Favory, Jordi Pons, Mercedes Collado, Ceren Can, Rachit Gupta, Javier Arredondo, Gary Avendano and Sara Fernandez
Contact
You are welcome to contact Eduardo Fonseca should you have any questions, at efonseca@google.com.
ABOUT FSD50K
Freesound Dataset 50k (or FSD50K for short) is an open dataset of human-labeled sound events containing 51,197 Freesound clips unequally distributed in 200 classes drawn from the AudioSet Ontology [1]. FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra.
What follows is a brief summary of FSD50K's most important characteristics. Please have a look at our paper (especially Section 4) to extend the basic information provided here with relevant details for its usage, as well as discussion, limitations, applications and more.
Basic characteristics:
FSD50K contains 51,197 audio clips from Freesound, totalling 108.3 hours of multi-labeled audio
The dataset encompasses 200 sound classes (144 leaf nodes and 56 intermediate nodes) hierarchically organized with a subset of the AudioSet Ontology.
The audio content is composed mainly of sound events produced by physical sound sources and production mechanisms, including human sounds, sounds of things, animals, natural sounds, musical instruments and more. The vocabulary can be inspected in vocabulary.csv (see Files section below).
The acoustic material has been manually labeled by humans following a data labeling process using the Freesound Annotator platform [2].
Clips are of variable length from 0.3 to 30s, due to the diversity of the sound classes and the preferences of Freesound users when recording sounds.
All clips are provided as uncompressed PCM 16 bit 44.1 kHz mono audio files.
Ground truth labels are provided at the clip-level (i.e., weak labels).
The dataset poses mainly a large-vocabulary multi-label sound event classification problem, but also allows development and evaluation of a variety of machine listening approaches (see Sec. 4D in our paper).
In addition to audio clips and ground truth, additional metadata is made available (including raw annotations, sound predominance ratings, Freesound metadata, and more), allowing a variety of analyses and sound event research tasks (see Files section below).
The audio clips are grouped into a development (dev) set and an evaluation (eval) set such that they do not have clips from the same Freesound uploader.
Dev set:
40,966 audio clips totalling 80.4 hours of audio
Avg duration/clip: 7.1s
114,271 smeared labels (i.e., labels propagated in the upwards direction to the root of the ontology; for example, a clip labeled Electric_guitar also receives ancestor labels such as Guitar and Music)
Labels are correct but could be occasionally incomplete
A train/validation split is provided (Sec. 3H). If a different split is used, it should be specified for reproducibility and fair comparability of results (see Sec. 5C of our paper)
Eval set:
10,231 audio clips totalling 27.9 hours of audio
Avg duration/clip: 9.8s
38,596 smeared labels
Eval set is labeled exhaustively (labels are correct and complete for the considered vocabulary)
Note: All classes in FSD50K are represented in AudioSet, except Crash cymbal, Human group actions, Human voice, Respiratory sounds, and Domestic sounds, home sounds.
LICENSE
All audio clips in FSD50K are released under Creative Commons (CC) licenses. Each clip has its own license as defined by the clip uploader in Freesound, some of them requiring attribution to their original authors and some forbidding further commercial reuse. Specifically:
The development set consists of 40,966 clips with the following licenses:
CC0: 14,959
CC-BY: 20,017
CC-BY-NC: 4,616
CC Sampling+: 1,374
The evaluation set consists of 10,231 clips with the following licenses:
CC0: 4,914
CC-BY: 3,489
CC-BY-NC: 1,425
CC Sampling+: 403
For attribution purposes and to facilitate attribution of these files to third parties, we include a mapping from the audio clips to their corresponding licenses. The licenses are specified in the files dev_clips_info_FSD50K.json and eval_clips_info_FSD50K.json.
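As a sketch of how that mapping might be used for attribution (assuming each clip entry in the JSON exposes a license field):

```python
# Sketch: tally per-clip licenses for attribution; assumes each entry
# in dev_clips_info_FSD50K.json carries a "license" field.
import json
from collections import Counter

with open("FSD50K.metadata/dev_clips_info_FSD50K.json") as f:
    dev_info = json.load(f)

license_counts = Counter(info["license"] for info in dev_info.values())
print(license_counts.most_common())
```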
In addition, FSD50K as a whole is the result of a curation process and it has an additional license: FSD50K is released under CC-BY. This license is specified in the LICENSE-DATASET file downloaded with the FSD50K.doc zip file. We note that the choice of one license for the dataset as a whole is not straightforward as it comprises items with different licenses (such as audio clips, annotations, or data split). The choice of a global license in these cases may warrant further investigation (e.g., by someone with a background in copyright law).
Usage of FSD50K for commercial purposes:
If you'd like to use FSD50K for commercial purposes, please contact Eduardo Fonseca and Frederic Font at efonseca@google.com and frederic.font@upf.edu.
Also, if you are interested in using FSD50K for machine learning competitions, please contact Eduardo Fonseca and Frederic Font at efonseca@google.com and frederic.font@upf.edu.
FILES
FSD50K can be downloaded as a series of zip files with the following directory structure:
root
│
└───FSD50K.dev_audio/ Audio clips in the dev set
│
└───FSD50K.eval_audio/ Audio clips in the eval set
│
└───FSD50K.ground_truth/ Files for FSD50K's ground truth
│ │
│ └─── dev.csv Ground truth for the dev set
│ │
│ └─── eval.csv Ground truth for the eval set
│ │
│ └─── vocabulary.csv List of 200 sound classes in FSD50K
│
└───FSD50K.metadata/ Files for additional metadata
│ │
│ └─── class_info_FSD50K.json Metadata about the sound classes
│ │
│ └─── dev_clips_info_FSD50K.json Metadata about the dev clips
│ │
│ └─── eval_clips_info_FSD50K.json Metadata about the eval clips
│ │
│ └─── pp_pnp_ratings_FSD50K.json PP/PNP ratings
│ │
│ └─── collection/ Files for the sound collection format
│
└───FSD50K.doc/
│
└───README.md The dataset description file that you are reading
│
└───LICENSE-DATASET License of the FSD50K dataset as an entity
Each row (i.e. audio clip) of dev.csv contains the following information:
fname: the file name without the .wav extension, e.g., the fname 64760 corresponds to the file 64760.wav on disk. This number is the Freesound id. We always use Freesound ids as filenames.
labels: the class labels (i.e., the ground truth). Note these class labels are smeared, i.e., the labels have been propagated in the upwards direction to the root of the ontology. More details about the label smearing process can be found in Appendix D of our paper.
mids: the Freebase identifiers corresponding to the class labels, as defined in the AudioSet Ontology specification
split: whether the clip belongs to train or val (see paper for details on the proposed split)
Rows in eval.csv follow the same format, except that there is no split column.
Note: We use a slightly different format than AudioSet for the naming of class labels in order to avoid potential problems with spaces, commas, etc. Example: we use Accelerating_and_revving_and_vroom instead of the original Accelerating, revving, vroom. You can go back to the original AudioSet naming using the information provided in vocabulary.csv (class label and mid for the 200 classes of FSD50K) and the AudioSet Ontology specification.
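A short parsing sketch for dev.csv (pandas is an illustrative choice, not part of the dataset tooling; labels and mids are assumed to be comma-separated within each row):

```python
# Sketch: load FSD50K dev ground truth and split the comma-separated
# smeared labels into Python lists.
import pandas as pd

dev = pd.read_csv("FSD50K.ground_truth/dev.csv")
dev["labels"] = dev["labels"].str.split(",")
dev["mids"] = dev["mids"].str.split(",")

train = dev[dev["split"] == "train"]
val = dev[dev["split"] == "val"]
print(len(train), "train clips,", len(val), "val clips")
```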
Files with additional metadata (FSD50K.metadata/)
To allow a variety of analysis and approaches with FSD50K, we provide the following metadata:
class_info_FSD50K.json: Python dictionary where each entry corresponds to one sound class and contains: FAQs utilized during the annotation of the class, examples (representative audio clips), and verification_examples (audio clips presented to raters during annotation as a quality control mechanism). Audio clips are described by their Freesound id. Note: some of these examples may not be included in the FSD50K release.
dev_clips_info_FSD50K.json: Python dictionary where each entry corresponds to one dev clip and contains: title, description, tags, license, and uploader name.
This issue of Child and Family Services Reviews Update contains the following sections: National Call on Round 4 Emphasizes Inclusion, Statewide Data Indicator Profiles Disseminated, Multi-Item Data Analysis Tool Launched, and New Training Content Coming Soon on ELA.
Metadata-only record linking to the original dataset.
The global big data market is forecast to grow to 103 billion U.S. dollars by 2027, more than double its expected market size in 2018. With a share of 45 percent, the software segment would become the largest big data market segment by 2027.

What is big data? Big data is a term that refers to data sets that are too large or too complex for traditional data processing applications. It is defined as having one or more of the following characteristics: high volume, high velocity, or high variety. Fast-growing mobile data traffic, cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT), all contribute to the increasing volume and complexity of data sets.

Big data analytics. Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate new business insights. The global big data and business analytics market was valued at 169 billion U.S. dollars in 2018 and is expected to grow to 274 billion U.S. dollars in 2022. As of November 2018, 45 percent of professionals in the market research industry reportedly used big data analytics as a research method.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ukraine Corporate Bonds: Annual: Volume of Issue data was reported at 8,350.300 UAH mn in 2017. This records an increase from the previous number of 6,760.490 UAH mn for 2016. Ukraine Corporate Bonds: Annual: Volume of Issue data is updated yearly, averaging 8,922.080 UAH mn from Dec 1996 (Median) to 2017, with 22 observations. The data reached an all-time high of 51,386.610 UAH mn in 2012 and a record low of 8.190 UAH mn in 1998. Ukraine Corporate Bonds: Annual: Volume of Issue data remains active status in CEIC and is reported by NATIONAL SECURITIES AND STOCK MARKET COMMISSION. The data is categorized under Global Database’s Ukraine – Table UA.Z004: Corporate Bonds.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This upload contains dataset splits of SoundDesc [1] and other supporting material for our paper:
Data leakage in cross-modal retrieval training: A case study [arXiv] [ieeexplore]
In our paper, we demonstrated that a data leakage problem in the previously published splits of SoundDesc leads to overly optimistic retrieval results.
Using off-the-shelf audio fingerprinting software, we identified that the data leakage stems from duplicates in the dataset.
We define two new splits for the dataset: a cleaned split that removes the leakage and a group-filtered split that avoids other kinds of weak contamination of the test data.
SoundDesc is a dataset that was automatically sourced from the BBC Sound Effects web page [2]. The results from our paper can be reproduced using clean_split01 and group_filtered_split01.
If you use the splits, please cite our work:
Benno Weck, Xavier Serra, "Data Leakage in Cross-Modal Retrieval Training: A Case Study," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10094617.
@INPROCEEDINGS{10094617,
author={Weck, Benno and Serra, Xavier},
booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Data Leakage in Cross-Modal Retrieval Training: A Case Study},
year={2023},
volume={},
number={},
pages={1-5},
doi={10.1109/ICASSP49357.2023.10094617}}
References:
[1] A. S. Koepke, A. -M. Oncescu, J. Henriques, Z. Akata and S. Albanie, "Audio Retrieval with Natural Language Queries: A Benchmark Study," in IEEE Transactions on Multimedia, doi: 10.1109/TMM.2022.3149712.
Structural segmentation of T1-weighted (T1w) MRI has shown morphometric differences, both compared to controls and longitudinally, following a traumatic brain injury (TBI). While many patients with TBI present with abnormalities on structural MRI images, most neuroimaging software packages have not been systematically evaluated for accuracy in the presence of these pathology-related MRI abnormalities. The current study aimed to assess whether acute MRI lesions (MRI acquired 7–71 days post-injury) cause error in the estimates of brain volume produced by the semi-automated segmentation tool FreeSurfer. More specifically, to investigate whether this error was global, the presence of lesion-induced error in the contralesional hemisphere, where no abnormal signal was present, was measured. A dataset of 176 simulated lesion cases was generated using actual lesions from 16 pediatric TBI (pTBI) cases recruited from the emergency department and 11 typically-developing controls. Simulated lesion cases were compared to the "ground truth" of the non-lesion control-case T1w images. Using linear mixed-effects models, results showed that hemispheric measures of cortex volume were significantly lower in the contralesional hemisphere compared to the ground truth. Interestingly, however, cortex volume (and cerebral white matter volume) were not significantly different in the lesioned hemisphere. However, the percent volume difference (PVD) between the simulated lesion and ground truth showed that the magnitude of difference of cortex volume in the contralesional hemisphere (mean PVD = 0.37%) was significantly smaller than that in the lesioned hemisphere (mean PVD = 0.47%), suggesting a small but systematic lesion-induced error. Lesion characteristics that could explain variance in the PVD for each hemisphere were investigated. Taken together, these results suggest that the lesion-induced error caused by simulated lesions was not focal but globally distributed. Previous post-processing approaches to adjust for lesions in structural analyses address the focal region where the lesion was located; however, our results suggest that focal correction approaches are insufficient for the global error in morphometric measures of the injured brain.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Audio problems dataset for academic purposes.
This issue of Child and Family Services Reviews Update contains the following sections: CFSR Round 4 Is Underway!, APSR Process Integrated With CFSP, New OSRI Course Released on E-Learning Academy, New CFSR Overview Video Released, Spanish Translations Added to Portal, and New Round 4 FAQs Posted.
Metadata-only record linking to the original dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Techsalerator’s Location Sentiment Data for the Philippines offers a detailed and structured analysis of sentiment trends across different geographic areas. This dataset is crucial for businesses, researchers, and policymakers aiming to understand public opinions, emotions, and attitudes in various locations.
For access to the full dataset, contact us at info@techsalerator.com or visit Techsalerator Contact Us.
To obtain Techsalerator’s Location Sentiment Data for the Philippines, contact info@techsalerator.com with your specific requirements. Techsalerator provides customized datasets based on requested fields, with delivery available within 24 hours. Ongoing access options can also be discussed.
For in-depth insights into location-based sentiment trends in the Philippines, Techsalerator’s dataset is an invaluable resource for businesses, analysts, and government agencies.