Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
This is the official repository 👑 for the Emilia dataset and the source code for the Emilia-Pipe speech data preprocessing pipeline.
News 🔥
2025/02/26: The Emilia-Large dataset, featuring over 200,000 hours of data, is now available! Emilia-Large combines the original 101k-hour Emilia dataset (licensed under CC BY-NC 4.0) with the brand-new 114k-hour Emilia-YODAS… See the full description on the dataset page: https://huggingface.co/datasets/amphion/Emilia-Dataset.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
C2SER: Paper | Code | HuggingFace
Emo-Emilia Dataset
To better simulate real-world context, we introduce a new SER test set, Emo-Emilia. Specifically, we apply the automated labeling approach to annotate Emilia, a large-scale multilingual and diverse speech generation resource with over 100,000 hours of speech data that captures a wide range of emotional contexts. We then manually verify the accuracy of the emotion labels. Each utterance is checked by at least two experts to ensure… See the full description on the dataset page: https://huggingface.co/datasets/ASLP-lab/Emo-Emilia.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The graph shows the changes in this author's g-index and the corresponding percentile, for comparison with all authors. The g-index is a scientometric index similar to the h-index but puts more weight on the total number of citations. An author's g-index is the largest number g such that the author's g most-cited papers have together received at least g² citations.
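The following minimal Python sketch illustrates the definition above by computing the g-index from a list of per-paper citation counts. It assumes the common variant in which g cannot exceed the number of papers. The function name `g_index` and the sample counts are hypothetical and are not taken from the service that produced the graph.

```python
def g_index(citations):
    """Return the largest g such that the g most-cited papers have >= g**2 citations in total.

    Variant assumed here: g is capped at the number of papers (no zero-citation padding).
    """
    ranked = sorted(citations, reverse=True)  # most-cited papers first
    running_total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= rank * rank:
            g = rank  # the top `rank` papers still reach the rank**2 threshold
        else:
            break  # once the threshold fails, it cannot be met again at larger ranks
    return g


# Example with hypothetical citation counts.
print(g_index([10, 5, 3, 1]))  # prints 4
```

Some definitions pad the list with zero-citation papers so that g can exceed the paper count. That variant would need a different loop bound.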
With the creation of the geological map at a scale of 1:10,000, the Region aimed to equip itself with a detailed knowledge tool that serves as the reference base for targeted analyses and in-depth studies of specific areas and themes. It is the indispensable prerequisite for any planning and intervention design, both public and private. It is the basis for preparing urban plans, for an effective soil protection policy, for planning extractive activities, for planning the use of surface and deep water resources, for protecting groundwater from pollution, for civil protection, etc. Each sheet is accompanied by a detailed legend, a list of conventional signs, one or more geological cross-sections and often diagrams of various kinds (tectonic, stratigraphic, etc.). The methodological and scientific coordination of the entire project was carried out by the Regional Geological Office, by professors from 7 university institutes (Departments or Institutes of Geology of Bologna, Modena, Parma, Pavia, Pisa, Florence and Padua) and by researchers from the CNR of Pisa. The field surveys and cartographic drafting were carried out mainly by young professional geologists.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 1 verified classified-ads newspaper publisher business in the Province of Reggio Emilia, Italy, with complete contact information, ratings, reviews, and location data.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The map shows the areal distribution of nickel content in the subsoil (90-140 cm deep) of soils in agricultural use. This depth is considered representative of the natural background content ('pedo-geochemical content' according to ISO/DIS 19258, 2005). The cartographic units are represented by groups of polygons belonging to concentration classes. The factors that regulate the natural metal content of soils are the origin of the sediment in which the soil formed, its texture and its degree of evolution. For nickel, as for chromium, the dominant factor is the origin of the parent sediments from which the soil developed. The map takes an original approach: it evaluates the geochemical data and extends them geographically through a genetic-environmental interpretation rather than the more traditional geostatistical analysis. Concentration values are obtained by X-ray fluorescence spectrometry (XRF), which determines the total content.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The map of organic carbon stored in the soils of the Emilia-Romagna Apennines is a first processing, based on polygons, of the average organic carbon content (expressed in Mg·ha⁻¹) in the first 100 cm of soil. For forest soils, it also includes the content of the organic surface horizons. It is provided as spatial data. The organic carbon content takes into account the distribution of the different soil types and the incidence of non-soil areas, i.e. areas occupied by surface water, urban areas and infrastructure. The territory is represented by a grid of cells with 1 km sides, and the value attributed to each cell accounts for the zero contribution of non-soil areas. The value of each cell is derived from the distribution of soils in the Soil Map of the Emilia-Romagna Region (scale 1:250,000, 1994 edition and subsequent updates). The distinction between soil-covered and non-soil areas (i.e. urban areas, infrastructure or surface water) derives from the 2003 Land Use Map at a scale of 1:25,000, produced by the Geographic Information Systems Service of the Emilia-Romagna Region.
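The cell-value rule described above can be sketched as an area-weighted sum: each soil type is weighted by the share of the 1 km cell it covers, and non-soil areas contribute zero. This minimal Python illustration rests on stated assumptions. The function name, the soil-type labels and the stock values are hypothetical, and the actual regional processing may differ.

```python
import math


def cell_carbon_stock(soil_fractions, stocks_mg_ha, non_soil_fraction):
    """Area-weighted organic carbon stock (Mg/ha) for one grid cell (hypothetical helper).

    soil_fractions: {soil_type: share of the cell area covered by that soil}
    stocks_mg_ha: {soil_type: OC stock in the first 100 cm (+ organic horizons for forest soils)}
    non_soil_fraction: share of the cell covered by water, urban areas or infrastructure
    """
    covered = sum(soil_fractions.values()) + non_soil_fraction
    if not math.isclose(covered, 1.0, abs_tol=1e-6):
        raise ValueError(f"area shares must sum to 1, got {covered}")
    # Non-soil areas contribute 0 Mg/ha, so they only lower the cell average.
    return sum(share * stocks_mg_ha[soil] for soil, share in soil_fractions.items())


# Hypothetical cell: 50% soil A, 30% soil B, 20% urban areas or water.
value = cell_carbon_stock({"A": 0.5, "B": 0.3}, {"A": 100.0, "B": 50.0}, 0.2)
print(round(value, 1))  # prints 65.0
```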
WMS of the Regional Technical Map 1:10,000 - Emilia-Romagna Region
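The WMS layers listed here are accessed through standard OGC Web Map Service requests. The following minimal Python sketch builds a WMS 1.3.0 GetMap URL using only the standard library. The endpoint URL, the layer name, the bounding box and the choice of EPSG:25832 (ETRS89 / UTM zone 32N) are hypothetical placeholders. The real values should be taken from the service's GetCapabilities document.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width, height,
                   crs="EPSG:25832", image_format="image/png"):
    """Build an OGC WMS 1.3.0 GetMap request URL.

    bbox is (minx, miny, maxx, maxy) in the axis order defined for `crs`.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer name, for illustration only.
url = wms_getmap_url("https://example.org/wms", "ctr10000",
                     (600000, 4900000, 610000, 4910000), 800, 800)
print(url)
```

In WMS 1.3.0, geographic CRSs such as EPSG:4326 use latitude/longitude axis order in BBOX, which is a common source of errors. Projected CRSs such as UTM use easting/northing.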
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset was introduced for the work presented in the paper "A comparison of several AI techniques for authorship attribution on Romanian texts". The paper compares several AI techniques for classifying literary texts written by multiple authors, using a limited number of parts of speech (prepositions, adverbs, and conjunctions) as features. The compared methods are Artificial Neural Networks, Multi Expression Programming, k-Nearest Neighbour, Support Vector Machines, and Decision Trees with C5.0.
The source code is available at https://github.com/sanda-avram/ROST-source-code.
The dataset contains stories, short stories, fairy tales, novels, articles, and sketches written by Ion Creangă, Barbu Ştefănescu Delavrancea, Mihai Eminescu, Nicolae Filimon, Emil Gârleanu, Petre Ispirescu, Mihai Oltean, Emilia Plugaru, Liviu Rebreanu, Ioan Slavici.
ROST-csv/ - dataset representations as vectors of occurrence frequencies of Inflexible Parts of Speech (IPoS)
IPoS/ - lists of Inflexible Parts of Speech (IPoS); the initial ones and the ones that were used because they appeared in the texts
ROST-details.csv - contains detailed information about the year of publishing, type of writing, used file name, author and title of the text, and (website) source of the text
ROST-stats.csv - contains statistics for all 400 considered texts, pertaining to the number of occurring:
usedPrepositions.txt
usedPrepositionsAndAdverbs.txt
usedPrepositionsAdverbsAndConjunctions.txt

Avram, Sanda Maria, and Mihai Oltean. "A comparison of several AI techniques for authorship attribution on Romanian texts." arXiv preprint arXiv:2211.05180 (2022). The paper introduces the dataset and compares multiple AI techniques trained to recognize the authors of the texts, based on a number of parts of speech (prepositions, adverbs, and conjunctions). The compared methods are Artificial Neural Networks, Support Vector Machines, Multi Expression Programming, Decision Trees with C5.0, and k-Nearest Neighbour.
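The following minimal Python sketch makes the ROST-csv/ representation concrete. It turns a text into a vector of relative frequencies of a few inflexible parts of speech, then assigns an author by k-nearest-neighbour voting, one of the techniques compared in the paper. It is not the authors' code, which is in the linked repository. The short word list, Euclidean distance, k = 3 and whitespace tokenisation are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical, tiny IPoS list; the real lists are in the IPoS/ folder.
IPOS = ["de", "la", "cu", "în", "pe", "din"]


def ipos_vector(text, ipos=IPOS):
    """Relative frequency of each IPoS word among all whitespace tokens of `text`."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero for empty texts
    return [counts[word] / total for word in ipos]


def knn_predict(training, query, k=3):
    """Majority author among the k training vectors closest (Euclidean) to `query`."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(author for _, author in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D vectors with hypothetical author labels.
training = [([1.0, 0.0], "A"), ([0.9, 0.1], "A"), ([0.0, 1.0], "B"), ([0.1, 0.9], "B")]
print(knn_predict(training, [0.95, 0.05]))  # prints A
```

Whitespace splitting leaves punctuation attached to words ("de," does not match "de"), so real preprocessing would normally strip punctuation first.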
MIT License
Copyright (c) 2022 Sanda Avram
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
Clinical research with remote monitoring technologies (RMTs) has multiple advantages over standard paper-and-pencil tests, but also raises several ethical concerns. While several studies have addressed the governance of big data in clinical research from a legal or ethical perspective, the viewpoint of local research ethics committee (REC) members is underrepresented in the current literature. This study therefore aims to identify which specific ethical challenges RECs raised in the context of a large European study on remote monitoring in all syndromic stages of Alzheimer’s disease, and what gaps remain.
Methods
Documents describing the REC review process at 10 sites in 9 European countries from the project Remote Assessment of Disease and Relapse–Alzheimer’s Disease (RADAR-AD) were collected and translated. The main themes emerging from the documents were identified using a qualitative analysis approach.
Results
Four main themes emerged from the analysis: data management, participants’ wellbeing, methodological issues, and the regulatory categorization of RMTs. Review processes differed across sites: process duration ranged from 71 to 423 days, some RECs raised no issues whereas others raised up to 35 concerns, and approval by a data protection officer was required at half of the sites.
Discussion
The differences in the ethics review of the same study protocol across local settings suggest that multi-site studies would benefit from harmonized research ethics governance processes. More specifically, some best practices could be included in ethical reviews across institutional and national contexts, such as the opinion of an institutional data protection officer, review of the protocol by a patient advisory board, and plans for how ethical reflection is embedded within the study.
WMS of the Regional Technical Map 1:5,000 taken from DBTR2008 (Full Version) - Emilia-Romagna Region.
Soil pH is a fundamental property that influences many physical, chemical and biological processes. It regulates the availability of many plant nutrients. It also influences the activity of the microorganisms responsible for decomposing organic matter and most of the chemical transformations that take place in the soil. It plays a decisive role in the mobility, and therefore the bioavailability, of heavy metals. Some physical characteristics of the soil also depend on pH, such as permeability, aggregate stability, degree of compaction and dispersion of the clay fraction. The map represents the areal distribution of pH values in the surface layer (0-30 cm) of lowland soils. It was drawn up from data extracted from the Soil Database of the Emilia-Romagna Region for the period 1974-2017.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Decisions under uncertainty often emerge from the interaction of affective and cognitive processes. Using the Balloon Analogue Risk Task (BART), this study investigated how incidental emotions (happiness, sadness, anger, fear) and prior outcomes shape risk-taking. Sixty-six participants performed the BART while exposed to images evoking emotions or neutral affect. Results revealed that exposure to anger- and fear-evoking stimuli significantly reduced risk-taking, suggesting these highly arousing negative emotions may disrupt engagement and promote avoidance behaviors. Additionally, participants demonstrated heightened risk propensity and prolonged decision times following a successful trial, indicating a cognitive reframing of subsequent decisions after gains. These findings highlight how emotional and contextual cues jointly shape risky behavior in uncertain environments, advancing our understanding of affect-cognition interplay in decision processes.
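For readers unfamiliar with how BART data are usually summarised, the following minimal Python sketch does two things. It computes the adjusted average pump count (mean pumps on balloons that did not explode, a widely used BART risk score), and it splits pump counts by the outcome of the preceding trial. It is not the analysis used in the study. The trial format, the treatment of a cashed-in (non-exploded) balloon as a "successful trial" and the sample data are assumptions.

```python
from statistics import mean


def adjusted_average_pumps(trials):
    """Mean pumps over balloons that were cashed in (did not explode).

    trials: list of (pumps, exploded) tuples in presentation order.
    """
    kept = [pumps for pumps, exploded in trials if not exploded]
    return mean(kept) if kept else float("nan")


def pumps_by_previous_outcome(trials):
    """Split each trial's pump count by whether the previous balloon was cashed in or exploded."""
    after_success, after_explosion = [], []
    for (_, prev_exploded), (pumps, _) in zip(trials, trials[1:]):
        (after_explosion if prev_exploded else after_success).append(pumps)
    return after_success, after_explosion


# Hypothetical sequence of four trials for one participant.
trials = [(10, False), (20, True), (5, False), (15, False)]
print(adjusted_average_pumps(trials))     # mean of 10, 5 and 15
print(pumps_by_previous_outcome(trials))  # ([20, 15], [5])
```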
WMS of the Regional Technical Map 1:5,000 taken from DBTR2008 (Light Version) - Emilia-Romagna Region
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Inclusion and exclusion criteria for assessing the retrieved papers.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
NumDB benchmark: a set of tables originally extracted from DBpedia, from which different value samples were selected and to which various degrees of error were added to simulate real tables on the Web. The dataset was created for Kacprzak, E., Giménez-García, J. M., Piscopo, A., Koesten, L., Ibáñez, L. D., Tennison, J., & Simperl, E. (2018, November). Making Sense of Numerical Data-Semantic Labelling of Web Tables. In European Knowledge Acquisition Workshop (pp. 163-178). Springer, Cham. A description of the data generation process is given in the paper.
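The generation process itself is described in the paper. Purely to illustrate the general idea of sampling values from a clean numeric column and perturbing them to simulate noisy Web tables, here is a minimal Python sketch. The uniform relative-error model, the function name `sample_with_noise` and its parameters are hypothetical and are not claimed to match NumDB's actual procedure.

```python
import random


def sample_with_noise(column, sample_size, error_level, seed=None):
    """Sample values from a numeric column and add uniform relative noise (hypothetical helper).

    error_level: maximum relative error, e.g. 0.1 lets each value change by up to ±10%.
    Returns (original, noisy) pairs so the perturbation can be inspected.
    """
    rng = random.Random(seed)  # seeded generator for reproducible samples
    sampled = rng.sample(column, sample_size)
    return [(v, v * (1 + rng.uniform(-error_level, error_level))) for v in sampled]


# Hypothetical population figures: 3 samples with up to 5% error.
pairs = sample_with_noise([1200, 5400, 87000, 310, 46000], 3, 0.05, seed=42)
for original, noisy in pairs:
    print(original, round(noisy, 1))
```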