19 datasets found
  1. vqa

    • huggingface.co
    Updated Oct 9, 2024
    Cite
    World Cuisines (2024). vqa [Dataset]. https://huggingface.co/datasets/worldcuisines/vqa
    Dataset updated
    Oct 9, 2024
    Dataset authored and provided by
    World Cuisines
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines

    WorldCuisines is a massive-scale visual question answering (VQA) benchmark for multilingual and multicultural understanding through global cuisines. The dataset contains text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the largest multicultural VQA benchmark as of 17 October 2024.… See the full description on the dataset page: https://huggingface.co/datasets/worldcuisines/vqa.

  2. Data from: Knowledge from non-English-language studies broadens...

    • search.dataone.org
    • data.niaid.nih.gov
    Updated May 20, 2025
    Cite
    Filipe Serrano; Valentina Marconi; Stefanie Deinet; Hannah Puleston; Helga Correa; Juan C. Díaz-Ricaurte; Carolina Farhat; Ricardo Luria-Manzano; Marcio Martins; Eletra Souza; Sergio Souza; Joao Vieira-Alencar; Paula Valdujo; Robin Freeman; Louise McRae (2025). Knowledge from non-English-language studies broadens contributions to conservation policy and helps to tackle bias in biodiversity data [Dataset]. http://doi.org/10.5061/dryad.ngf1vhj68
    Dataset updated
    May 20, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Filipe Serrano; Valentina Marconi; Stefanie Deinet; Hannah Puleston; Helga Correa; Juan C. Díaz-Ricaurte; Carolina Farhat; Ricardo Luria-Manzano; Marcio Martins; Eletra Souza; Sergio Souza; Joao Vieira-Alencar; Paula Valdujo; Robin Freeman; Louise McRae
    Description

    Local ecological evidence is key to informing conservation. However, many global biodiversity indicators often neglect local ecological evidence published in languages other than English, potentially biassing our understanding of biodiversity trends in areas where English is not the dominant language. Brazil is a megadiverse country with a thriving national scientific publishing landscape. Here, using Brazil and a species abundance indicator as examples, we assess how well bilingual literature searches can both improve data coverage for a country where English is not the primary language and help tackle biases in biodiversity datasets. We conducted a comprehensive screening of articles containing abundance data for vertebrates published in 59 Brazilian journals (articles in Portuguese or English) and 79 international English-only journals. These were grouped into three datasets according to journal origin and article language (Brazilian-Portuguese, Brazilian-English and International). ...

    Data collection

    We collected time-series of vertebrate population abundance suitable for entry into the LPD (livingplanetindex.org), which provides the repository for one of the indicators in the GBF, the Living Planet Index (LPI, Ledger et al., 2023). Despite the continuous addition of new data, LPI coverage remains incomplete for some regions (Living Planet Report 2024 – A System in Peril, 2024). We collected data from three sets of sources: a) Portuguese-language articles from Brazilian journals (hereafter “Brazilian-Portuguese” dataset), b) English-language articles from Brazilian journals (“Brazilian-English” dataset) and c) English-language articles from non-Brazilian journals (“International” dataset). For a) and b), we first compiled a list of Brazilian biodiversity-related journals using the list of non-English-language journals in ecology and conservation published by the translatE project (www.translatesciences.com) as a starting point. The International dataset was obtained ...

    Knowledge from non-English-language studies broadens contributions to conservation policy and helps to tackle bias in biodiversity data

    Dataset DOI: 10.5061/dryad.ngf1vhj68

    Description of the data and file structure

    We collected time-series of vertebrate population abundance suitable for entry into the LPD (livingplanetindex.org), which provides the repository for one of the indicators in the GBF, the Living Planet Index (LPI, Ledger et al., 2023).

    We collected data from three sets of sources: a) Portuguese-language articles from Brazilian journals (hereafter “Brazilian-Portuguese” dataset), b) English-language articles from Brazilian journals (“Brazilian-English” dataset) and c) English-language articles from non-Brazilian journals (“International” dataset). For a) and b), we first compiled a list of Brazilian biodiversity-related journals using the list of non-English-language journals in ecology and conservat...,

  3. Most popular database management systems worldwide 2024

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Most popular database management systems worldwide 2024 [Dataset]. https://www.statista.com/statistics/809750/worldwide-popularity-ranking-database-management-systems/
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista http://statista.com/
    Time period covered
    Jun 2024
    Area covered
    Worldwide
    Description

    As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.

    Database Management Systems

    As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world’s growing focus on big data and data analytics, knowledge of SQL programming languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.

  4. Data from: Mpox Narrative on Instagram: A Labeled Multilingual Dataset of...

    • figshare.com
    xlsx
    Updated Oct 12, 2024
    Cite
    Nirmalya Thakur (2024). Mpox Narrative on Instagram: A Labeled Multilingual Dataset of Instagram Posts on Mpox for Sentiment, Hate Speech, and Anxiety Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.27072247.v1
    Available download formats: xlsx
    Dataset updated
    Oct 12, 2024
    Dataset provided by
    figshare
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please cite this paper when using this dataset: N. Thakur, “Mpox narrative on Instagram: A labeled multilingual dataset of Instagram posts on mpox for sentiment, hate speech, and anxiety analysis,” arXiv [cs.LG], 2024, URL: https://arxiv.org/abs/2409.05292

    Abstract: The world is currently experiencing an outbreak of mpox, which has been declared a Public Health Emergency of International Concern by WHO. During recent virus outbreaks, social media platforms have played a crucial role in keeping the global population informed and updated regarding various aspects of the outbreaks. As a result, in the last few years, researchers from different disciplines have focused on the development of social media datasets focusing on different virus outbreaks. No prior work in this field has focused on the development of a dataset of Instagram posts about the mpox outbreak. The work presented in this paper (stated above) aims to address this research gap. It presents this multilingual dataset of 60,127 Instagram posts about mpox, published between July 23, 2022, and September 5, 2024. This dataset contains Instagram posts about mpox in 52 languages. For each of these posts, the Post ID, Post Description, Date of publication, language, and translated version of the post (translation to English was performed using the Google Translate API) are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis, hate speech detection, and anxiety or stress detection were also performed. This process included classifying each post into:

    • one of the fine-grain sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral
    • hate or not hate
    • anxiety/stress detected or no anxiety/stress detected

    These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for sentiment, hate speech, and anxiety or stress detection, as well as for other applications.

    The 52 distinct languages in which Instagram posts are present in the dataset are English, Portuguese, Indonesian, Spanish, Korean, French, Hindi, Finnish, Turkish, Italian, German, Tamil, Urdu, Thai, Arabic, Persian, Tagalog, Dutch, Catalan, Bengali, Marathi, Malayalam, Swahili, Afrikaans, Panjabi, Gujarati, Somali, Lithuanian, Norwegian, Estonian, Swedish, Telugu, Russian, Danish, Slovak, Japanese, Kannada, Polish, Vietnamese, Hebrew, Romanian, Nepali, Czech, Modern Greek, Albanian, Croatian, Slovenian, Bulgarian, Ukrainian, Welsh, Hungarian, and Latvian.

    The following is a description of the attributes present in this dataset:

    • Post ID: Unique ID of each Instagram post
    • Post Description: Complete description of each post in the language in which it was originally published
    • Date: Date of publication in MM/DD/YYYY format
    • Language: Language of the post as detected using the Google Translate API
    • Translated Post Description: Translated version of the post description. All posts which were not in English were translated into English using the Google Translate API. No language translation was performed for English posts.
    • Sentiment: Results of sentiment analysis (using the preprocessed version of the translated Post Description) where each post was classified into one of the sentiment classes: fear, surprise, joy, sadness, anger, disgust, and neutral
    • Hate: Results of hate speech detection (using the preprocessed version of the translated Post Description) where each post was classified as hate or not hate
    • Anxiety or Stress: Results of anxiety or stress detection (using the preprocessed version of the translated Post Description) where each post was classified as stress/anxiety detected or no stress/anxiety detected.

    All the Instagram posts that were collected during this data mining process to develop this dataset were publicly available on Instagram and did not require a user to log in to Instagram to view the same (at the time of writing this paper).
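
    The attribute list above can be read as a row schema. The following is a minimal Python sketch of parsing one row into a typed record, assuming the column headers match the attribute names given in the description and that sentiment labels are spelled as listed; the exact headers and label casing in the xlsx file are assumptions, and the example row is hypothetical rather than taken from the dataset.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Sentiment classes as listed in the dataset description; the exact
# spelling and casing used in the file are assumptions.
SENTIMENTS = {"fear", "surprise", "joy", "sadness", "anger", "disgust", "neutral"}


@dataclass
class MpoxPost:
    post_id: str
    description: str
    published: date
    language: str
    translated: str
    sentiment: str


def parse_row(row: dict) -> MpoxPost:
    """Convert one row (keyed by the attribute names above) into a typed record."""
    sentiment = row["Sentiment"].strip().lower()
    if sentiment not in SENTIMENTS:
        raise ValueError(f"unexpected sentiment label: {sentiment!r}")
    return MpoxPost(
        post_id=str(row["Post ID"]),
        description=row["Post Description"],
        # The description states that dates use the MM/DD/YYYY format.
        published=datetime.strptime(row["Date"], "%m/%d/%Y").date(),
        language=row["Language"],
        translated=row["Translated Post Description"],
        sentiment=sentiment,
    )


# Hypothetical example row, not taken from the dataset.
example = {
    "Post ID": "0001",
    "Post Description": "Ejemplo de publicación",
    "Date": "07/23/2022",
    "Language": "Spanish",
    "Translated Post Description": "Example post",
    "Sentiment": "Fear",
}
post = parse_row(example)
print(post.published.isoformat(), post.sentiment)
```

    Parsing the date explicitly with the stated format avoids the day/month ambiguity that a generic date parser could introduce.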

  5. MultiFin

    • huggingface.co
    Cite
    Ashwin Mathur, MultiFin [Dataset]. https://huggingface.co/datasets/awinml/MultiFin
    Available format: Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Ashwin Mathur
    Description

    MultiFin

    MultiFin is a publicly available financial dataset consisting of real-world article headlines covering 15 languages across different writing systems and language families. The dataset has a hierarchical label structure providing two classification tasks: multi-label and multi-class.

      Dataset Description
    

    The MULTIFIN dataset is a multilingual corpus, consisting of real-world article headlines covering 15 languages. The corpus is annotated using hierarchical… See the full description on the dataset page: https://huggingface.co/datasets/awinml/MultiFin.

  6. aime_2024_multilingual

    • huggingface.co
    Updated Jun 4, 2025
    Cite
    Shan Chen (2025). aime_2024_multilingual [Dataset]. https://huggingface.co/datasets/shanchen/aime_2024_multilingual
    Dataset updated
    Jun 4, 2025
    Authors
    Shan Chen
    Description

    When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy
    https://arxiv.org/abs/2505.22888
    Jirui Qi, Shan Chen, Zidi Xiong, Raquel Fernández, Danielle S. Bitterman, Arianna Bisazza

    Recent Large Reasoning Models (LRMs) with thinking traces have shown strong performance on English reasoning tasks. However, their ability to think in other languages is less studied. This capability is as important as answer accuracy for real world applications because… See the full description on the dataset page: https://huggingface.co/datasets/shanchen/aime_2024_multilingual.

  7. MoreFixes: Largest CVE dataset with fixes

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Oct 23, 2024
    Cite
    Akhoundali, Jafar (2024). MoreFixes: Largest CVE dataset with fixes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11199119
    Dataset updated
    Oct 23, 2024
    Dataset provided by
    Rietveld, Kristian F. D.
    Rahim Nouri, Sajad
    Akhoundali, Jafar
    GADYATSKAYA, Olga
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    In our work, we have designed and implemented a novel workflow with several heuristic methods to combine state-of-the-art methods related to CVE fix commits gathering. As a consequence of our improvements, we have been able to gather the largest programming language-independent real-world dataset of CVE vulnerabilities with the associated fix commits. Our dataset containing 29,203 unique CVEs coming from 7,238 unique GitHub projects is, to the best of our knowledge, by far the biggest CVE vulnerability dataset with fix commits available today. These CVEs are associated with 35,276 unique commits as SQL and 39,931 patch commit files that fixed those vulnerabilities (some patch files can't be saved as SQL for several technical reasons). Our larger dataset thus substantially improves over the current real-world vulnerability datasets and enables further progress in research on vulnerability detection and software security. We used NVD (nvd.nist.gov) and the GitHub Security Advisory Database as the main sources of our pipeline.

    We release to the community a 16GB PostgreSQL database that contains information on CVEs up to 2024-09-26, CWEs of each CVE, files and methods changed by each commit, and repository metadata. Additionally, patch files related to the fix commits are available as a separate package. Furthermore, we make our dataset collection tool also available to the community.

    The cvedataset-patches.zip file contains fix patches, and postgrescvedumper.sql.zip contains a PostgreSQL dump of fixes, together with several other fields such as CVEs, CWEs, repository meta-data, commit data, file changes, methods changed, etc.

    The MoreFixes data-storage strategy is based on CVEFixes to store CVE fix commits from open-source repositories, and uses a modified version of Prospector (part of ProjectKB from SAP) as a module to detect the fix commits of a CVE. Our full methodology is presented in the paper titled "MoreFixes: A Large-Scale Dataset of CVE Fix Commits Mined through Enhanced Repository Discovery", which will be published in the Promise conference (2024).

    For more information about usage and sample queries, visit the GitHub repository: https://github.com/JafarAkhondali/Morefixes

    If you are using this dataset, please be aware that the repositories that we mined contain different licenses and you are responsible for handling any licensing issues. The same applies to CVEFixes.

    This product uses the NVD API but is not endorsed or certified by the NVD.

    This research was partially supported by the Dutch Research Council (NWO) under the project NWA.1215.18.008 Cyber Security by Integrated Design (C-SIDe).

    To restore the dataset, you can use the docker-compose file available in the GitHub repository. Default dataset credentials after restoring the dump:

    POSTGRES_USER=postgrescvedumper
    POSTGRES_DB=postgrescvedumper
    POSTGRES_PASSWORD=a42a18537d74c3b7e584c769152c3d
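
    As an illustration of how these defaults might be used once the dump is restored, here is a small Python sketch that assembles a PostgreSQL connection URI from them. The host `localhost` and port `5432` are assumptions (the docker-compose file in the repository may expose something different), and `postgres_uri` is a hypothetical helper, not part of the MoreFixes tooling.

```python
from urllib.parse import quote


def postgres_uri(user, password, database, host="localhost", port=5432):
    """Build a postgresql:// connection URI, percent-encoding each credential."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Default credentials quoted above; host and port are assumed values.
uri = postgres_uri(
    user="postgrescvedumper",
    password="a42a18537d74c3b7e584c769152c3d",
    database="postgrescvedumper",
)
print(uri)
```

    The resulting string can be passed to any client that accepts a PostgreSQL connection URI.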

    Please use this for citation:

     title={MoreFixes: A large-scale dataset of CVE fix commits mined through enhanced repository discovery},
     author={Akhoundali, Jafar and Nouri, Sajad Rahim and Rietveld, Kristian and Gadyatskaya, Olga},
     booktitle={Proceedings of the 20th International Conference on Predictive Models and Data Analytics in Software Engineering},
     pages={42--51},
     year={2024}
    }
    
  8. Lohmann, Aaron, Békés, Gábor, Hinz, Julian, Koren, Miklós (2024). Dataset:...

    • service.tib.eu
    Updated Nov 28, 2024
    Cite
    (2024). Lohmann, Aaron, Békés, Gábor, Hinz, Julian, Koren, Miklós (2024). Dataset: Open source software input output tables (ossio). https://doi.org/10.22000/SaNahyIFpqpJVFbb [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-22000-sanahyifpqpjvfbb
    Dataset updated
    Nov 28, 2024
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: The global Open-Source Software Input Output (OSSIO) tables were built including five different programming languages and 15 countries. The researchers used knowledge of the geographical location of software developers and linkages between software projects (dependencies) to aggregate these to flows between countries. The OSSIO tables were built as part of the EU-funded research project 'Rethinking Global Supply Chains: Measurement, Impact and Policy' (RETHINK-GSC; https://rethink-gsc.eu/), which captures the impact of knowledge flows and service inputs in global supply chains (GSCs).

  9. THAR Dataset Dataset

    • paperswithcode.com
    Updated Mar 22, 2024
    Cite
    (2024). THAR Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/thar-dataset
    Dataset updated
    Mar 22, 2024
    Description

    The increase in religiously motivated hate on social media is clear and ongoing. These platforms have become fertile ground for the dissemination of hate speech directed at religious communities, resulting in tangible repercussions in the real world. Much of the current research concerning the automated identification of hateful content on social media focuses on English-language content. There is comparatively less exploration in low-resource languages such as Hindi. As social media users increasingly utilize their regional languages for expression, it becomes crucial to dedicate appropriate research efforts to hate speech detection in these languages.

    Hence, this work aims to fill this research void by introducing a meticulously curated and annotated dataset of YouTube comments in Hindi-English code-mixed language, specifically designed to identify instances of religious hate.

    Citation: Sharma, D., Singh, A., & Singh, V. K. (2024). THAR-Targeted Hate Speech Against Religion: A high-quality Hindi-English code-mixed Dataset with the Application of Deep Learning Models for Automatic Detection. ACM Transactions on Asian and Low-Resource Language Information Processing. (https://doi.org/10.1145/3653017)

  10. The global cloud database and DBaaS market size is USD 21.9 billion in 2024...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated May 24, 2024
    Cite
    Cognitive Market Research (2024). The global cloud database and DBaaS market size is USD 21.9 billion in 2024 and will grow at a compound annual growth rate (CAGR) of 21.6% from 2024 to 2031. [Dataset]. https://www.cognitivemarketresearch.com/cloud-database-and-dbaas-market-report
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    May 24, 2024
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global cloud database and DBaaS market size will be USD 21.9 billion in 2024 and will increase at a compound annual growth rate (CAGR) of 21.6% from 2024 to 2031.

    Market Dynamics of Cloud Database and DBaaS Market

    Key Drivers for Cloud Database and DBaaS Market

    Mobile and IoT Adoption - The rise of mobile and IoT technologies fuels demand for cloud databases and DBaaS solutions. Data generation surges as mobile usage skyrockets and IoT devices flourish, necessitating scalable, accessible storage options. Cloud databases offer flexibility and scalability to accommodate these dynamic workloads while enabling seamless integration with mobile and IoT applications. The shift towards digital transformation initiatives also amplifies the need for agile, cloud-native database solutions to support modernization efforts across industries. Automated administration reduces operational complexity, which drives the cloud database and DBaaS market's expansion in the years ahead.

    Key Restraints for Cloud Database and DBaaS Market

    Compatibility issues with existing systems hinder the adoption of the cloud database and DBaaS in the industry. The market also faces significant difficulties related to data migration challenges that hinder adoption and scalability.

    Introduction of the Cloud Database and DBaaS Market

    Cloud databases and Database-as-a-Service (DBaaS) offer scalable and managed storage solutions where data is hosted and accessed over the internet. Market drivers for these services include the imperative for scalability to accommodate growing data volumes, cost efficiencies achieved through a shift from capital to operational expenditure, enhanced accessibility enabling collaboration and innovation from any location, heightened demand for robust security features to address data privacy concerns, simplified management through automated administration, and elasticity to handle fluctuating workloads seamlessly. These drivers collectively address modern business needs for flexibility, cost-effectiveness, security, and performance. As organizations increasingly depend on data as a strategic asset, cloud databases and DBaaS solutions provide the agility and efficiency required to meet evolving demands while leveraging the benefits of cloud computing infrastructure.
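
    To make the growth figure concrete, the compound annual growth formula, value × (1 + rate)^years, can be applied to the numbers quoted above. The sketch below is plain arithmetic on the stated 2024 size and CAGR; the resulting 2031 value is an implied illustration, not a figure taken from the report, and `project` is a hypothetical helper name.

```python
def project(value: float, rate: float, years: int) -> float:
    """Compound a starting value at a fixed annual growth rate."""
    return value * (1 + rate) ** years


# USD 21.9 billion in 2024, growing at 21.6% per year over the 7 years to 2031.
size_2031 = project(21.9, 0.216, 2031 - 2024)
print(f"Implied 2031 market size: USD {size_2031:.1f} billion")
```

    Under these inputs the implied 2031 size is roughly four times the 2024 figure, which shows how quickly a rate above 20% compounds.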

  11. Dataset - CORE-MD Post-Market Surveillance Tool

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Apr 24, 2024
    Cite
    Yijun Ren; Enrico Gianluca Caiani (2024). Dataset - CORE-MD Post-Market Surveillance Tool [Dataset]. http://doi.org/10.5281/zenodo.10864069
    Available download formats: csv
    Dataset updated
    Apr 24, 2024
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Yijun Ren; Enrico Gianluca Caiani
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Mar 25, 2024
    Description

    WP3 of CORE-MD investigated how to aggregate and extract maximal value for post-market surveillance from medical device registries, big data, clinical practices and experience, and the internet. This data collection was created by Task 3.2 of the CORE-MD project, as the result of the proposed methodological framework to transform unstructured and dispersed publicly available safety information (Field Safety Notices, recalls, alerts) into a standardized and harmonized database. The database includes 137,720 historical safety notices (updated to February 2024) published by different national competent authorities (16 EU Member States and 5 extra-EU jurisdictions).

  12. xcopa

    • huggingface.co
    Updated Jun 20, 2024
    Cite
    SEACrowd (2024). xcopa [Dataset]. https://huggingface.co/datasets/SEACrowd/xcopa
    Dataset updated
    Jun 20, 2024
    Dataset authored and provided by
    SEACrowd
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning

    The Cross-lingual Choice of Plausible Alternatives dataset is a benchmark to evaluate the ability of machine learning models to transfer commonsense reasoning across languages. The dataset is the translation and reannotation of the English COPA (Roemmele et al. 2011) and covers 11 languages from 11 families and several areas around the globe. The dataset is challenging as it requires both the command of world knowledge and the ability to generalise to new languages. All the details about the creation of XCOPA and the implementation of the baselines are available in the paper.

  13. STEM Dataset

    • paperswithcode.com
    Updated May 15, 2025
    Cite
    (2025). STEM Dataset [Dataset]. https://paperswithcode.com/dataset/stem
    Dataset updated
    May 15, 2025
    Description

    This dataset was proposed in the ICLR 2024 paper: Measuring Vision-Language STEM Skills of Neural Models. Real-world problems often require solutions that combine knowledge from STEM (science, technology, engineering, and math). Unlike existing datasets, our dataset requires the understanding of multimodal vision-language information of STEM. Our dataset is one of the largest and most comprehensive for the challenge. It includes 448 skills and 1,073,146 questions spanning all STEM subjects. Compared to existing datasets that often focus on examining expert-level ability, our dataset includes fundamental skills and questions designed based on the K-12 curriculum. We also add state-of-the-art foundation models such as CLIP and GPT-3.5-Turbo to our benchmark.

  14. THINGS-MEG

    • openneuro.org
    Updated May 29, 2025
    Cite
    Martin N. Hebart; Oliver Contier; Lina Teichmann; Adam H. Rockter; Charles Zheng; Alexis Kidder; Anna Corriveau; Maryam Vaziri-Pashkam; Chris I. Baker (2025). THINGS-MEG [Dataset]. http://doi.org/10.18112/openneuro.ds004212.v3.0.0
    Dataset updated
    May 29, 2025
    Dataset provided by
    OpenNeuro https://openneuro.org/
    Authors
    Martin N. Hebart; Oliver Contier; Lina Teichmann; Adam H. Rockter; Charles Zheng; Alexis Kidder; Anna Corriveau; Maryam Vaziri-Pashkam; Chris I. Baker
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    THINGS-MEG

    Understanding object representations requires a broad, comprehensive sampling of the objects in our visual world with dense measurements of brain activity and behavior. This densely sampled MEG dataset is part of THINGS-data, a multimodal collection of large-scale datasets comprising functional MRI, magnetoencephalographic recordings, and 4.70 million behavioral judgments in response to thousands of photographic images for up to 1,854 object concepts. THINGS-data is unique in its breadth of richly-annotated objects, allowing for testing countless novel hypotheses at scale while assessing the reproducibility of previous findings. The multimodal data allows for studying both the temporal and spatial dynamics of object representations and their relationship to behavior and additionally provides the means for combining these datasets for novel insights into object processing. THINGS-data constitutes the core release of the THINGS initiative for bridging the gap between disciplines and the advancement of cognitive neuroscience.

    Dataset overview

    We collected extensively sampled object representations using magnetoencephalography (MEG). To this end, we drew on the THINGS database (Hebart et al., 2019), a richly-annotated database of 1,854 object concepts representative of the American English language which contains 26,107 manually-curated naturalistic object images.

    During the MEG experiment, participants were shown a representative subset of THINGS images, spread across 12 separate sessions (N=4, 22,448 unique images of 1,854 objects). Images were shown in fast succession (1.5±0.2s), and participants were instructed to maintain central fixation. To ensure engagement, participants performed an oddball detection task responding to occasional artificially-generated images. A subset of images (n=200) was shown repeatedly in each session.

    Beyond the core functional imaging data in response to THINGS images, we acquired T1-weighted MRI scans to allow for cortical source localization. Eye movements were monitored in the MEG to ensure participants maintained central fixation.

  15. Artificial Intelligence (AI) Text Generator Market Analysis North America,...

    • technavio.com
    Updated Jul 15, 2024
    Cite
    Technavio (2024). Artificial Intelligence (AI) Text Generator Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, UK, China, India, Germany - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/ai-text-generator-market-analysis
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    United States, Global
    Description

    Artificial Intelligence Text Generator Market Size 2024-2028

    The artificial intelligence (AI) text generator market size is forecast to increase by USD 908.2 million at a CAGR of 21.22% between 2023 and 2028.

    The market is experiencing significant growth due to several key trends. One of these trends is the increasing popularity of AI generators in various sectors, including education for e-learning applications. Another trend is the growing importance of speech-to-text technology, which is becoming increasingly essential for improving productivity and accessibility. However, data privacy and security concerns remain a challenge for the market, as generators process and store vast amounts of sensitive information. It is crucial for market participants to address these concerns through strong data security measures and transparent data handling practices to ensure customer trust and compliance with regulations. Overall, the AI generator market is poised for continued growth as it offers significant benefits in terms of efficiency, accuracy, and accessibility.
    

    What will be the Size of the Artificial Intelligence (AI) Text Generator Market During the Forecast Period?


    The market is experiencing significant growth as businesses and organizations seek to automate content creation across various industries. Driven by technological advancements in machine learning (ML) and natural language processing, AI generators are increasingly being adopted for downstream applications in sectors such as education, manufacturing, and e-commerce. 
    Moreover, these systems enable the creation of personalized content for global audiences in multiple languages, providing a competitive edge for businesses in an interconnected Internet economy. However, responsible AI practices are crucial to mitigate risks associated with biased content, misinformation, misuse, and potential misrepresentation.
    

    How is this Artificial Intelligence (AI) Text Generator Industry segmented and which is the largest segment?

    The artificial intelligence (AI) text generator industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    Component
      Solution
      Service

    Application
      Text to text
      Speech to text
      Image/video to text

    Geography
      North America
        US
      Europe
        Germany
        UK
      APAC
        China
        India
      South America
      Middle East and Africa

    By Component Insights

    The solution segment is estimated to witness significant growth during the forecast period.
    

    Artificial Intelligence (AI) text generators have gained significant traction in various industries due to their efficiency and cost-effectiveness in content creation. These solutions use machine learning algorithms, such as deep neural networks, to analyze and learn from vast datasets of human-written text. By predicting the most probable word or sequence of words based on patterns and relationships identified in the training data, AI generators produce personalized content in multiple languages for global audiences. Applications span industries including education, manufacturing, e-commerce, and entertainment and media. In the education industry, AI generators assist in creating personalized learning materials.


    The solution segment was valued at USD 184.50 million in 2018 and showed a gradual increase during the forecast period.

    Regional Analysis

    North America is estimated to contribute 33% to the growth of the global market during the forecast period.
    

    Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.


    North America holds the largest share of the market, driven by the region's technological advancements and increasing adoption of AI in various industries. AI text generators are increasingly used for content creation, customer service, virtual assistants, and chatbots, catering to the growing demand for high-quality, personalized content in sectors such as e-commerce and digital marketing. Moreover, the presence in North America of tech giants like Google, Microsoft, and Amazon, which are investing significantly in AI and machine learning, further fuels market growth. AI generators employ machine learning algorithms, deep neural networks, and natural language processing to generate content in multiple languages for global audiences.

    Market Dynamics

    Our researchers analyzed the data with 2023 as the base year, along with the key drivers, trends, and c

  16. h

    x-fact

    • huggingface.co
    Cite
    NLP at University of Utah, x-fact [Dataset]. https://huggingface.co/datasets/utahnlp/x-fact
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset authored and provided by
    NLP at University of Utah
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for "x-fact"

      Dataset Description

      Dataset Summary


    X-FACT is a multilingual dataset for fact-checking with real-world claims. The dataset contains short statements in 25 languages, along with the top five evidence documents retrieved by running a Google search on each claim statement. The dataset contains two additional evaluation splits (in addition to a traditional test set): ood and zeroshot. ood measures out-of-domain generalization where while the language… See the full description on the dataset page: https://huggingface.co/datasets/utahnlp/x-fact.

  17. h

    tydiqa

    • huggingface.co
    Updated Jun 20, 2024
    Cite
    SEACrowd (2024). tydiqa [Dataset]. https://huggingface.co/datasets/SEACrowd/tydiqa
    Explore at:
    Dataset updated
    Jun 20, 2024
    Dataset authored and provided by
    SEACrowd
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    TyDi QA is a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs. The languages of TyDi QA are diverse with regard to their typology (the set of linguistic features that each language expresses), such that we expect models performing well on this set to generalize across a large number of the languages in the world. It contains language phenomena that would not be found in English-only corpora. To provide a realistic information-seeking task and avoid priming effects, questions are written by people who want to know the answer but don't know it yet (unlike SQuAD and its descendants), and the data is collected directly in each language without the use of translation (unlike MLQA and XQuAD).

  18. h

    danish-citizen-global-exams

    • huggingface.co
    Updated Apr 1, 2025
    Cite
    Mike Zhang (2025). danish-citizen-global-exams [Dataset]. https://huggingface.co/datasets/jjzha/danish-citizen-global-exams
    Explore at:
    Dataset updated
    Apr 1, 2025
    Authors
    Mike Zhang
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Dataset Card for "danish-citizen-tests"

    Original point of contact: Dan Saattrup Nielsen from The Alexandra Institute
    Processor for Global Exams: Mike Zhang from Aalborg University

      Dataset Summary
    

    This dataset contains tests for citizenship ("indfødsretsprøven") and permanent residence ("medborgerskabsprøven") in Denmark, from the years 2016-2023.

      Languages
    

    The dataset is available in Danish (da).

      Dataset Structure
    

    An example from… See the full description on the dataset page: https://huggingface.co/datasets/jjzha/danish-citizen-global-exams.

  19. h

    Indic-subtitler-audio_evals

    • huggingface.co
    Updated Feb 19, 2024
    Cite
    Kurian Benoy (2024). Indic-subtitler-audio_evals [Dataset]. https://huggingface.co/datasets/kurianbenoy/Indic-subtitler-audio_evals
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Feb 19, 2024
    Authors
    Kurian Benoy
    License

    https://choosealicense.com/licenses/gpl-2.0/

    Description

    Indic_audio_evals

    As part of this project, we are evaluating the performance of various ASR models on a benchmarking dataset that we have created in various languages. This benchmarking dataset is more aligned with real-world use cases than academic datasets are.

      About Dataset
    

    Dataset Link in HuggingFace: kurianbenoy/Indic-subtitler-audio_evals

    This dataset contains audio files in .wav format and video files in .mp4 format. The respective groundtruth will be… See the full description on the dataset page: https://huggingface.co/datasets/kurianbenoy/Indic-subtitler-audio_evals.


