32 datasets found
  1. Parallel NFT AMM Volume Competition

    • dune.com
    Updated Aug 11, 2023
    Cite
    souqfinance (2023). Parallel NFT AMM Volume Competition [Dataset]. https://dune.com/discover/content/trending?q=author%3Asouqfinance&resource-type=queries
    Explore at:
    Dataset updated
    Aug 11, 2023
    Dataset authored and provided by
    souqfinance
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Blockchain data query: Parallel NFT AMM Volume Competition

  2. Specification and optimization of analytical data flows

    • resodate.org
    Updated May 27, 2016
    Cite
    Fabian Hüske (2016). Specification and optimization of analytical data flows [Dataset]. http://doi.org/10.14279/depositonce-5150
    Explore at:
    Dataset updated
    May 27, 2016
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Fabian Hüske
    Description

    In the past, most data analysis use cases were addressed by aggregating relational data. For several years, a trend called “Big Data” has been evolving, with several implications for the field of data analysis. Compared to previous applications, much larger data sets are analyzed using more elaborate and diverse analysis methods such as information extraction techniques, data mining algorithms, and machine learning methods. At the same time, analysis applications include data sets with less or even no structure at all. This evolution has implications for the requirements on data processing systems. Due to the growing size of data sets and the increasing computational complexity of advanced analysis methods, data must be processed in a massively parallel fashion. The large number and diversity of data analysis techniques as well as the lack of data structure necessitate the use of user-defined functions and data types. Many traditional database systems are not flexible enough to satisfy these requirements. Hence, there is a need for programming abstractions to define and efficiently execute complex parallel data analysis programs that support custom user-defined operations.

    The success of the SQL query language has shown the advantages of declarative query specification, such as potential for optimization and ease of use. Today, most relational database management systems feature a query optimizer that compiles declarative queries into physical execution plans. Cost-based optimizers choose from billions of plan candidates the plan with the least estimated cost. However, traditional optimization techniques cannot be readily integrated into systems that aim to support novel data analysis use cases. For example, the use of user-defined functions (UDFs) can significantly limit the optimization potential of data analysis programs. Furthermore, lack of detailed data statistics is common when large amounts of unstructured data are analyzed. This leads to imprecise optimizer cost estimates, which can cause sub-optimal plan choices.

    In this thesis we address three challenges that arise in the context of specifying and optimizing data analysis programs. First, we propose a parallel programming model with declarative properties to specify data analysis tasks as data flow programs. In this model, data processing operators are composed of a system-provided second-order function and a user-defined first-order function. A cost-based optimizer compiles data flow programs specified in this abstraction into parallel data flows. The optimizer borrows techniques from relational optimizers and ports them to the domain of general-purpose parallel programming models.

    Second, we propose an approach to enhance the optimization of data flow programs that include UDF operators with unknown semantics. We identify operator properties and conditions to reorder neighboring UDF operators without changing the semantics of the program. We show how to automatically extract these properties from UDF operators by leveraging static code analysis techniques. Our approach is able to emulate relational optimizations such as filter and join reordering and holistic aggregation push-down while not being limited to relational operators.

    Finally, we analyze the impact of changing execution conditions such as varying predicate selectivities and memory budgets on the performance of relational query plans. We identify plan patterns that cause significantly varying execution performance for changing execution conditions. Plans that include such risky patterns are prone to cause problems in the presence of imprecise optimizer estimates. Based on our findings, we introduce an approach to avoid risky plan choices. Moreover, we present a method to assess the risk of a query execution plan using a machine-learned prediction model. Experiments show that the prediction model outperforms risk predictions which are computed from optimizer estimates.
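
    The operator model in this abstract (a system-provided second-order function wrapping a user-defined first-order function) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `map_op`, `reduce_op`, and `can_reorder`, the dictionary record layout, and in particular the read/write-set test, which is a simplified stand-in for the reordering conditions the thesis identifies and is not taken from the abstract.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set

Record = Dict[str, object]


def map_op(udf: Callable[[Record], Iterable[Record]],
           data: Iterable[Record]) -> Iterator[Record]:
    """Second-order function: calls the first-order UDF once per record."""
    for record in data:
        yield from udf(record)


def reduce_op(key: str,
              udf: Callable[[List[Record]], Iterable[Record]],
              data: Iterable[Record]) -> Iterator[Record]:
    """Second-order function: calls the UDF once per group of equal keys."""
    groups: Dict[object, List[Record]] = {}
    for record in data:
        groups.setdefault(record[key], []).append(record)
    for group in groups.values():
        yield from udf(group)


def can_reorder(reads_a: Set[str], writes_a: Set[str],
                reads_b: Set[str], writes_b: Set[str]) -> bool:
    """Assumed, simplified condition: two record-at-a-time operators may be
    swapped if neither writes a field that the other reads or writes."""
    a_disturbs_b = writes_a & (reads_b | writes_b)
    b_disturbs_a = writes_b & (reads_a | writes_a)
    return not a_disturbs_b and not b_disturbs_a


# Hypothetical UDFs: a filter that reads "size", a transform that writes "tag".
def keep_large(record: Record) -> List[Record]:
    return [record] if record["size"] > 10 else []


def tag_name(record: Record) -> List[Record]:
    return [{**record, "tag": str(record["name"]).upper()}]


def count_group(group: List[Record]) -> List[Record]:
    return [{"name": group[0]["name"], "count": len(group)}]


data = [{"name": "a", "size": 5}, {"name": "b", "size": 20}]
filter_first = list(map_op(tag_name, map_op(keep_large, data)))
tag_first = list(map_op(keep_large, map_op(tag_name, data)))
counts = list(reduce_op("name", count_group, data))
```

    In this toy case the two plans produce the same records, and running the filter first means the transform UDF is called on fewer records, which is the kind of benefit a filter-reordering optimization aims for.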

  3. Parallel-prime-token-holder

    • dune.com
    Updated Jun 24, 2024
    Cite
    jack_quest1 (2024). Parallel-prime-token-holder [Dataset]. https://dune.com/discover/content/relevant?q=author:jack_quest1&resource-type=queries
    Explore at:
    Dataset updated
    Jun 24, 2024
    Authors
    jack_quest1
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Blockchain data query: Parallel-prime-token-holder

  4. English-Dutch Parallel Corpus for the Management Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). English-Dutch Parallel Corpus for the Management Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/dutch-english-translated-parallel-corpus-for-management-domain
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the English-Dutch Bilingual Parallel Corpora dataset for the Management domain. This comprehensive dataset contains a large collection of bilingual sentence pairs, carefully translated between English and Dutch, designed to support the development of management-specific language models, natural language processing systems, and machine translation engines.

    Dataset Content

    Volume and Diversity
    Extensive Coverage: Contains over 50,000 high-quality sentence pairs suitable for various language technology applications.
    Translator Diversity: Created by more than 200 native Dutch linguists, ensuring a wide range of linguistic styles, regional expressions, and translation approaches.
    Sentence Diversity
    Word Count: Sentences range between 7 and 25 words, suitable for NLP model training and evaluation.
    Syntactic Structures: Includes simple, compound, and complex sentences.
    Grammatical Forms: Interrogative and imperative constructions to reflect practical and directive language; affirmative and negative statements to cover different polarities; active and passive voice to offer multiple linguistic perspectives.
    Idiomatic and Figurative Language: Incorporates business-related metaphors, idiomatic phrases, and figurative expressions common in the management domain.
    Discourse Markers: Includes conjunctions, transitional phrases, and logical connectors to ensure coherent and natural sentence flow.
    Cross Translation: The dataset includes both English-to-Dutch and Dutch-to-English translations to support bi-directional translation system development.

    Domain-Specific Content

    Terminology: Covers a broad lexicon of management-related terms from areas such as business strategy, leadership, marketing, operations, finance, and human resources.
    Authentic Language Use: Captures expressions, idioms, and terminology found in real-world management contexts, including reports, case studies, presentations, and corporate dialogues.
    Contextual Variety: Includes content from business reports, management literature, corporate communications, organizational behavior studies, and financial documents.
    Cross-Domain Applicability: Also incorporates content from related fields such as economics, psychology, sociology, and technology, enriching the dataset's real-world relevance.

    Format and Structure

    File Formats: Delivered in Excel format, with the option to convert into JSON, TMX, XML, XLIFF, XLS, and other widely used industry formats.
    Structure Fields: Serial Number, Unique ID, Source Sentence and Source Word Count, Target Sentence and Target Word Count
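
    As a rough illustration of the structure fields listed above, the sketch below models one row as a Python dataclass and checks it against the stated 7 to 25 word range. The class and function names, the whitespace-based word count, the choice to apply the range to both sides, and the example sentences are all assumptions; the listing does not say how the vendor counts words.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentencePair:
    """One row, mirroring the listed structure fields (names are assumed)."""
    serial_number: int
    unique_id: str
    source_sentence: str
    source_word_count: int
    target_sentence: str
    target_word_count: int


def word_count(sentence: str) -> int:
    # Assumption: words are whitespace-separated tokens.
    return len(sentence.split())


def check_pair(pair: SentencePair, min_words: int = 7,
               max_words: int = 25) -> List[str]:
    """Return a list of human-readable problems; empty means the row passes."""
    problems = []
    if word_count(pair.source_sentence) != pair.source_word_count:
        problems.append("source word count mismatch")
    if word_count(pair.target_sentence) != pair.target_word_count:
        problems.append("target word count mismatch")
    for side, count in (("source", pair.source_word_count),
                        ("target", pair.target_word_count)):
        if not min_words <= count <= max_words:
            problems.append(f"{side} length out of range")
    return problems


# Hypothetical example row, not taken from the dataset.
good = SentencePair(
    1, "MGMT-0001",
    "The board approved the new marketing budget today.", 8,
    "Het bestuur heeft vandaag het nieuwe marketingbudget goedgekeurd.", 8,
)
bad = SentencePair(2, "MGMT-0002", "Approve the budget.", 5,
                   "Keur het budget goed.", 4)
```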

    Usage and Application

    Machine Translation: Useful for building and fine-tuning translation models for management-specific content.
    NLP Applications: Enhances tools such as predictive keyboards, grammar and spell checkers, and speech/text understanding systems focused on business and management contexts.
    Large Language Model (LLM) Training: Supports fine-tuning of LLMs for use cases such as generating business articles, summarizing market insights, interpreting corporate data, and answering management-related queries.

    Secure and Ethical Collection

    Data Collection Platform: Built entirely through FutureBeeAI’s proprietary Yugo platform, ensuring control, quality, and traceability.
    Confidentiality and Compliance:

  5. English-Bengali Parallel Corpus for the Management Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). English-Bengali Parallel Corpus for the Management Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/bengali-english-translated-parallel-corpus-for-management-domain
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the English-Bengali Bilingual Parallel Corpora dataset for the Management domain. This comprehensive dataset contains a large collection of bilingual sentence pairs, carefully translated between English and Bengali, designed to support the development of management-specific language models, natural language processing systems, and machine translation engines.

    Dataset Content

    Volume and Diversity
    Extensive Coverage: Contains over 50,000 high-quality sentence pairs suitable for various language technology applications.
    Translator Diversity: Created by more than 200 native Bengali linguists, ensuring a wide range of linguistic styles, regional expressions, and translation approaches.
    Sentence Diversity
    Word Count: Sentences range between 7 and 25 words, suitable for NLP model training and evaluation.
    Syntactic Structures: Includes simple, compound, and complex sentences.
    Grammatical Forms: Interrogative and imperative constructions to reflect practical and directive language; affirmative and negative statements to cover different polarities; active and passive voice to offer multiple linguistic perspectives.
    Idiomatic and Figurative Language: Incorporates business-related metaphors, idiomatic phrases, and figurative expressions common in the management domain.
    Discourse Markers: Includes conjunctions, transitional phrases, and logical connectors to ensure coherent and natural sentence flow.
    Cross Translation: The dataset includes both English-to-Bengali and Bengali-to-English translations to support bi-directional translation system development.

    Domain-Specific Content

    Terminology: Covers a broad lexicon of management-related terms from areas such as business strategy, leadership, marketing, operations, finance, and human resources.
    Authentic Language Use: Captures expressions, idioms, and terminology found in real-world management contexts, including reports, case studies, presentations, and corporate dialogues.
    Contextual Variety: Includes content from business reports, management literature, corporate communications, organizational behavior studies, and financial documents.
    Cross-Domain Applicability: Also incorporates content from related fields such as economics, psychology, sociology, and technology, enriching the dataset's real-world relevance.

    Format and Structure

    File Formats: Delivered in Excel format, with the option to convert into JSON, TMX, XML, XLIFF, XLS, and other widely used industry formats.
    Structure Fields: Serial Number, Unique ID, Source Sentence and Source Word Count, Target Sentence and Target Word Count

    Usage and Application

    Machine Translation: Useful for building and fine-tuning translation models for management-specific content.
    NLP Applications: Enhances tools such as predictive keyboards, grammar and spell checkers, and speech/text understanding systems focused on business and management contexts.
    Large Language Model (LLM) Training: Supports fine-tuning of LLMs for use cases such as generating business articles, summarizing market insights, interpreting corporate data, and answering management-related queries.

    Secure and Ethical Collection

    Data Collection Platform: Built entirely through FutureBeeAI’s proprietary Yugo platform, ensuring control, quality, and traceability.
    Confidentiality and Compliance:

  6. Data from: Dataset associated with "Polyhedral optimizations of RNA-RNA...

    • commons.datacite.org
    Updated 2018
    Cite
    Swetha Varadarajan (2018). Dataset associated with "Polyhedral optimizations of RNA-RNA interaction computations" [Dataset]. http://doi.org/10.25675/10217/191189
    Explore at:
    Dataset updated
    2018
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Mountain Scholar
    Authors
    Swetha Varadarajan
    Description

    These files can be used to re-create the results in the thesis manuscript "Polyhedral Optimizations of RNA-RNA Interaction Computations". The AlphaZ tool (http://www.cs.colostate.edu/AlphaZ/wiki/doku.php) is needed to produce results from the .ab and .cs files. Users who are not familiar with AlphaZ can use the C code instead.

  7. English-Bulgarian Parallel Corpus for the Management Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). English-Bulgarian Parallel Corpus for the Management Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/bulgarian-english-translated-parallel-corpus-for-management-domain
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the English-Bulgarian Bilingual Parallel Corpora dataset for the Management domain. This comprehensive dataset contains a large collection of bilingual sentence pairs, carefully translated between English and Bulgarian, designed to support the development of management-specific language models, natural language processing systems, and machine translation engines.

    Dataset Content

    Volume and Diversity
    Extensive Coverage: Contains over 50,000 high-quality sentence pairs suitable for various language technology applications.
    Translator Diversity: Created by more than 200 native Bulgarian linguists, ensuring a wide range of linguistic styles, regional expressions, and translation approaches.
    Sentence Diversity
    Word Count: Sentences range between 7 and 25 words, suitable for NLP model training and evaluation.
    Syntactic Structures: Includes simple, compound, and complex sentences.
    Grammatical Forms: Interrogative and imperative constructions to reflect practical and directive language; affirmative and negative statements to cover different polarities; active and passive voice to offer multiple linguistic perspectives.
    Idiomatic and Figurative Language: Incorporates business-related metaphors, idiomatic phrases, and figurative expressions common in the management domain.
    Discourse Markers: Includes conjunctions, transitional phrases, and logical connectors to ensure coherent and natural sentence flow.
    Cross Translation: The dataset includes both English-to-Bulgarian and Bulgarian-to-English translations to support bi-directional translation system development.

    Domain-Specific Content

    Terminology: Covers a broad lexicon of management-related terms from areas such as business strategy, leadership, marketing, operations, finance, and human resources.
    Authentic Language Use: Captures expressions, idioms, and terminology found in real-world management contexts, including reports, case studies, presentations, and corporate dialogues.
    Contextual Variety: Includes content from business reports, management literature, corporate communications, organizational behavior studies, and financial documents.
    Cross-Domain Applicability: Also incorporates content from related fields such as economics, psychology, sociology, and technology, enriching the dataset's real-world relevance.

    Format and Structure

    File Formats: Delivered in Excel format, with the option to convert into JSON, TMX, XML, XLIFF, XLS, and other widely used industry formats.
    Structure Fields: Serial Number, Unique ID, Source Sentence and Source Word Count, Target Sentence and Target Word Count

    Usage and Application

    Machine Translation: Useful for building and fine-tuning translation models for management-specific content.
    NLP Applications: Enhances tools such as predictive keyboards, grammar and spell checkers, and speech/text understanding systems focused on business and management contexts.
    Large Language Model (LLM) Training: Supports fine-tuning of LLMs for use cases such as generating business articles, summarizing market insights, interpreting corporate data, and answering management-related queries.

    Secure and Ethical Collection

    Data Collection Platform: Built entirely through FutureBeeAI’s proprietary Yugo platform, ensuring control, quality, and traceability.
    Confidentiality and Compliance:

  8. SauLTCv1

    • kaggle.com
    zip
    Updated Sep 24, 2024
    Cite
    SauLTC Corpus (2024). SauLTCv1 [Dataset]. https://www.kaggle.com/datasets/saultccorpus/saultcv1
    Explore at:
    Available download formats: zip (22035649 bytes)
    Dataset updated
    Sep 24, 2024
    Authors
    SauLTC Corpus
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SauLTC (Saudi Learner Translator Corpus) is a uni-directional POS-tagged English-Arabic parallel and sentence-aligned learner corpus. This multi-version corpus features linguistic annotation, complemented with an interface for monolingual or bilingual querying of the data.

    SauLTC was initiated as a source of data for translation studies and learner corpora research. The corpus can be utilized in the examination of linguistic properties of translations, translation quality assessment, variation in translation, translational competence, and cross-linguistic transference. For example, one of SauLTC’s functionalities allows the examination of what a trainee translator produces on her own (draft translation) and the effect of an expert translator’s feedback (final translation submission).

    The translation program at PNU includes a four-credit hour graduation project course, where students, at the end of the program, are required to translate a booklet or book chapter from English into Arabic. The course is designed to help students demonstrate their translation competence and to apply all the skills they have acquired into translating longer texts (6000+). The typical arrangement of the course is as follows: The student selects a text she prefers and obtains her instructor’s approval. The source texts of SauLTC include chapters or booklet extractions from fiction, self-help, biography, history, health, psychology, religion, culture, management, or science. Then she submits a draft translation to her instructor who reads the translation and meets with the student to discuss her output. The discussion highlights both the strengths and weaknesses of the translation and gives the student the chance to justify her linguistic choices when translating the text. Based on the instructor’s feedback, the student makes the necessary changes in the translation and once again submits it to the instructor (final translation). The translation students and instructors give their consent to include their translation projects (the source texts, the first drafts, and the final drafts of their translations) in SauLTC. All three documents are collected in one folder under the student’s name in addition to the student’s profile information.

    The corpus is currently in its first version of two million words with a proposed plan to include more translated texts from students at other universities in Saudi Arabia. Each student’s contribution includes a learner profile, the source text, the draft translation, and the post-instructor feedback final submission, all of which are enriched with searchable profile metadata.

    The Auto-aligner at WordFast Anywhere was utilized for the automatic parallelization of the source text, draft text, and final submission text. This automated process was followed by a manual verification conducted by professional translators.

    The query interface supports lexical and PoS search for both sources and targets and returns sentences with the query item along with their targets/sources. The query results can be filtered by several metadata fields, including the translator’s age, translation assessment grade, year, and source text genre.
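
    The kind of lexical and PoS query described above can be sketched in Python as follows. This is not the SauLTC interface itself: the `AlignedPair` layout, the `search` function, the tag labels, and the two example sentence pairs are hypothetical and only show how a match on one side returns the aligned sentence on the other, with an optional metadata filter.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, PoS tag); the tag set here is hypothetical


@dataclass
class AlignedPair:
    source: List[Token]   # English source sentence
    target: List[Token]   # Arabic translation aligned to it
    genre: str            # one metadata field; others could be added
    year: int


def search(corpus: List[AlignedPair], word: Optional[str] = None,
           pos: Optional[str] = None, side: str = "source",
           genre: Optional[str] = None) -> List[AlignedPair]:
    """Return aligned pairs whose chosen side has a token matching the
    word and/or PoS tag, optionally filtered by source text genre."""
    hits = []
    for pair in corpus:
        if genre is not None and pair.genre != genre:
            continue
        tokens = pair.source if side == "source" else pair.target
        if any((word is None or w.lower() == word.lower())
               and (pos is None or tag == pos)
               for w, tag in tokens):
            hits.append(pair)
    return hits


# Hypothetical two-sentence corpus.
corpus = [
    AlignedPair(
        source=[("She", "PRON"), ("read", "VERB"),
                ("the", "DET"), ("book", "NOUN")],
        target=[("قرأت", "VERB"), ("الكتاب", "NOUN")],
        genre="fiction", year=2019,
    ),
    AlignedPair(
        source=[("Sleep", "VERB"), ("well", "ADV")],
        target=[("نم", "VERB"), ("جيدا", "ADV")],
        genre="health", year=2020,
    ),
]

book_hits = search(corpus, word="book")
fiction_nouns = search(corpus, pos="NOUN", side="target", genre="fiction")
```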

  9. Data from: Parallel evolution of bower-building behavior in two groups of...

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    zip
    Updated Nov 15, 2023
    Cite
    Per G. P. Ericson; Martin Irestedt; Johan A. A. Nylander; Les Christidis; Leo Joseph; Yanhua Qu (2023). Parallel evolution of bower-building behavior in two groups of bowerbirds suggested by phylogenomics [Dataset]. http://doi.org/10.5061/dryad.6hdr7sqwp
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 15, 2023
    Dataset provided by
    Commonwealth Scientific and Industrial Research Organisation
    Swedish Museum of Natural History
    Chinese Academy of Sciences
    Southern Cross University
    Authors
    Per G. P. Ericson; Martin Irestedt; Johan A. A. Nylander; Les Christidis; Leo Joseph; Yanhua Qu
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The bowerbirds in New Guinea and Australia include species that build the largest and perhaps most elaborately decorated constructions outside of humans. The males use these courtship bowers, along with their displays, to attract females. In these species, the mating system is polygynous and the females alone incubate and feed the nestlings. The bowerbirds also include 10 species of the socially monogamous catbirds in which the male participates in most aspects of raising the young. How the bower-building behavior evolved has remained poorly understood, as no comprehensive phylogeny exists for the family. It has been assumed that the monogamous catbird clade is sister to all polygynous species. We here test this hypothesis using a newly developed pipeline for obtaining homologous alignments of thousands of exonic and intronic regions from genomic data to build a phylogeny. Our well-supported species tree shows that the polygynous, bower-building species are not monophyletic. The result suggests either that bower-building behavior is an ancestral condition in the family that was secondarily lost in the catbirds, or that it has arisen in parallel in two lineages of bowerbirds. We favor the latter hypothesis based on an ancestral character reconstruction showing that polygyny but not bower-building is ancestral in bowerbirds, and on the observation that Scenopoeetes dentirostris, the sister species to one of the bower-building clades, does not build a proper bower but constructs a court for male display. This species is also sexually monomorphic in plumage despite having a polygynous mating system. We argue that the relatively stable tropical and subtropical forest environment in combination with low predator pressure and rich food access (mostly fruit) facilitated the evolution of these unique life-history traits.

    Methods: This is supplementary material to the manuscript "Parallel evolution of bower-building behavior and polygyny in two groups of bowerbirds suggested by phylogenomics". We used the Birdscanner pipeline (available at github.com/Naturhistoriska/birdscanner.git) to obtain homologous alignments of 5653 exonic and 7020 intronic regions from whole-genome sequence data. The pipeline uses probabilistic queries based on hidden Markov models to probe the mapped bowerbird genomes and find where each query has its best fit. For each query and taxon we obtained genomic coordinates for the best hits, which were then ranked according to their “sequence E-values”, i.e. the expected number of false positives (non-homologous sequences) that scored this well or better. For each query and taxon the sequences for the hits with the lowest values were parsed out using the genomic coordinates. These were then aligned in separate files for exonic and intronic loci. Poorly aligned sequences were identified, based on a calculated distance matrix using OD-Seq (github.com/PeterJehl/OD-Seq), and excluded from further analyses. We also checked the alignments manually and removed those that included non-homologous sequences for some taxa (indicated by an extreme proportion of variable positions in the alignment) and those that contained no phylogenetic information.

    Individual trees were constructed using IQ-TREE, which automatically selects the best substitution model for each locus alignment. We used ASTRAL-III to construct species trees from the gene trees, both for the exonic and intronic loci separately and for all loci combined. ASTRAL estimates a species tree given a set of unrooted gene trees, and branch support is calculated using local posterior probabilities. We assembled mitochondrial genomes from the resequenced data for each individual using MITObim, and used 12 of the 13 protein-coding genes to infer the phylogenetic tree. The aligned mitochondrial data set used in the analyses consists of 10,560 bp (3,520 codons). The phylogenetic analysis of the mitogenomic data set was performed with MEGA X. We estimated the maximum-likelihood tree for the mitochondrial data using 100 bootstrap replicates to assess the reliability of the branches. The data set was analyzed both with all codon positions present and with the third codon positions excluded.
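
    The hit-selection step described in the methods (keeping, for each query and taxon, the hit with the lowest sequence E-value) can be sketched as below. This is not the Birdscanner code: the `Hit` record, the `best_hits` function, and the example values are hypothetical and only illustrate the ranking rule.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Hit(NamedTuple):
    """One HMM search hit (hypothetical layout)."""
    query: str      # name of the exonic or intronic query
    taxon: str      # genome that was probed
    start: int      # genomic coordinates of the hit
    end: int
    e_value: float  # expected number of false positives scoring this well


def best_hits(hits: Iterable[Hit]) -> Dict[Tuple[str, str], Hit]:
    """Keep the lowest-E-value hit for every (query, taxon) combination."""
    best: Dict[Tuple[str, str], Hit] = {}
    for hit in hits:
        key = (hit.query, hit.taxon)
        if key not in best or hit.e_value < best[key].e_value:
            best[key] = hit
    return best


# Hypothetical hits for two queries in two genomes.
hits = [
    Hit("exon_0001", "taxon_A", 1200, 1500, 1e-30),
    Hit("exon_0001", "taxon_A", 9800, 10050, 1e-5),
    Hit("exon_0001", "taxon_B", 400, 700, 1e-28),
    Hit("intron_0042", "taxon_A", 5000, 5900, 1e-12),
]
selected = best_hits(hits)
```

    The coordinates of each selected hit are what would then be used to parse out the sequence for alignment.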

  10. English-Portuguese Parallel Corpus for the Gaming Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). English-Portuguese Parallel Corpus for the Gaming Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/portuguese-english-translated-parallel-corpus-for-gaming-domain
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The English-Portuguese Gaming Parallel Corpora is a curated bilingual dataset designed to support game localization, machine translation, and language model training for the Gaming industry. It consists of over 50,000 sentence pairs, professionally translated between English and Portuguese, capturing the linguistic and cultural depth of gaming content.

    Dataset Content

    Volume and Translator Diversity
    Total Sentence Pairs: 50,000+
    Contributors: Over 200 native and professional translators
    Source: All content is original and tailored specifically for the Gaming domain
    Sentence Variety
    Sentence Length: 7 to 25 words
    Sentence Types: Includes simple, compound, and complex sentences
    Forms Covered: Interrogative, imperative, affirmative, and negative sentences
    Voice Diversity: Sentences written in both active and passive voice
    Stylistic Coverage: Includes idioms, metaphors, gaming slang, and figurative expressions
    Discourse Elements: Contains conjunctions, logical connectors, and transitional phrases for natural flow
    Bidirectional Structure: Includes English to Portuguese and Portuguese to English translations for robust model training

    Domain-Specific Focus

    Gaming Language Coverage
    Terminology: Covers in-game elements, UI/UX, controls, multiplayer features, and genre-specific phrases
    Dialogue Content: Includes NPC dialogue, tutorial lines, mission briefings, walkthroughs, and strategy guidance
    Communication Scenarios: Reflects live chat, support queries, and multiplayer messaging
    Cross-Domain Inclusion: Contains relevant terms from adjacent domains like entertainment, esports, virtual worlds, and AR/VR
    Format and Structure
    File Formats: Delivered in Excel, with optional conversion to JSON, TMX, XML, XLIFF, XLS, or other standard formats
    Structure Fields: Serial Number, Unique ID, Source Sentence, Source Word Count, Target Sentence, Target Word Count
    Sentence Alignment: Sentence-level parallel pairs with consistent formatting for MT pipelines
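
    The optional conversion to TMX mentioned above could look roughly like the following sketch, which uses only Python's standard `xml.etree.ElementTree`. The function name `pairs_to_tmx`, the header attribute values, and the example sentence pair are assumptions for illustration; the vendor's actual export may differ.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, src_lang="en", tgt_lang="pt"):
    """Build a minimal TMX 1.4 document from (unique_id, source, target)
    tuples and return it as a string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "xlsx",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for unique_id, source, target in pairs:
        # One translation unit per sentence pair, one variant per language.
        tu = ET.SubElement(body, "tu", tuid=unique_id)
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Hypothetical row, not taken from the dataset.
rows = [(
    "GAME-0001",
    "Press the jump button to start the mission.",
    "Pressione o botão de pulo para iniciar a missão.",
)]
tmx_text = pairs_to_tmx(rows)
```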

    Usage and Applications

    Machine Translation: Train and fine-tune domain-specific MT engines for gaming content
    Game Localization: Adapt games across English-Portuguese markets while preserving nuance and playability
    NLP Tools: Power predictive keyboards, grammar checkers, spelling correction, and sentence completion models
    LLM Fine-Tuning: Strengthen bilingual comprehension and translation capabilities in large language models
    Dialogue Systems: Enable context-aware, conversational AI for in-game or support environments
    Bilingual Retrieval: Use for cross-language search, sentence matching, and similarity scoring

    Alignment Confidence and Quality Assurance

    All translations are manually verified by native bilingual experts for accuracy, naturalness, and domain relevance
    Each sentence pair is reviewed to ensure semantic alignment and stylistic

  11. h

    EuroPIRQ-retrieval

    • huggingface.co
    Updated Apr 24, 2007
    Cite
    Elias H (2007). EuroPIRQ-retrieval [Dataset]. https://huggingface.co/datasets/eherra/EuroPIRQ-retrieval
    Explore at:
    Dataset updated
    Apr 24, 2007
    Authors
    Elias H
    Description

    EuroPIRQ: European Parallel Information Retrieval Queries

    Dataset Details

    The EuroPIRQ retrieval dataset is a multilingual collection designed for evaluating retrieval and cross-lingual retrieval tasks. The dataset contains 10,000 parallel passages and 100 synthetic parallel queries in three languages: English 🇬🇧, Portuguese 🇵🇹, and Finnish 🇫🇮. It is constructed from the European Union's DGT-Acquis corpus.

    Languages: English (en), Portuguese (pt), Finnish (fi)
    Format: JSONL…
    See the full description on the dataset page: https://huggingface.co/datasets/eherra/EuroPIRQ-retrieval.
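Since the listing gives the format as JSONL, a loader might look like the sketch below. The field names `id`, `lang`, and `text` are hypothetical (the actual schema is on the dataset page), and the sample records are placeholders, not dataset content.

```python
import json


def parse_jsonl(text: str) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def group_parallel(records: list[dict]) -> dict[str, dict[str, str]]:
    """Group records sharing an id into {id: {lang: text}} (assumed field names)."""
    groups: dict[str, dict[str, str]] = {}
    for record in records:
        groups.setdefault(record["id"], {})[record["lang"]] = record["text"]
    return groups


# Placeholder records using the three language codes named in the listing.
sample = "\n".join([
    '{"id": "p1", "lang": "en", "text": "passage text (en)"}',
    '{"id": "p1", "lang": "pt", "text": "passage text (pt)"}',
    '{"id": "p1", "lang": "fi", "text": "passage text (fi)"}',
])
parallel = group_parallel(parse_jsonl(sample))
```

Grouping by a shared identifier is one common way to recover parallel passages for cross-lingual retrieval evaluation; whether this dataset stores them that way is not stated in the listing.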

  12. h

    dpo_irish_eng_translations

    • huggingface.co
    Updated Aug 18, 2024
    Cite
    Cian Prendergast (2024). dpo_irish_eng_translations [Dataset]. https://huggingface.co/datasets/c123ian/dpo_irish_eng_translations
    Explore at:
    Dataset updated
    Aug 18, 2024
    Authors
    Cian Prendergast
    Description

    This is a test of my DPO dataset for Irish-English translations. Raw data origin: https://www.gaois.ie/en/corpora/parallel?Query=Apple&Language=en&SearchMode=exact&PerPage=50. I used the reference-free COMET XL model Unbabel/wmt23-cometkiwi-da-xl (which has been trained to assess Irish) to score translations as accepted or rejected. GPT-4 was used to generate translations to compare with human translations of Irish legislation (which by law must have both an Irish and an English copy).
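The accepted/rejected labelling described above could be sketched as follows. This is an illustration under stated assumptions, not the author's pipeline: the quality scores are passed in as plain floats (in practice they would come from the scoring model), the `prompt`/`chosen`/`rejected` keys follow a common preference-data convention rather than anything confirmed by the listing, and ties are resolved in favour of the human translation by assumption.

```python
def build_preference_pair(
    prompt: str,
    human: str,
    machine: str,
    human_score: float,
    machine_score: float,
) -> dict[str, str]:
    """Label the higher-scored translation as chosen and the other as rejected."""
    # Ties go to the human translation (an assumption for this sketch).
    if human_score >= machine_score:
        chosen, rejected = human, machine
    else:
        chosen, rejected = machine, human
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


# Placeholder strings and scores, not real data.
example = build_preference_pair(
    "source sentence",
    "human translation",
    "machine translation",
    human_score=0.82,
    machine_score=0.61,
)
```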

  13. Parameters used in performance evaluation for synthetic data.

    • plos.figshare.com
    xls
    Updated Jun 15, 2023
    Cite
    Yong-Ki Kim; Hyeong-Jin Kim; Hyunjo Lee; Jae-Woo Chang (2023). Parameters used in performance evaluation for synthetic data. [Dataset]. http://doi.org/10.1371/journal.pone.0267908.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yong-Ki Kim; Hyeong-Jin Kim; Hyunjo Lee; Jae-Woo Chang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Parameters used in performance evaluation for synthetic data.

  14. F

    English-German Parallel Corpus for the Management Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). English-German Parallel Corpus for the Management Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/german-english-translated-parallel-corpus-for-management-domain
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the English-German Bilingual Parallel Corpora dataset for the Management domain. This comprehensive dataset contains a large collection of bilingual sentence pairs, carefully translated between English and German, designed to support the development of management-specific language models, natural language processing systems, and machine translation engines.

    Dataset Content

    Volume and Diversity
    Extensive Coverage: Contains over 50,000 high-quality sentence pairs suitable for various language technology applications.
    Translator Diversity: Created by more than 200 native German linguists, ensuring a wide range of linguistic styles, regional expressions, and translation approaches.
    Sentence Diversity
    Word Count: Sentences range between 7 and 25 words, suitable for NLP model training and evaluation.
    Syntactic Structures: Includes simple, compound, and complex sentences.
    Grammatical Forms: Interrogative and imperative constructions to reflect practical and directive language; affirmative and negative statements to cover different polarities; active and passive voice to offer multiple linguistic perspectives.
    Idiomatic and Figurative Language: Incorporates business-related metaphors, idiomatic phrases, and figurative expressions common in the management domain.
    Discourse Markers: Includes conjunctions, transitional phrases, and logical connectors to ensure coherent and natural sentence flow.
    Cross Translation: The dataset includes both English-to-German and German-to-English translations to support bi-directional translation system development.

    Domain-Specific Content

    Terminology: Covers a broad lexicon of management-related terms from areas such as business strategy, leadership, marketing, operations, finance, and human resources.
    Authentic Language Use: Captures expressions, idioms, and terminology found in real-world management contexts, including reports, case studies, presentations, and corporate dialogues.
    Contextual Variety: Includes content from business reports, management literature, corporate communications, organizational behavior studies, and financial documents.
    Cross-Domain Applicability: Also incorporates content from related fields such as economics, psychology, sociology, and technology, enriching the dataset's real-world relevance.

    Format and Structure

    File Formats: Delivered in Excel format, with the option to convert into JSON, TMX, XML, XLIFF, XLS, and other widely used industry formats.
    Structure Fields: Serial Number, Unique ID, Source Sentence and Source Word Count, Target Sentence and Target Word Count

    Usage and Application

    Machine Translation: Useful for building and fine-tuning translation models for management-specific content.
    NLP Applications: Enhances tools such as predictive keyboards, grammar and spell checkers, and speech/text understanding systems focused on business and management contexts.
    Large Language Model (LLM) Training: Supports fine-tuning of LLMs for use cases such as generating business articles, summarizing market insights, interpreting corporate data, and answering management-related queries.

    Secure and Ethical Collection

    Data Collection Platform: Built entirely through FutureBeeAI’s proprietary Yugo platform, ensuring control, quality, and traceability.
    Confidentiality and Compliance:

  15. Nepali English Parallel Audio Text Dataset

    • kaggle.com
    zip
    Updated Nov 11, 2025
    Cite
    Prashant Raj Bista (2025). Nepali English Parallel Audio Text Dataset [Dataset]. https://www.kaggle.com/datasets/prashantrajbista/nepali-english-parallel-audio-text-dataset
    Explore at:
    Available download formats: zip (952626877 bytes)
    Dataset updated
    Nov 11, 2025
    Authors
    Prashant Raj Bista
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains a Nepali-English parallel speech corpus designed to support research in speech recognition, speech translation, and multilingual language processing.

    It includes audio pairs and their corresponding transcriptions in both Nepali and English, carefully aligned.

    Total Nepali Audio Duration: ~125 minutes
    Total English Audio Duration: ~112 minutes
    Number of Speakers: 4 (covering both male and female voices)
    Content Type: Conversational and general-purpose sentences covering common expressions used when travelling through the country, short queries, and contextually varied speech samples.
    Format: Each audio file is accompanied by a corresponding text transcription and metadata linking the Nepali and English versions.

    Purpose: Created to assist researchers, developers, and linguists working on Nepali-English speech technologies, such as automatic speech recognition (ASR), speech-to-text (STT), and speech translation systems.

    All data has been manually curated and verified to ensure clarity, alignment accuracy, and quality balance across both languages.

  16. F

    English-Malayalam Parallel Corpus for the Management Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). English-Malayalam Parallel Corpus for the Management Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/malayalam-english-translated-parallel-corpus-for-management-domain
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the English-Malayalam Bilingual Parallel Corpora dataset for the Management domain. This comprehensive dataset contains a large collection of bilingual sentence pairs, carefully translated between English and Malayalam, designed to support the development of management-specific language models, natural language processing systems, and machine translation engines.

    Dataset Content

    Volume and Diversity
    Extensive Coverage: Contains over 50,000 high-quality sentence pairs suitable for various language technology applications.
    Translator Diversity: Created by more than 200 native Malayalam linguists, ensuring a wide range of linguistic styles, regional expressions, and translation approaches.
    Sentence Diversity
    Word Count: Sentences range between 7 and 25 words, suitable for NLP model training and evaluation.
    Syntactic Structures: Includes simple, compound, and complex sentences.
    Grammatical Forms: Interrogative and imperative constructions to reflect practical and directive language; affirmative and negative statements to cover different polarities; active and passive voice to offer multiple linguistic perspectives.
    Idiomatic and Figurative Language: Incorporates business-related metaphors, idiomatic phrases, and figurative expressions common in the management domain.
    Discourse Markers: Includes conjunctions, transitional phrases, and logical connectors to ensure coherent and natural sentence flow.
    Cross Translation: The dataset includes both English-to-Malayalam and Malayalam-to-English translations to support bi-directional translation system development.

    Domain-Specific Content

    Terminology: Covers a broad lexicon of management-related terms from areas such as business strategy, leadership, marketing, operations, finance, and human resources.
    Authentic Language Use: Captures expressions, idioms, and terminology found in real-world management contexts, including reports, case studies, presentations, and corporate dialogues.
    Contextual Variety: Includes content from business reports, management literature, corporate communications, organizational behavior studies, and financial documents.
    Cross-Domain Applicability: Also incorporates content from related fields such as economics, psychology, sociology, and technology, enriching the dataset's real-world relevance.

    Format and Structure

    File Formats: Delivered in Excel format, with the option to convert into JSON, TMX, XML, XLIFF, XLS, and other widely used industry formats.
    Structure Fields: Serial Number, Unique ID, Source Sentence and Source Word Count, Target Sentence and Target Word Count

    Usage and Application

    Machine Translation: Useful for building and fine-tuning translation models for management-specific content.
    NLP Applications: Enhances tools such as predictive keyboards, grammar and spell checkers, and speech/text understanding systems focused on business and management contexts.
    Large Language Model (LLM) Training: Supports fine-tuning of LLMs for use cases such as generating business articles, summarizing market insights, interpreting corporate data, and answering management-related queries.

    Secure and Ethical Collection

    Data Collection Platform: Built entirely through FutureBeeAI’s proprietary Yugo platform, ensuring control, quality, and traceability.
    Confidentiality and Compliance:

  17. F

    English-Arabic Parallel Corpus for the Management Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). English-Arabic Parallel Corpus for the Management Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/arabic-english-translated-parallel-corpus-for-management-domain
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the English-Arabic Bilingual Parallel Corpora dataset for the Management domain. This comprehensive dataset contains a large collection of bilingual sentence pairs, carefully translated between English and Arabic, designed to support the development of management-specific language models, natural language processing systems, and machine translation engines.

    Dataset Content

    Volume and Diversity
    Extensive Coverage: Contains over 50,000 high-quality sentence pairs suitable for various language technology applications.
    Translator Diversity: Created by more than 200 native Arabic linguists, ensuring a wide range of linguistic styles, regional expressions, and translation approaches.
    Sentence Diversity
    Word Count: Sentences range between 7 and 25 words, suitable for NLP model training and evaluation.
    Syntactic Structures: Includes simple, compound, and complex sentences.
    Grammatical Forms: Interrogative and imperative constructions to reflect practical and directive language; affirmative and negative statements to cover different polarities; active and passive voice to offer multiple linguistic perspectives.
    Idiomatic and Figurative Language: Incorporates business-related metaphors, idiomatic phrases, and figurative expressions common in the management domain.
    Discourse Markers: Includes conjunctions, transitional phrases, and logical connectors to ensure coherent and natural sentence flow.
    Cross Translation: The dataset includes both English-to-Arabic and Arabic-to-English translations to support bi-directional translation system development.

    Domain-Specific Content

    Terminology: Covers a broad lexicon of management-related terms from areas such as business strategy, leadership, marketing, operations, finance, and human resources.
    Authentic Language Use: Captures expressions, idioms, and terminology found in real-world management contexts, including reports, case studies, presentations, and corporate dialogues.
    Contextual Variety: Includes content from business reports, management literature, corporate communications, organizational behavior studies, and financial documents.
    Cross-Domain Applicability: Also incorporates content from related fields such as economics, psychology, sociology, and technology, enriching the dataset's real-world relevance.

    Format and Structure

    File Formats: Delivered in Excel format, with the option to convert into JSON, TMX, XML, XLIFF, XLS, and other widely used industry formats.
    Structure Fields: Serial Number, Unique ID, Source Sentence and Source Word Count, Target Sentence and Target Word Count

    Usage and Application

    Machine Translation: Useful for building and fine-tuning translation models for management-specific content.
    NLP Applications: Enhances tools such as predictive keyboards, grammar and spell checkers, and speech/text understanding systems focused on business and management contexts.
    Large Language Model (LLM) Training: Supports fine-tuning of LLMs for use cases such as generating business articles, summarizing market insights, interpreting corporate data, and answering management-related queries.

    Secure and Ethical Collection

    Data Collection Platform: Built entirely through FutureBeeAI’s proprietary Yugo platform, ensuring control, quality, and traceability.
    Confidentiality and Compliance:

  18. Definitions of common notations.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Yong-Ki Kim; Hyeong-Jin Kim; Hyunjo Lee; Jae-Woo Chang (2023). Definitions of common notations. [Dataset]. http://doi.org/10.1371/journal.pone.0267908.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yong-Ki Kim; Hyeong-Jin Kim; Hyunjo Lee; Jae-Woo Chang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Definitions of common notations.

  19. Peachy Parallel Assignments (EduHPC 2025). Distributed SoftMax

    • figshare.com
    pdf
    Updated Sep 3, 2025
    Cite
    Maria Pantoja; Clara Almeida; David Guerrero-Pantoja; Cameron Maloney; Silvio Rizzi (2025). Peachy Parallel Assignments (EduHPC 2025). Distributed SoftMax [Dataset]. http://doi.org/10.6084/m9.figshare.30040486.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Sep 3, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Maria Pantoja; Clara Almeida; David Guerrero-Pantoja; Cameron Maloney; Silvio Rizzi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As large-scale deep learning models become integral to scientific discovery and engineering applications, it is increasingly important to teach students how to implement them efficiently and at scale. This section presents a coding assignment that focuses on optimizing the Softmax function, a central component of many deep learning models, including attention mechanisms in transformer models. The assignment is designed for an undergraduate-level Distributed Computing course and tailored to students with little or no prior experience in machine learning. By integrating modern AI workloads into an HPC curriculum, this work equips students with both the conceptual understanding and practical experience needed to build scalable solutions in scientific computing.
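To make the idea of a distributed Softmax concrete, here is a minimal single-process sketch in which each list chunk stands in for the data held by one worker. It is not the assignment's reference solution: the function names are hypothetical, and the approach shown (each chunk reports a local maximum and a local sum of shifted exponentials, which are then combined) is one standard way to keep the computation numerically stable.

```python
import math


def local_stats(chunk: list[float]) -> tuple[float, float]:
    """Per-chunk step: local max and sum of exp(x - local max)."""
    local_max = max(chunk)
    local_sum = sum(math.exp(x - local_max) for x in chunk)
    return local_max, local_sum


def combine(stats: list[tuple[float, float]]) -> tuple[float, float]:
    """Reduction step: rescale each local sum to the global max and add them."""
    global_max = max(m for m, _ in stats)
    total = sum(s * math.exp(m - global_max) for m, s in stats)
    return global_max, total


def chunked_softmax(chunks: list[list[float]]) -> list[list[float]]:
    """Softmax over the concatenation of all chunks, returned chunk by chunk."""
    global_max, total = combine([local_stats(c) for c in chunks])
    return [[math.exp(x - global_max) / total for x in c] for c in chunks]


result = chunked_softmax([[1.0, 2.0], [3.0]])
```

In a real distributed setting the `combine` step would be a collective reduction across processes; subtracting the maximum before exponentiating avoids overflow for large inputs.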

  20. o

    A common-vvealth or nothing: or, Monarchy and oligarchy prov'd parallel in...

    • llds.ling-phil.ox.ac.uk
    Updated Oct 3, 2008
    Cite
    (2008). A common-vvealth or nothing: or, Monarchy and oligarchy prov'd parallel in tyranny. In xii. queries, worthy the consideration of all publique spirits in this juncture. By a well-wisher to the true security of both Christian and civil liberty. [Dataset]. https://llds.ling-phil.ox.ac.uk/llds/xmlui/handle/20.500.14106/A80253
    Explore at:
    Dataset updated
    Oct 3, 2008
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    (:unav)
