32 datasets found

Parallel NFT AMM Volume Competition
dune.com
Updated Aug 11, 2023
Cite
souqfinance (2023). Parallel NFT AMM Volume Competition [Dataset]. https://dune.com/discover/content/trending?q=author%3Asouqfinance&resource-type=queries
Explore at:
Dataset updated
Aug 11, 2023
Dataset authored and provided by
souqfinance
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Blockchain data query: Parallel NFT AMM Volume Competition
Specification and optimization of analytical data flows
resodate.org
Updated May 27, 2016
Cite
Fabian Hüske (2016). Specification and optimization of analytical data flows [Dataset]. http://doi.org/10.14279/depositonce-5150
Explore at:
Unique identifier
https://doi.org/10.14279/depositonce-5150
Dataset updated
May 27, 2016
Dataset provided by
Technische Universität Berlin
DepositOnce
Authors
Fabian Hüske
Description
In the past, the majority of data analysis use cases were addressed by aggregating relational data. For a few years now, a trend called “Big Data” has been evolving, with several implications for the field of data analysis. Compared to previous applications, much larger data sets are analyzed using more elaborate and diverse analysis methods such as information extraction techniques, data mining algorithms, and machine learning methods. At the same time, analysis applications include data sets with less or even no structure at all. This evolution has implications for the requirements on data processing systems. Due to the growing size of data sets and the increasing computational complexity of advanced analysis methods, data must be processed in a massively parallel fashion. The large number and diversity of data analysis techniques as well as the lack of data structure necessitate the use of user-defined functions and data types. Many traditional database systems are not flexible enough to satisfy these requirements. Hence, there is a need for programming abstractions to define and efficiently execute complex parallel data analysis programs that support custom user-defined operations.

The success of the SQL query language has shown the advantages of declarative query specification, such as potential for optimization and ease of use. Today, most relational database management systems feature a query optimizer that compiles declarative queries into physical execution plans. Cost-based optimizers choose from billions of plan candidates the plan with the least estimated cost. However, traditional optimization techniques cannot be readily integrated into systems that aim to support novel data analysis use cases. For example, the use of user-defined functions (UDFs) can significantly limit the optimization potential of data analysis programs. Furthermore, a lack of detailed data statistics is common when large amounts of unstructured data are analyzed. This leads to imprecise optimizer cost estimates, which can cause sub-optimal plan choices.

In this thesis we address three challenges that arise in the context of specifying and optimizing data analysis programs. First, we propose a parallel programming model with declarative properties to specify data analysis tasks as data flow programs. In this model, data processing operators are composed of a system-provided second-order function and a user-defined first-order function. A cost-based optimizer compiles data flow programs specified in this abstraction into parallel data flows. The optimizer borrows techniques from relational optimizers and ports them to the domain of general-purpose parallel programming models. Second, we propose an approach to enhance the optimization of data flow programs that include UDF operators with unknown semantics. We identify operator properties and conditions to reorder neighboring UDF operators without changing the semantics of the program. We show how to automatically extract these properties from UDF operators by leveraging static code analysis techniques. Our approach is able to emulate relational optimizations such as filter and join reordering and holistic aggregation push-down while not being limited to relational operators. Finally, we analyze the impact of changing execution conditions, such as varying predicate selectivities and memory budgets, on the performance of relational query plans. We identify plan patterns that cause significantly varying execution performance under changing execution conditions. Plans that include such risky patterns are prone to cause problems in the presence of imprecise optimizer estimates. Based on our findings, we introduce an approach to avoid risky plan choices. Moreover, we present a method to assess the risk of a query execution plan using a machine-learned prediction model. Experiments show that the prediction model outperforms risk predictions that are computed from optimizer estimates.
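The operator model described in the abstract, a system-provided second-order function wrapped around a user-defined first-order function, can be illustrated with a small single-process sketch. This is not the thesis's system or API: the names `map_op`, `reduce_op`, `tokenize`, and `sum_counts` are hypothetical, and a real engine would run each operator in parallel across partitions rather than in one Python process.

```python
from collections import defaultdict

# Second-order functions: provided by the "system" (hypothetical names).
def map_op(udf, records):
    """Call the first-order UDF once per input record."""
    for record in records:
        yield from udf(record)

def reduce_op(key_fn, udf, records):
    """Group records by key, then call the first-order UDF once per group."""
    groups = defaultdict(list)
    for record in records:
        groups[key_fn(record)].append(record)
    for key, group in groups.items():
        yield from udf(key, group)

# First-order functions: written by the user (the UDFs).
def tokenize(line):
    for word in line.lower().split():
        yield (word, 1)

def sum_counts(word, pairs):
    yield (word, sum(count for _, count in pairs))

# A two-operator data flow: Map(tokenize) -> Reduce(sum_counts).
lines = ["big data", "data flow"]
counts = dict(reduce_op(lambda pair: pair[0], sum_counts, map_op(tokenize, lines)))
print(counts)
```

Because the system only sees the second-order function and an opaque UDF, it knows how records are grouped and handed to user code but not what the code does with them, which is the gap the abstract's static code analysis is meant to close.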
Parallel-prime-token-holder
dune.com
Updated Jun 24, 2024
Cite
jack_quest1 (2024). Parallel-prime-token-holder [Dataset]. https://dune.com/discover/content/relevant?q=author:jack_quest1&resource-type=queries
Explore at:
Dataset updated
Jun 24, 2024
Authors
jack_quest1
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Blockchain data query: Parallel-prime-token-holder
English-Dutch Parallel Corpus for the Management Domain
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). English-Dutch Parallel Corpus for the Management Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/dutch-english-translated-parallel-corpus-for-management-domain
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the English-Dutch Bilingual Parallel Corpora dataset for the Management domain. This comprehensive dataset contains a large collection of bilingual sentence pairs, carefully translated between English and Dutch, designed to support the development of management-specific language models, natural language processing systems, and machine translation engines.
Dataset Content
• Volume and Diversity
  • Extensive Coverage: Contains over 50,000 high-quality sentence pairs suitable for various language technology applications.
  • Translator Diversity: Created by more than 200 native Dutch linguists, ensuring a wide range of linguistic styles, regional expressions, and translation approaches.
• Sentence Diversity
  • Word Count: Sentences range between 7 and 25 words, suitable for NLP model training and evaluation.
  • Syntactic Structures: Includes simple, compound, and complex sentences.
  • Grammatical Forms: Interrogative and imperative constructions to reflect practical and directive language; affirmative and negative statements to cover different polarities; active and passive voice to offer multiple linguistic perspectives.
  • Idiomatic and Figurative Language: Incorporates business-related metaphors, idiomatic phrases, and figurative expressions common in the management domain.
  • Discourse Markers: Includes conjunctions, transitional phrases, and logical connectors to ensure coherent and natural sentence flow.
  • Cross Translation: The dataset includes both English-to-Dutch and Dutch-to-English translations to support bi-directional translation system development.
Domain-Specific Content
• Terminology: Covers a broad lexicon of management-related terms from areas such as business strategy, leadership, marketing, operations, finance, and human resources.
• Authentic Language Use: Captures expressions, idioms, and terminology found in real-world management contexts, including reports, case studies, presentations, and corporate dialogues.
• Contextual Variety: Includes content from business reports, management literature, corporate communications, organizational behavior studies, and financial documents.
• Cross-Domain Applicability: Also incorporates content from related fields such as economics, psychology, sociology, and technology, enriching the dataset's real-world relevance.
Format and Structure
• File Formats: Delivered in Excel format, with the option to convert into JSON, TMX, XML, XLIFF, XLS, and other widely used industry formats.
• Structure Fields: Serial Number, Unique ID, Source Sentence and Source Word Count, Target Sentence and Target Word Count
Usage and Application
• Machine Translation: Useful for building and fine-tuning translation models for management-specific content.
• NLP Applications: Enhances tools such as predictive keyboards, grammar and spell checkers, and speech/text understanding systems focused on business and management contexts.
• Large Language Model (LLM) Training: Supports fine-tuning of LLMs for use cases such as generating business articles, summarizing market insights, interpreting corporate data, and answering management-related queries.
Secure and Ethical Collection
• Data Collection Platform: Built entirely through FutureBeeAI’s proprietary Yugo platform, ensuring control, quality, and traceability.
• Confidentiality and Compliance:
English-Bengali Parallel Corpus for the Management Domain
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). English-Bengali Parallel Corpus for the Management Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/bengali-english-translated-parallel-corpus-for-management-domain
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the English-Bengali Bilingual Parallel Corpora dataset for the Management domain. This comprehensive dataset contains a large collection of bilingual sentence pairs, carefully translated between English and Bengali, designed to support the development of management-specific language models, natural language processing systems, and machine translation engines.
Dataset Content
• Volume and Diversity
  • Extensive Coverage: Contains over 50,000 high-quality sentence pairs suitable for various language technology applications.
  • Translator Diversity: Created by more than 200 native Bengali linguists, ensuring a wide range of linguistic styles, regional expressions, and translation approaches.
• Sentence Diversity
  • Word Count: Sentences range between 7 and 25 words, suitable for NLP model training and evaluation.
  • Syntactic Structures: Includes simple, compound, and complex sentences.
  • Grammatical Forms: Interrogative and imperative constructions to reflect practical and directive language; affirmative and negative statements to cover different polarities; active and passive voice to offer multiple linguistic perspectives.
  • Idiomatic and Figurative Language: Incorporates business-related metaphors, idiomatic phrases, and figurative expressions common in the management domain.
  • Discourse Markers: Includes conjunctions, transitional phrases, and logical connectors to ensure coherent and natural sentence flow.
  • Cross Translation: The dataset includes both English-to-Bengali and Bengali-to-English translations to support bi-directional translation system development.
Domain-Specific Content
• Terminology: Covers a broad lexicon of management-related terms from areas such as business strategy, leadership, marketing, operations, finance, and human resources.
• Authentic Language Use: Captures expressions, idioms, and terminology found in real-world management contexts, including reports, case studies, presentations, and corporate dialogues.
• Contextual Variety: Includes content from business reports, management literature, corporate communications, organizational behavior studies, and financial documents.
• Cross-Domain Applicability: Also incorporates content from related fields such as economics, psychology, sociology, and technology, enriching the dataset's real-world relevance.
Format and Structure
• File Formats: Delivered in Excel format, with the option to convert into JSON, TMX, XML, XLIFF, XLS, and other widely used industry formats.
• Structure Fields: Serial Number, Unique ID, Source Sentence and Source Word Count, Target Sentence and Target Word Count
Usage and Application
• Machine Translation: Useful for building and fine-tuning translation models for management-specific content.
• NLP Applications: Enhances tools such as predictive keyboards, grammar and spell checkers, and speech/text understanding systems focused on business and management contexts.
• Large Language Model (LLM) Training: Supports fine-tuning of LLMs for use cases such as generating business articles, summarizing market insights, interpreting corporate data, and answering management-related queries.
Secure and Ethical Collection
• Data Collection Platform: Built entirely through FutureBeeAI’s proprietary Yugo platform, ensuring control, quality, and traceability.
• Confidentiality and Compliance:
Data from: Dataset associated with "Polyhedral optimizations of RNA-RNA...
commons.datacite.org
Updated 2018
Cite
Swetha Varadarajan (2018). Dataset associated with "Polyhedral optimizations of RNA-RNA interaction computations" [Dataset]. http://doi.org/10.25675/10217/191189
Explore at:
Unique identifier
https://doi.org/10.25675/10217/191189
Dataset updated
2018
Dataset provided by
DataCite (https://www.datacite.org/)
Mountain Scholar
Authors
Swetha Varadarajan
Description
These files can be used to re-create the results in the thesis manuscript "Polyhedral Optimizations of RNA-RNA Interaction Computations". One will need to use the AlphaZ tool (http://www.cs.colostate.edu/AlphaZ/wiki/doku.php) to produce results from the .ab and .cs files. Users who are not familiar with AlphaZ can use the C code instead.
English-Bulgarian Parallel Corpus for the Management Domain
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). English-Bulgarian Parallel Corpus for the Management Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/bulgarian-english-translated-parallel-corpus-for-management-domain
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the English-Bulgarian Bilingual Parallel Corpora dataset for the Management domain. This comprehensive dataset contains a large collection of bilingual sentence pairs, carefully translated between English and Bulgarian, designed to support the development of management-specific language models, natural language processing systems, and machine translation engines.
Dataset Content
• Volume and Diversity
  • Extensive Coverage: Contains over 50,000 high-quality sentence pairs suitable for various language technology applications.
  • Translator Diversity: Created by more than 200 native Bulgarian linguists, ensuring a wide range of linguistic styles, regional expressions, and translation approaches.
• Sentence Diversity
  • Word Count: Sentences range between 7 and 25 words, suitable for NLP model training and evaluation.
  • Syntactic Structures: Includes simple, compound, and complex sentences.
  • Grammatical Forms: Interrogative and imperative constructions to reflect practical and directive language; affirmative and negative statements to cover different polarities; active and passive voice to offer multiple linguistic perspectives.
  • Idiomatic and Figurative Language: Incorporates business-related metaphors, idiomatic phrases, and figurative expressions common in the management domain.
  • Discourse Markers: Includes conjunctions, transitional phrases, and logical connectors to ensure coherent and natural sentence flow.
  • Cross Translation: The dataset includes both English-to-Bulgarian and Bulgarian-to-English translations to support bi-directional translation system development.
Domain-Specific Content
• Terminology: Covers a broad lexicon of management-related terms from areas such as business strategy, leadership, marketing, operations, finance, and human resources.
• Authentic Language Use: Captures expressions, idioms, and terminology found in real-world management contexts, including reports, case studies, presentations, and corporate dialogues.
• Contextual Variety: Includes content from business reports, management literature, corporate communications, organizational behavior studies, and financial documents.
• Cross-Domain Applicability: Also incorporates content from related fields such as economics, psychology, sociology, and technology, enriching the dataset's real-world relevance.
Format and Structure
• File Formats: Delivered in Excel format, with the option to convert into JSON, TMX, XML, XLIFF, XLS, and other widely used industry formats.
• Structure Fields: Serial Number, Unique ID, Source Sentence and Source Word Count, Target Sentence and Target Word Count
Usage and Application
• Machine Translation: Useful for building and fine-tuning translation models for management-specific content.
• NLP Applications: Enhances tools such as predictive keyboards, grammar and spell checkers, and speech/text understanding systems focused on business and management contexts.
• Large Language Model (LLM) Training: Supports fine-tuning of LLMs for use cases such as generating business articles, summarizing market insights, interpreting corporate data, and answering management-related queries.
Secure and Ethical Collection
• Data Collection Platform: Built entirely through FutureBeeAI’s proprietary Yugo platform, ensuring control, quality, and traceability.
• Confidentiality and Compliance:
SauLTCv1
kaggle.com
zip
Updated Sep 24, 2024
Cite
SauLTC Corpus (2024). SauLTCv1 [Dataset]. https://www.kaggle.com/datasets/saultccorpus/saultcv1
Explore at:
Available download formats: zip (22035649 bytes)
Dataset updated
Sep 24, 2024
Authors
SauLTC Corpus
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SauLTC (Saudi Learner Translator Corpus) is a uni-directional POS-tagged English-Arabic parallel and sentence-aligned learner corpus. This multi-version corpus features linguistic annotation, complemented with an interface for monolingual or bilingual querying of the data.

SauLTC was initiated as a source of data for translation studies and learner corpora research. The corpus can be utilized in the examination of linguistic properties of translations, translation quality assessment, variation in translation, translational competence, and cross-linguistic transference. For example, one of SauLTC’s functionalities allows the examination of what a trainee translator produces on her own (draft translation) and the effect of an expert translator’s feedback (final translation submission).

The translation program at PNU includes a four-credit hour graduation project course, where students, at the end of the program, are required to translate a booklet or book chapter from English into Arabic. The course is designed to help students demonstrate their translation competence and to apply all the skills they have acquired into translating longer texts (6000+). The typical arrangement of the course is as follows: The student selects a text she prefers and obtains her instructor’s approval. The source texts of SauLTC include chapters or booklet extractions from fiction, self-help, biography, history, health, psychology, religion, culture, management, or science. Then she submits a draft translation to her instructor who reads the translation and meets with the student to discuss her output. The discussion highlights both the strengths and weaknesses of the translation and gives the student the chance to justify her linguistic choices when translating the text. Based on the instructor’s feedback, the student makes the necessary changes in the translation and once again submits it to the instructor (final translation). The translation students and instructors give their consent to include their translation projects (the source texts, the first drafts, and the final drafts of their translations) in SauLTC. All three documents are collected in one folder under the student’s name in addition to the student’s profile information.

The corpus is currently in its first version of two million words with a proposed plan to include more translated texts from students at other universities in Saudi Arabia. Each student’s contribution includes a learner profile, the source text, the draft translation, and the post-instructor feedback final submission, all of which are enriched with searchable profile metadata.

The Auto-aligner at WordFast Anywhere was utilized for the automatic parallelization of the source text, draft text, and final submission text. This automated process was followed by a manual verification conducted by professional translators.

The query interface supports lexical and PoS search for both sources and targets and returns sentences with the query item along with their targets/sources. The query results can be filtered by several metadata fields, including the translator’s age, translation assessment grade, year, and source text genre.
Data from: Parallel evolution of bower-building behavior in two groups of...
data.niaid.nih.gov
zenodo.org
+1 more
zip
Updated Nov 15, 2023
Cite
Per G. P. Ericson; Martin Irestedt; Johan A. A. Nylander; Les Christidis; Leo Joseph; Yanhua Qu (2023). Parallel evolution of bower-building behavior in two groups of bowerbirds suggested by phylogenomics [Dataset]. http://doi.org/10.5061/dryad.6hdr7sqwp
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.6hdr7sqwp
Dataset updated
Nov 15, 2023
Dataset provided by
Commonwealth Scientific and Industrial Research Organisation
Swedish Museum of Natural History
Chinese Academy of Sciences
Southern Cross University
Authors
Per G. P. Ericson; Martin Irestedt; Johan A. A. Nylander; Les Christidis; Leo Joseph; Yanhua Qu
License
https://spdx.org/licenses/CC0-1.0.html
Description
The bowerbirds in New Guinea and Australia include species that build the largest and perhaps most elaborately decorated constructions outside of humans. The males use these courtship bowers, along with their displays, to attract females. In these species, the mating system is polygynous and the females alone incubate and feed the nestlings. The bowerbirds also include 10 species of the socially monogamous catbirds in which the male participates in most aspects of raising the young. How the bower-building behavior evolved has remained poorly understood, as no comprehensive phylogeny exists for the family. It has been assumed that the monogamous catbird clade is sister to all polygynous species. We here test this hypothesis using a newly developed pipeline for obtaining homologous alignments of thousands of exonic and intronic regions from genomic data to build a phylogeny. Our well-supported species tree shows that the polygynous, bower-building species are not monophyletic. The result suggests either that bower-building behavior is an ancestral condition in the family that was secondarily lost in the catbirds, or that it has arisen in parallel in two lineages of bowerbirds. We favor the latter hypothesis based on an ancestral character reconstruction showing that polygyny but not bower-building is ancestral in bowerbirds, and on the observation that Scenopoeetes dentirostris, the sister species to one of the bower-building clades, does not build a proper bower but constructs a court for male display. This species is also sexually monomorphic in plumage despite having a polygynous mating system. We argue that the relatively stable tropical and subtropical forest environment in combination with low predator pressure and rich food access (mostly fruit) facilitated the evolution of these unique life-history traits.

Methods
This is supplementary material to the manuscript "Parallel evolution of bower-building behavior and polygyny in two groups of bowerbirds suggested by phylogenomics". We used the Birdscanner pipeline (available at github.com/Naturhistoriska/birdscanner.git) to obtain homologous alignments of 5653 exonic and 7020 intronic regions from whole-genome sequence data. The pipeline utilizes probabilistic queries based on hidden Markov models, which were used to probe the mapped bowerbird genomes to find where they had their best fit. For each query and taxon we obtained genomic coordinates for the best hits, which were then ranked according to their “sequence E-values”, i.e. the expected number of false positives (non-homologous sequences) that scored this well or better. For each query and taxon, the sequences for the hits with the lowest values were parsed out using the genomic coordinates. These were then aligned in separate files for exonic and intronic loci. Poorly aligned sequences were identified, based on a calculated distance matrix using OD-Seq (github.com/PeterJehl/OD-Seq), and excluded from further analyses. We also checked the alignments manually and removed those that included non-homologous sequences for some taxa (indicated by an extreme proportion of variable positions in the alignment) and those that contained no phylogenetic information. Individual trees were constructed using IQ-TREE, which automatically selects the best substitution model for each locus alignment. We used ASTRAL-III to construct species trees from the gene trees, both for the exonic and intronic loci separately and for all loci combined. ASTRAL estimates a species tree given a set of unrooted gene trees, and branch support is calculated using local posterior probabilities. We assembled mitochondrial genomes from the resequenced data for each individual using MITObim, and used 12 of the 13 protein-coding genes to infer the phylogenetic tree. The aligned mitochondrial data set used in the analyses consists of 10,560 bp (3,520 codons). The phylogenetic analysis of the mitogenomic data set was performed with MEGA X. We estimated the maximum-likelihood tree for the mitochondrial data using 100 bootstrap replicates to assess the reliability of the branches. The data set was analyzed both with all codon positions present and with the third codon positions excluded.
English-Portuguese Parallel Corpus for the Gaming Domain
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). English-Portuguese Parallel Corpus for the Gaming Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/portuguese-english-translated-parallel-corpus-for-gaming-domain
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The English-Portuguese Gaming Parallel Corpora is a curated bilingual dataset designed to support game localization, machine translation, and language model training for the Gaming industry. It consists of over 50,000 sentence pairs, professionally translated between English and Portuguese, capturing the linguistic and cultural depth of gaming content.
Dataset Content
• Volume and Translator Diversity
  • Total Sentence Pairs: 50,000+
  • Contributors: Over 200 native and professional translators
  • Source: All content is original and tailored specifically for the Gaming domain
• Sentence Variety
  • Sentence Length: 7 to 25 words
  • Sentence Types: Includes simple, compound, and complex sentences
  • Forms Covered: Interrogative, imperative, affirmative, and negative sentences
  • Voice Diversity: Sentences written in both active and passive voice
  • Stylistic Coverage: Includes idioms, metaphors, gaming slang, and figurative expressions
  • Discourse Elements: Contains conjunctions, logical connectors, and transitional phrases for natural flow
  • Bidirectional Structure: Includes English to Portuguese and Portuguese to English translations for robust model training
Domain-Specific Focus
• Gaming Language Coverage
  • Terminology: Covers in-game elements, UI/UX, controls, multiplayer features, and genre-specific phrases
  • Dialogue Content: Includes NPC dialogue, tutorial lines, mission briefings, walkthroughs, and strategy guidance
  • Communication Scenarios: Reflects live chat, support queries, and multiplayer messaging
  • Cross-Domain Inclusion: Contains relevant terms from adjacent domains like entertainment, esports, virtual worlds, and AR/VR
• Format and Structure
  • File Formats: Delivered in Excel, with optional conversion to JSON, TMX, XML, XLIFF, XLS, or other standard formats
  • Structure Fields: Serial Number, Unique ID, Source Sentence, Source Word Count, Target Sentence, Target Word Count
  • Sentence Alignment: Sentence-level parallel pairs with consistent formatting for MT pipelines
Usage and Applications
• Machine Translation: Train and fine-tune domain-specific MT engines for gaming content
• Game Localization: Adapt games across English-Portuguese markets while preserving nuance and playability
• NLP Tools: Power predictive keyboards, grammar checkers, spelling correction, and sentence completion models
• LLM Fine-Tuning: Strengthen bilingual comprehension and translation capabilities in large language models
• Dialogue Systems: Enable context-aware, conversational AI for in-game or support environments
• Bilingual Retrieval: Use for cross-language search, sentence matching, and similarity scoring
Alignment Confidence and Quality Assurance
• All translations are manually verified by native bilingual experts for accuracy, naturalness, and domain relevance
• Each sentence pair is reviewed to ensure semantic alignment and stylistic
EuroPIRQ-retrieval
huggingface.co
Updated Apr 24, 2007
Cite
Elias H (2007). EuroPIRQ-retrieval [Dataset]. https://huggingface.co/datasets/eherra/EuroPIRQ-retrieval
Explore at:
Dataset updated
Apr 24, 2007
Authors
Elias H
Description
EuroPIRQ: European Parallel Information Retrieval Queries

Dataset Details

The EuroPIRQ retrieval dataset is a multilingual collection designed for evaluating retrieval and cross-lingual retrieval tasks. The dataset contains 10,000 parallel passages and 100 parallel (synthetic) queries in three languages: English 🇬🇧, Portuguese 🇵🇹, and Finnish 🇫🇮, constructed from the European Union's DGT-Acquis corpus.

Languages: English (en), Portuguese (pt), Finnish (fi)
Format: JSONL…
See the full description on the dataset page: https://huggingface.co/datasets/eherra/EuroPIRQ-retrieval.
dpo_irish_eng_translations
huggingface.co
Updated Aug 18, 2024
Cite
Cian Prendergast (2024). dpo_irish_eng_translations [Dataset]. https://huggingface.co/datasets/c123ian/dpo_irish_eng_translations
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Aug 18, 2024
Authors
Cian Prendergast
Description
This is a test of my DPO dataset for Irish-English translations. Raw data origin: https://www.gaois.ie/en/corpora/parallel?Query=Apple&Language=en&SearchMode=exact&PerPage=50. The reference-free COMET XL model Unbabel/wmt23-cometkiwi-da-xl (which has been trained to assess Irish) was used to score accepted/rejected translations. GPT-4 was used to generate translations to compare with human translations of Irish legislation (which by law must have both an Irish and an English copy).
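The scoring step described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: the `ScoredCandidate` type, the `build_preference_pair` helper, the `min_margin` parameter, and the output key names are all assumptions, and the scores are taken as given (for example, precomputed by a reference-free quality metric) rather than computed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoredCandidate:
    """A translation plus a precomputed quality score (hypothetical; higher is better)."""
    text: str
    score: float


def build_preference_pair(
    source: str,
    human: ScoredCandidate,
    machine: ScoredCandidate,
    min_margin: float = 0.0,
) -> Optional[dict]:
    """Return an accepted/rejected record, or None when the scores are too close to rank."""
    if abs(human.score - machine.score) <= min_margin:
        return None  # no clear preference, so skip this sentence
    if human.score > machine.score:
        accepted, rejected = human, machine
    else:
        accepted, rejected = machine, human
    # Key names are illustrative; a real DPO dataset may use different ones.
    return {"prompt": source, "accepted": accepted.text, "rejected": rejected.text}
```

Skipping pairs whose scores fall within `min_margin` is one possible design choice; it avoids training on preferences that the metric cannot reliably distinguish.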
Parameters used in performance evaluation for synthetic data.
plos.figshare.com
xls
Updated Jun 15, 2023
Cite
Yong-Ki Kim; Hyeong-Jin Kim; Hyunjo Lee; Jae-Woo Chang (2023). Parameters used in performance evaluation for synthetic data. [Dataset]. http://doi.org/10.1371/journal.pone.0267908.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0267908.t004
Dataset updated
Jun 15, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Yong-Ki Kim; Hyeong-Jin Kim; Hyunjo Lee; Jae-Woo Chang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Parameters used in performance evaluation for synthetic data.
English-German Parallel Corpus for the Management Domain
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). English-German Parallel Corpus for the Management Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/german-english-translated-parallel-corpus-for-management-domain
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the English-German Bilingual Parallel Corpora dataset for the Management domain. This comprehensive dataset contains a large collection of bilingual sentence pairs, carefully translated between English and German, designed to support the development of management-specific language models, natural language processing systems, and machine translation engines.
Dataset Content
• Volume and Diversity
  • Extensive Coverage: Contains over 50,000 high-quality sentence pairs suitable for various language technology applications.
  • Translator Diversity: Created by more than 200 native German linguists, ensuring a wide range of linguistic styles, regional expressions, and translation approaches.
• Sentence Diversity
  • Word Count: Sentences range between 7 and 25 words, suitable for NLP model training and evaluation.
  • Syntactic Structures: Includes simple, compound, and complex sentences.
  • Grammatical Forms: Interrogative and imperative constructions to reflect practical and directive language, Affirmative and negative statements to cover different polarities, Active and passive voice to offer multiple linguistic perspectives
  • Idiomatic and Figurative Language: Incorporates business-related metaphors, idiomatic phrases, and figurative expressions common in the management domain.
  • Discourse Markers: Includes conjunctions, transitional phrases, and logical connectors to ensure coherent and natural sentence flow.
  • Cross Translation: The dataset includes both English-to-German and German-to-English translations to support bi-directional translation system development.

Domain-Specific Content
• Terminology: Covers a broad lexicon of management-related terms from areas such as business strategy, leadership, marketing, operations, finance, and human resources.
• Authentic Language Use: Captures expressions, idioms, and terminology found in real-world management contexts, including reports, case studies, presentations, and corporate dialogues.
• Contextual Variety: Includes content from business reports, management literature, corporate communications, organizational behavior studies, and financial documents.
• Cross-Domain Applicability: Also incorporates content from related fields such as economics, psychology, sociology, and technology, enriching the dataset's real-world relevance.

Format and Structure
• File Formats: Delivered in Excel format, with the option to convert into JSON, TMX, XML, XLIFF, XLS, and other widely used industry formats.
• Structure Fields: Serial Number, Unique ID, Source Sentence and Source Word Count, Target Sentence and Target Word Count
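As a rough illustration of how the structure fields and the stated 7 to 25 word range could be checked after export, here is a minimal sketch. The `SentencePair` type, its attribute names, and the `check_pair` helper are hypothetical, and counting words by splitting on whitespace is an assumption; the provider's own counting rules are not documented in this listing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentencePair:
    """One row of the corpus; attribute names mirror the listed fields (hypothetical)."""
    serial_number: int
    unique_id: str
    source_sentence: str
    source_word_count: int
    target_sentence: str
    target_word_count: int


def word_count(sentence: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(sentence.split())


def check_pair(pair: SentencePair, min_words: int = 7, max_words: int = 25) -> List[str]:
    """Return a list of human-readable problems; an empty list means the row looks consistent."""
    problems = []
    sides = (
        ("source", pair.source_sentence, pair.source_word_count),
        ("target", pair.target_sentence, pair.target_word_count),
    )
    for side, sentence, declared in sides:
        actual = word_count(sentence)
        if actual != declared:
            problems.append(f"{side}: declared {declared} words, counted {actual}")
        if not min_words <= actual <= max_words:
            problems.append(f"{side}: {actual} words is outside {min_words}-{max_words}")
    return problems
```

A check like this is most useful right after converting the Excel delivery into another format, where column shifts or truncated cells would otherwise go unnoticed.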

Usage and Application
• Machine Translation: Useful for building and fine-tuning translation models for management-specific content.
• NLP Applications: Enhances tools such as predictive keyboards, grammar and spell checkers, and speech/text understanding systems focused on business and management contexts.
• Large Language Model (LLM) Training: Supports fine-tuning of LLMs for use cases such as generating business articles, summarizing market insights, interpreting corporate data, and answering management-related queries.

Secure and Ethical Collection
• Data Collection Platform: Built entirely through FutureBeeAI’s proprietary Yugo platform, ensuring control, quality, and traceability.
• Confidentiality and Compliance:
Nepali English Parallel Audio Text Dataset
kaggle.com
zip
Updated Nov 11, 2025
Cite
Prashant Raj Bista (2025). Nepali English Parallel Audio Text Dataset [Dataset]. https://www.kaggle.com/datasets/prashantrajbista/nepali-english-parallel-audio-text-dataset
Explore at:
Available download formats: zip (952626877 bytes)
Dataset updated
Nov 11, 2025
Authors
Prashant Raj Bista
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset contains a Nepali-English parallel speech corpus designed to support research in speech recognition, speech translation, and multilingual language processing.

It includes audio pairs and their corresponding transcriptions in both Nepali and English, carefully aligned.

Total Nepali Audio Duration: ~125 minutes
Total English Audio Duration: ~112 minutes
Number of Speakers: 4 (covering both male and female voices)
Content Type: Conversational and general-purpose sentences covering common expressions when one is travelling through the country, short queries, and contextually varied speech samples.
Format: Each audio file is accompanied by a corresponding text transcription and metadata linking the Nepali and English versions.

Purpose: Created to assist researchers, developers, and linguists working on Nepali-English speech technologies, such as automatic speech recognition (ASR), speech-to-text (STT), and speech translation systems.

All data has been manually curated and verified to ensure clarity, alignment accuracy, and quality balance across both languages.
English-Malayalam Parallel Corpus for the Management Domain
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). English-Malayalam Parallel Corpus for the Management Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/malayalam-english-translated-parallel-corpus-for-management-domain
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the English-Malayalam Bilingual Parallel Corpora dataset for the Management domain. This comprehensive dataset contains a large collection of bilingual sentence pairs, carefully translated between English and Malayalam, designed to support the development of management-specific language models, natural language processing systems, and machine translation engines.
Dataset Content
• Volume and Diversity
  • Extensive Coverage: Contains over 50,000 high-quality sentence pairs suitable for various language technology applications.
  • Translator Diversity: Created by more than 200 native Malayalam linguists, ensuring a wide range of linguistic styles, regional expressions, and translation approaches.
• Sentence Diversity
  • Word Count: Sentences range between 7 and 25 words, suitable for NLP model training and evaluation.
  • Syntactic Structures: Includes simple, compound, and complex sentences.
  • Grammatical Forms: Interrogative and imperative constructions to reflect practical and directive language, Affirmative and negative statements to cover different polarities, Active and passive voice to offer multiple linguistic perspectives
  • Idiomatic and Figurative Language: Incorporates business-related metaphors, idiomatic phrases, and figurative expressions common in the management domain.
  • Discourse Markers: Includes conjunctions, transitional phrases, and logical connectors to ensure coherent and natural sentence flow.
  • Cross Translation: The dataset includes both English-to-Malayalam and Malayalam-to-English translations to support bi-directional translation system development.

Domain-Specific Content
• Terminology: Covers a broad lexicon of management-related terms from areas such as business strategy, leadership, marketing, operations, finance, and human resources.
• Authentic Language Use: Captures expressions, idioms, and terminology found in real-world management contexts, including reports, case studies, presentations, and corporate dialogues.
• Contextual Variety: Includes content from business reports, management literature, corporate communications, organizational behavior studies, and financial documents.
• Cross-Domain Applicability: Also incorporates content from related fields such as economics, psychology, sociology, and technology, enriching the dataset's real-world relevance.

Format and Structure
• File Formats: Delivered in Excel format, with the option to convert into JSON, TMX, XML, XLIFF, XLS, and other widely used industry formats.
• Structure Fields: Serial Number, Unique ID, Source Sentence and Source Word Count, Target Sentence and Target Word Count

Usage and Application
• Machine Translation: Useful for building and fine-tuning translation models for management-specific content.
• NLP Applications: Enhances tools such as predictive keyboards, grammar and spell checkers, and speech/text understanding systems focused on business and management contexts.
• Large Language Model (LLM) Training: Supports fine-tuning of LLMs for use cases such as generating business articles, summarizing market insights, interpreting corporate data, and answering management-related queries.

Secure and Ethical Collection
• Data Collection Platform: Built entirely through FutureBeeAI’s proprietary Yugo platform, ensuring control, quality, and traceability.
• Confidentiality and Compliance:
English-Arabic Parallel Corpus for the Management Domain
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). English-Arabic Parallel Corpus for the Management Domain [Dataset]. https://www.futurebeeai.com/dataset/parallel-corpora/arabic-english-translated-parallel-corpus-for-management-domain
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the English-Arabic Bilingual Parallel Corpora dataset for the Management domain. This comprehensive dataset contains a large collection of bilingual sentence pairs, carefully translated between English and Arabic, designed to support the development of management-specific language models, natural language processing systems, and machine translation engines.
Dataset Content
• Volume and Diversity
  • Extensive Coverage: Contains over 50,000 high-quality sentence pairs suitable for various language technology applications.
  • Translator Diversity: Created by more than 200 native Arabic linguists, ensuring a wide range of linguistic styles, regional expressions, and translation approaches.
• Sentence Diversity
  • Word Count: Sentences range between 7 and 25 words, suitable for NLP model training and evaluation.
  • Syntactic Structures: Includes simple, compound, and complex sentences.
  • Grammatical Forms: Interrogative and imperative constructions to reflect practical and directive language, Affirmative and negative statements to cover different polarities, Active and passive voice to offer multiple linguistic perspectives
  • Idiomatic and Figurative Language: Incorporates business-related metaphors, idiomatic phrases, and figurative expressions common in the management domain.
  • Discourse Markers: Includes conjunctions, transitional phrases, and logical connectors to ensure coherent and natural sentence flow.
  • Cross Translation: The dataset includes both English-to-Arabic and Arabic-to-English translations to support bi-directional translation system development.

Domain-Specific Content
• Terminology: Covers a broad lexicon of management-related terms from areas such as business strategy, leadership, marketing, operations, finance, and human resources.
• Authentic Language Use: Captures expressions, idioms, and terminology found in real-world management contexts, including reports, case studies, presentations, and corporate dialogues.
• Contextual Variety: Includes content from business reports, management literature, corporate communications, organizational behavior studies, and financial documents.
• Cross-Domain Applicability: Also incorporates content from related fields such as economics, psychology, sociology, and technology, enriching the dataset's real-world relevance.

Format and Structure
• File Formats: Delivered in Excel format, with the option to convert into JSON, TMX, XML, XLIFF, XLS, and other widely used industry formats.
• Structure Fields: Serial Number, Unique ID, Source Sentence and Source Word Count, Target Sentence and Target Word Count

Usage and Application
• Machine Translation: Useful for building and fine-tuning translation models for management-specific content.
• NLP Applications: Enhances tools such as predictive keyboards, grammar and spell checkers, and speech/text understanding systems focused on business and management contexts.
• Large Language Model (LLM) Training: Supports fine-tuning of LLMs for use cases such as generating business articles, summarizing market insights, interpreting corporate data, and answering management-related queries.

Secure and Ethical Collection
• Data Collection Platform: Built entirely through FutureBeeAI’s proprietary Yugo platform, ensuring control, quality, and traceability.
• Confidentiality and Compliance:
Definitions of common notations.
plos.figshare.com
xls
Updated Jun 3, 2023
Cite
Yong-Ki Kim; Hyeong-Jin Kim; Hyunjo Lee; Jae-Woo Chang (2023). Definitions of common notations. [Dataset]. http://doi.org/10.1371/journal.pone.0267908.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0267908.t002
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Yong-Ki Kim; Hyeong-Jin Kim; Hyunjo Lee; Jae-Woo Chang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Definitions of common notations.
Peachy Parallel Assignments (EduHPC 2025). Distributed SoftMax
figshare.com
pdf
Updated Sep 3, 2025
Cite
Maria Pantoja; Clara Almeida; David Guerrero-Pantoja; Cameron Maloney; Silvio Rizzi (2025). Peachy Parallel Assignments (EduHPC 2025). Distributed SoftMax [Dataset]. http://doi.org/10.6084/m9.figshare.30040486.v1
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.30040486.v1
Dataset updated
Sep 3, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Maria Pantoja; Clara Almeida; David Guerrero-Pantoja; Cameron Maloney; Silvio Rizzi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As large-scale deep learning models become integral to scientific discovery and engineering applications, it is increasingly important to teach students how to implement them efficiently and at scale. This section presents a coding assignment that focuses on optimizing the Softmax function, a central component of many deep learning models, including attention mechanisms in transformer models. The assignment is designed for an undergraduate-level Distributed Computing course and tailored to students with little or no prior experience in machine learning. By integrating modern AI workloads into an HPC curriculum, this work equips students with both the conceptual understanding and practical experience needed to build scalable solutions in scientific computing.
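For readers unfamiliar with the idea, one common way to split Softmax across workers can be sketched as below. This is an illustrative single-process simulation, not the assignment's actual solution: the function names are hypothetical, each inner list stands in for the data held by one worker, and the `combine_stats` step stands in for what would be a collective reduction in a real distributed setting.

```python
import math
from typing import List, Tuple


def local_stats(shard: List[float]) -> Tuple[float, float]:
    """Per-worker step: local maximum and sum of exponentials shifted by that maximum."""
    local_max = max(shard)
    local_sum = sum(math.exp(x - local_max) for x in shard)
    return local_max, local_sum


def combine_stats(stats: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Reduction step: rescale each local sum to the global maximum, then add them up."""
    global_max = max(m for m, _ in stats)
    global_sum = sum(s * math.exp(m - global_max) for m, s in stats)
    return global_max, global_sum


def distributed_softmax(shards: List[List[float]]) -> List[List[float]]:
    """Compute softmax over the concatenation of all shards, keeping the shard layout."""
    stats = [local_stats(shard) for shard in shards]
    global_max, global_sum = combine_stats(stats)
    # Each worker normalizes its own values using only the two global scalars.
    return [[math.exp(x - global_max) / global_sum for x in shard] for shard in shards]
```

Subtracting the maximum before exponentiating keeps the computation numerically stable for large inputs, and only two scalars per worker need to be exchanged, which is why this decomposition is a natural fit for distributed execution.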
A common-vvealth or nothing: or, Monarchy and oligarchy prov'd parallel in...
llds.ling-phil.ox.ac.uk
Updated Oct 3, 2008
Cite
(2008). A common-vvealth or nothing: or, Monarchy and oligarchy prov'd parallel in tyranny. In xii. queries, worthy the consideration of all publique spirits in this juncture. By a well-wisher to the true security of both Christian and civil liberty. [Dataset]. https://llds.ling-phil.ox.ac.uk/llds/xmlui/handle/20.500.14106/A80253
Explore at:
Dataset updated
Oct 3, 2008
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
(:unav)
