100+ datasets found
  1. stsb-mt-turkish

    • huggingface.co
    Updated Dec 25, 2021
    Cite
    Emrecan Çelik (2021). stsb-mt-turkish [Dataset]. https://huggingface.co/datasets/emrecan/stsb-mt-turkish
    Dataset updated
    Dec 25, 2021
    Authors
    Emrecan Çelik
    Description

    STSb Turkish

    Semantic textual similarity dataset for the Turkish language. It is a machine translation (Azure) of the STSb English dataset. This dataset is not reviewed by expert human translators. Uploaded from this repository.

  2. turkish-sentiment-analysis-dataset

    • huggingface.co
    • kaggle.com
    Updated Jun 21, 2022
    Cite
    Batuhan (2022). turkish-sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset
    Dataset updated
    Jun 21, 2022
    Authors
    Batuhan
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset contains positive, negative, and neutral (notr) sentences from several data sources given in the references. Most sentiment models have only two labels, positive and negative. However, user input can be a completely neutral sentence, and I could find no data for such cases. Therefore I created this dataset with three classes. Positive and negative sentences are listed below. Neutral examples are extracted from a Turkish wiki dump. In addition, added some random text… See the full description on the dataset page: https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset.

  3. Turkish Language Speech Datasets | NLP, Conversational AI & Machine Learning...

    • mg.shaip.com
    • uz.shaip.com
    • +71more
    Updated Dec 9, 2024
    Cite
    Shaip (2024). Turkish Language Speech Datasets | NLP, Conversational AI & Machine Learning [Dataset]. https://mg.shaip.com/offerings/speech-data-catalog/turkish-turkey-dataset/
    Dataset updated
    Dec 9, 2024
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Enhance your Conversational AI model with our Off-the-Shelf Turkish Language Dataset (Turkish Language Speech Datasets). Shaip high-quality audio datasets are a quick and effective solution for model training.

  4. GlobalPhone Turkish

    • live.european-language-grid.eu
    • catalogue.elra.info
    audio format
    + more versions
    Cite
    GlobalPhone Turkish [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1917
    Available download formats: audio format
    License

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    The GlobalPhone corpus developed in collaboration with the Karlsruhe Institute of Technology (KIT) was designed to provide read speech data for the development and evaluation of large continuous speech recognition systems in the most widespread languages of the world, and to provide a uniform, multilingual speech and text database for language independent and language adaptive speech recognition as well as for language identification tasks.

    The entire GlobalPhone corpus enables the acquisition of acoustic-phonetic knowledge of the following 22 spoken languages: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Mandarin (ELRA-S0193), Chinese-Shanghai (ELRA-S0194), Croatian (ELRA-S0195), Czech (ELRA-S0196), French (ELRA-S0197), German (ELRA-S0198), Hausa (ELRA-S0347), Japanese (ELRA-S0199), Korean (ELRA-S0200), Polish (ELRA-S0320), Portuguese (Brazilian) (ELRA-S0201), Russian (ELRA-S0202), Spanish (Latin America) (ELRA-S0203), Swahili (ELRA-S0375), Swedish (ELRA-S0204), Tamil (ELRA-S0205), Thai (ELRA-S0321), Turkish (ELRA-S0206), Ukrainian (ELRA-S0377), and Vietnamese (ELRA-S0322).

    In each language about 100 sentences were read from each of the 100 speakers. The read texts were selected from national newspapers available via Internet to provide a large vocabulary. The read articles cover national and international political news as well as economic news. The speech is available in 16bit, 16kHz mono quality, recorded with a close-speaking microphone (Sennheiser 440-6). The transcriptions are internally validated and supplemented by special markers for spontaneous effects like stuttering, false starts, and non-verbal effects like laughing and hesitations. Speaker information like age, gender, occupation, etc. as well as information about the recording setup complement the database. The entire GlobalPhone corpus contains over 450 hours of speech spoken by more than 2100 native adult speakers.
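
The stated recording format (16 bit, 16 kHz, mono) fixes the uncompressed data rate, so a rough storage estimate follows directly. The sketch below is only back-of-the-envelope arithmetic: the function name is hypothetical, the 450-hour figure is the lower bound quoted above, and file headers and any compression are ignored.

```python
def pcm_bytes_per_hour(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed PCM payload for one hour of audio, in bytes (headers ignored)."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return bytes_per_second * 3600


# 16 kHz, 16 bit, mono, as stated in the corpus description.
per_hour = pcm_bytes_per_hour(16_000, 16, 1)

# "Over 450 hours" is a lower bound, so this is a lower bound too.
corpus_lower_bound = per_hour * 450

print(f"{per_hour / 1e6:.1f} MB per hour")              # 115.2 MB per hour
print(f"{corpus_lower_bound / 1e9:.2f} GB for 450 hours")  # 51.84 GB for 450 hours
```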

    The data is compressed with the shorten program written by Tony Robinson. Alternatively, the data can be delivered uncompressed.

    The Turkish corpus was produced using the Zaman newspaper. It contains recordings of 100 speakers (28 males, 72 females) recorded in Istanbul, Turkey. The following age distribution has been obtained: 30 speakers are below 19, 30 speakers are between 20 and 29, 23 speakers are between 30 and 39, 14 speakers are between 40 and 49, and 3 speakers are over 50.

  5. General Domain Scripted Monologue Speech Data: Turkish (Turkey)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). General Domain Scripted Monologue Speech Data: Turkish (Turkey) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/general-scripted-speech-monologues-turkish-turkey
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Area covered
    Türkiye
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Turkish Scripted Monologue Speech Dataset for the General Domain. This meticulously curated dataset is designed to advance the development of General domain Turkish language speech recognition models.

    Speech Data

    This training dataset comprises over 6,000 high-quality scripted prompt recordings in Turkish. These recordings cover various General domain topics and scenarios, designed to build robust and accurate speech technology.

    Participant Diversity:
    Speakers: 60 native Turkish speakers from different regions of Turkey.
    Regions: Ensures a balanced representation of Turkish accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Details:
    Recording Nature: Audio recordings of scripted prompts/monologues.
    Audio Duration: Average duration of 5 to 30 seconds per recording.
    Formats: WAV format with mono channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
    Environment: Recordings are conducted in quiet settings without background noise and echo.
    Topic Diversity: The dataset encompasses a wide array of topics and conversational scenarios from the General domain. Topics include:
    Daily Conversations
    Topic Specific Conversation
    General Information and Advice
    Idioms and Sayings
    Other Elements: To enhance realism and utility, the scripted prompts incorporate various elements commonly encountered in general interactions:
    Names: Region-specific names of males and females in various formats.
    Addresses: Region-specific addresses in different spoken formats.
    Dates & Times: Inclusion of date and time in various contexts.
    Organization Names: Names of different types of organizations.
    Numbers & Currencies: Various numbers and currencies in domain-specific interactions.

    Each scripted prompt is crafted to reflect real-life scenarios encountered in the General domain, ensuring applicability in training robust natural language processing and speech recognition models.
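
The recording details listed above (WAV, mono, 16 bits, 8 kHz or 16 kHz, 5 to 30 seconds per recording) can be checked mechanically once files are on disk. The following is a minimal sketch using Python's standard `wave` module; the helper names are hypothetical, and a synthetic one-second silent clip stands in for a real recording, since the dataset's actual file layout is not described here.

```python
import io
import wave


def describe_wav(source):
    """Read basic format fields from a WAV file path or file-like object."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def matches_listing(info):
    """True if the format fields agree with the format stated in the listing."""
    return (
        info["channels"] == 1
        and info["bit_depth"] == 16
        and info["sample_rate"] in (8_000, 16_000)
    )


# Synthetic stand-in for a dataset file: one second of 16 kHz mono silence.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)          # 2 bytes = 16 bits
    out.setframerate(16_000)
    out.writeframes(b"\x00\x00" * 16_000)
buffer.seek(0)

info = describe_wav(buffer)
print(info, matches_listing(info))
```

A duration check against the 5 to 30 second range could be added in the same way; it is left out because the synthetic clip is deliberately short.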

    Transcription Data

    In addition to high-quality audio recordings, the dataset includes meticulously prepared text files with verbatim transcriptions of each audio file. These transcriptions are essential for training accurate and robust speech recognition models.

    Content: Each text file contains the exact scripted prompt corresponding to its audio file, ensuring consistency.
    Format: Transcriptions are provided in plain text (.TXT) format, with files named to match their associated audio files for easy reference.
    Quality: All transcriptions are verified for accuracy and consistency by native Turkish transcribers.

    Metadata

    The dataset provides comprehensive metadata for each audio recording and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, and dialect.
    Other Metadata:

  6. Real Estate Call Center Speech Data: Turkish (Turkey)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Real Estate Call Center Speech Data: Turkish (Turkey) [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/realestate-call-center-conversation-turkish-turkey
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Area covered
    Türkiye
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Turkish Call Center Speech Dataset for the Real Estate domain designed to enhance the development of call center speech recognition models specifically for the Real Estate industry. This dataset is meticulously curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.

    Speech Data:

    This training dataset comprises 30 Hours of call center audio recordings covering various topics and scenarios related to the Real Estate domain, designed to build robust and accurate customer service speech technology.

    Participant Diversity:
    Speakers: 60 expert native Turkish speakers from the FutureBeeAI Community.
    Regions: Different states/provinces of Turkey, ensuring a balanced representation of Turkish accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.
    Recording Details:
    Conversation Nature: Unscripted and spontaneous conversations between call center agents and customers.
    Call Duration: Average duration of 5 to 15 minutes per call.
    Formats: WAV format with stereo channels, a bit depth of 16 bits, and sample rates of 8 kHz and 16 kHz.
    Environment: Without background noise and without echo.

    Topic Diversity

    This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.

    Inbound Calls:
    Property Inquiry
    Rental Property Search & Availability
    Renovation Inquiries
    Property Features & Amenities Inquiry
    Investment Property Analysis & Advice
    Property History & Ownership Details, and many more
    Outbound Calls:
    New Property Listing Update
    Post Purchase Follow-ups
    Investment Opportunities & Property Recommendations
    Property Value Updates
    Customer Satisfaction Surveys, and many more

    This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.

    Transcription

    To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:

    Speaker-wise Segmentation: Time-coded segments for both agents and customers.
    Non-Speech Labels: Tags and labels for non-speech elements.
    Word Error Rate: Word error rate is less than 5% thanks to the dual layer of QA.

    These ready-to-use transcriptions accelerate the development of the Real Estate domain call center conversational AI and ASR models for the Turkish language.
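
The listing does not say how its sub-5% word error rate was measured. For orientation, the sketch below shows the conventional definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name and the example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] = edit distance between the ref words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1       # reference word missing from hypothesis
            insertion = current[j - 1] + 1   # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("bu ev satılık mı", "bu ev kiralık mı"))  # 0.25
```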

    Metadata

    The dataset provides comprehensive metadata for each conversation and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent and dialect.
    Conversation Metadata: Domain, topic, call type, outcome/sentiment, bit depth, and sample rate.

    This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of Turkish call center speech recognition models.

    Usage and

  7. Turkish Lira Detection Dataset

    • universe.roboflow.com
    zip
    Updated Apr 21, 2023
    Cite
    tltltltl (2023). Turkish Lira Detection Dataset [Dataset]. https://universe.roboflow.com/tltltltl/turkish-lira-detection
    Available download formats: zip
    Dataset updated
    Apr 21, 2023
    Dataset authored and provided by
    tltltltl
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Türkiye
    Variables measured
    Banknote Bounding Boxes
    Description

    Turkish Lira Detection

    ## Overview
    
    Turkish Lira Detection is a dataset for object detection tasks - it contains Banknote annotations for 4,531 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).
    
  8. NLI-TR Dataset

    • paperswithcode.com
    + more versions
    Cite
    NLI-TR Dataset [Dataset]. https://paperswithcode.com/dataset/nli-tr
    Authors
    Emrah Budur; Rıza Özçelik; Tunga Güngör; Christopher Potts
    Description

    Natural Language Inference in Turkish (NLI-TR) provides translations of two large English NLI datasets into Turkish and had a team of experts validate their translation quality and fidelity to the original labels.

  9. English/Turkish Wikipedia Named-Entity Recognition and Text Categorization...

    • data.mendeley.com
    Updated Feb 9, 2017
    + more versions
    Cite
    H. Bahadir Sahin (2017). English/Turkish Wikipedia Named-Entity Recognition and Text Categorization Dataset [Dataset]. http://doi.org/10.17632/cdcztymf4k.1
    Dataset updated
    Feb 9, 2017
    Authors
    H. Bahadir Sahin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TWNERTC and EWNERTC are collections of automatically categorized and annotated sentences obtained from Turkish and English Wikipedia for named-entity recognition and text categorization.

    First, we construct large-scale gazetteers by using a graph crawler algorithm to extract relevant entity and domain information from a semantic knowledge base, Freebase. The final gazetteers have 77 domains (categories) and more than 1000 fine-grained entity types for both languages. The Turkish gazetteers contain approximately 300K named entities and the English gazetteers approximately 23M named entities.

    By leveraging large-scale gazetteers and linked Wikipedia articles, we construct TWNERTC and EWNERTC. Since the categorization and annotation processes are automated, the raw collections are prone to ambiguity. Hence, we introduce two noise reduction methodologies: (a) domain-dependent (b) domain-independent. We produce two different versions by post-processing raw collections. As a result of this process, we introduced 3 versions of TWNERTC and EWNERTC: (a) raw (b) domain-dependent post-processed (c) domain-independent post-processed. Turkish collections have approximately 700K sentences for each version (varies between versions), while English collections contain more than 7M sentences.

    We also introduce "Coarse-Grained NER" versions of the same datasets. We reduce fine-grained types into "organization", "person", "location" and "misc" by mapping each fine-grained type to the most similar coarse-grained version. Note that this process also eliminated many domains and fine-grained annotations due to lack of information for coarse-grained NER. Hence, the "Coarse-Grained NER" labelled datasets contain only 25 domains, and the number of sentences is lower than in the "Fine-Grained NER" versions.
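
The fine-to-coarse reduction described above can be pictured as a lookup from each fine-grained type to one of the four coarse labels. The sketch below is illustrative only: the fine-grained type names are hypothetical (the real inventory has more than 1000 types), a hand-written table stands in for the authors' "most similar" mapping, and dropping any sentence with an unmappable type is an assumption made here to mirror the reported loss of sentences.

```python
# Hypothetical fine-grained labels; not taken from the actual TWNERTC type inventory.
FINE_TO_COARSE = {
    "football_team": "organization",
    "university": "organization",
    "politician": "person",
    "author": "person",
    "city": "location",
    "river": "location",
    "film": "misc",
}


def coarsen_sentence(tagged_tokens):
    """Map (token, fine_type) pairs to coarse labels.

    Returns None when a fine-grained type has no coarse counterpart
    (assumption: such sentences are dropped from the coarse-grained version).
    """
    coarse = []
    for token, fine_type in tagged_tokens:
        if fine_type is None:                 # token is not part of any entity
            coarse.append((token, None))
        elif fine_type in FINE_TO_COARSE:
            coarse.append((token, FINE_TO_COARSE[fine_type]))
        else:
            return None
    return coarse


sentence = [("Galatasaray", "football_team"), ("İstanbul", "city"), ("kulübüdür", None)]
print(coarsen_sentence(sentence))
```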

    All processes are explained in our published white paper for Turkish; however, major methods (gazetteers creation, automatic categorization/annotation, noise reduction) do not change for English.

  10. Wake Word Turkish Dataset | Shaip

    • ha.shaip.com
    • ur.shaip.com
    • +69more
    Updated Dec 24, 2024
    Cite
    Shaip (2024). Wake Word Turkish Dataset | Shaip [Dataset]. https://ha.shaip.com/offerings/speech-data-catalog/wake-word-turkish-dataset/
    Dataset updated
    Dec 24, 2024
    Dataset authored and provided by
    Shaip
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Wake Word Turkish Dataset is a collection of audio recordings specifically curated for training and evaluating wake word detection systems in the Turkish language. This dataset includes a variety of speakers, environments, and scenarios to ensure robustness and effectiveness in wake word detection algorithms. It serves as a valuable resource for researchers and developers working on voice-controlled systems and natural language processing applications in Turkish.

  11. 504 Hours - Turkish(Turkey) Real-world Casual Conversation and Monologue...

    • m.nexdata.ai
    Updated Feb 9, 2024
    Cite
    Nexdata (2024). 504 Hours - Turkish(Turkey) Real-world Casual Conversation and Monologue speech dataset [Dataset]. https://m.nexdata.ai/datasets/speechrecog/1324
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    nexdata technology inc
    Authors
    Nexdata
    Area covered
    World, Türkiye
    Variables measured
    Format, Country, Accuracy, Language, Content category, Language(Region) Code, Recording environment, Features of annotation
    Description

    The Turkish (Turkey) Real-world Casual Conversation and Monologue speech dataset covers self-media, conversation, live, and other generic domains, mirroring real-world interactions. It is transcribed with text content, speaker ID, gender, and other attributes. The dataset was collected from a large and geographically diverse set of speakers, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes; our datasets are all GDPR, CCPA, and PIPL compliant.

  12. Marmara Turkish Coreference Resolution Corpus Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jun 5, 2017
    Cite
    Peter Schüller; Kübra Cıngıllı; Ferit Tunçer; Barış Gün Sürmeli; Ayşegül Pekel; Ayşe Hande Karatay; Hacer Ezgi Karakaş (2017). Marmara Turkish Coreference Resolution Corpus Dataset [Dataset]. https://paperswithcode.com/dataset/marmara-turkish-coreference-resolution-corpus
    Dataset updated
    Jun 5, 2017
    Authors
    Peter Schüller; Kübra Cıngıllı; Ferit Tunçer; Barış Gün Sürmeli; Ayşegül Pekel; Ayşe Hande Karatay; Hacer Ezgi Karakaş
    Area covered
    Marmara Region
    Description

    The Marmara Turkish Coreference Corpus is an annotation of the whole METU-Sabanci Turkish Treebank with mentions and coreference chains.

  13. Turkish web corpus MaCoCu-tr 1.0

    • live.european-language-grid.eu
    • clarin.si
    xml
    Updated Apr 26, 2022
    + more versions
    Cite
    (2022). Turkish web corpus MaCoCu-tr 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/19770
    Available download formats: xml
    Dataset updated
    Apr 26, 2022
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Turkish web corpus MaCoCu-tr 1.0 was built by crawling the ".tr" internet top-level domain in 2021, extending the crawl dynamically to other domains as well (https://github.com/macocu/MaCoCu-crawler).

    Considerable effort was devoted to cleaning the extracted text to provide a high-quality web corpus. This was achieved by removing boilerplate (https://corpus.tools/wiki/Justext) and near-duplicate paragraphs (https://corpus.tools/wiki/Onion), and by discarding very short texts as well as texts that are not in the target language. The dataset is characterized by extensive metadata which allows filtering the dataset based on text quality and other criteria (https://github.com/bitextor/monotextor), making the corpus highly useful for corpus linguistics studies, as well as for training language models and other language technologies.

    Each document is accompanied by the following metadata: title, crawl date, url, domain, file type of the original document, distribution of languages inside the document, and a fluency score (based on a language model). The text of each document is divided into paragraphs that are accompanied by metadata on the information whether a paragraph is a heading or not, metadata on the paragraph quality and fluency, the automatically identified language of the text in the paragraph, and information whether the paragraph contains personal information.

    This action has received funding from the European Union's Connecting Europe Facility 2014-2020 - CEF Telecom, under Grant Agreement No. INEA/CEF/ICT/A2020/2278341. This communication reflects only the author’s view. The Agency is not responsible for any use that may be made of the information it contains.

  14. Turkish Intervention: Central Bank of Turkey Purchases of USD (Millions of...

    • fred.stlouisfed.org
    json
    Updated Mar 3, 2025
    Cite
    (2025). Turkish Intervention: Central Bank of Turkey Purchases of USD (Millions of USD) [Dataset]. https://fred.stlouisfed.org/series/TRINTDEXR
    Available download formats: json
    Dataset updated
    Mar 3, 2025
    License

    https://fred.stlouisfed.org/legal/#copyright-citation-required

    Description

    Graph and download economic data for Turkish Intervention: Central Bank of Turkey Purchases of USD (Millions of USD) (TRINTDEXR) from 2002-01-01 to 2025-03-03 about intervention, Turkey, banks, and depository institutions.

  15. 50Million Rows Turkish Market Sales Dataset(MSSQL)

    • kaggle.com
    Updated Aug 31, 2023
    Cite
    Omer Colakoglu (2023). 50Million Rows Turkish Market Sales Dataset(MSSQL) [Dataset]. https://www.kaggle.com/datasets/omercolakoglu/50million-rows-turkish-market-sales-datasetmssql
    Dataset updated
    Aug 31, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Omer Colakoglu
    Description

    50 Million Rows MSSQL Backup File with Clustered Columnstore Index.

    This dataset contains:
    - 27K categorized Turkish supermarket items
    - 81 stores (every city in Turkey has a store)
    - 100K customers with real Turkish names, plus addresses
    - 10M rows of randomly generated sales data
    - Near-real prices for all data, with an inflation factor applied over time

    All the data was generated randomly. The usernames were generated from real Turkish names and surnames, but they are not real people. The sales data was also generated randomly, subject to some rules. For example, every order can contain 1-9 kinds of item, and every order line can have a quantity of 1-9 pieces. The randomizing function works according to the population of each city, so the number of orders for Istanbul (the biggest city in Turkey) is about 20% of all data, while orders for Gaziantep (whose population is 2.5% of Turkey's population) are about 2.5% of all data.
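
The generation rules described above (1-9 kinds of item per order, 1-9 pieces per line, order volume proportional to city population) can be sketched in a few lines. This is not the author's generator, which is not shown in the listing: the field names are hypothetical, only the two cities named above are modelled with every other city lumped into one bucket, and the item IDs are simply indexes into a 27K-item catalogue.

```python
import random

# Shares from the description; "Other" lumps together all remaining cities (assumption).
CITY_SHARE = {"Istanbul": 0.20, "Gaziantep": 0.025, "Other": 0.775}
CATALOGUE_SIZE = 27_000


def generate_order(rng: random.Random) -> dict:
    """Draw one synthetic order following the rules stated in the description."""
    city = rng.choices(list(CITY_SHARE), weights=list(CITY_SHARE.values()), k=1)[0]
    kinds = rng.randint(1, 9)                         # 1-9 distinct kinds of item
    item_ids = rng.sample(range(CATALOGUE_SIZE), kinds)
    lines = [{"item_id": item_id, "quantity": rng.randint(1, 9)} for item_id in item_ids]
    return {"city": city, "lines": lines}


rng = random.Random(0)                                # fixed seed for repeatability
orders = [generate_order(rng) for _ in range(10_000)]
istanbul_share = sum(order["city"] == "Istanbul" for order in orders) / len(orders)
print(f"Istanbul share of orders: {istanbul_share:.3f}")
```

With enough orders, the observed share for each city should settle near its population share, which is the property the description highlights.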

  16. Turkey Imports

    • tradingeconomics.com
    • fr.tradingeconomics.com
    • +16more
    csv, excel, json, xml
    Updated Mar 27, 2025
    Cite
    TRADING ECONOMICS (2025). Turkey Imports [Dataset]. https://tradingeconomics.com/turkey/imports
    Available download formats: xml, excel, csv, json
    Dataset updated
    Mar 27, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 31, 1957 - Feb 28, 2025
    Area covered
    Türkiye
    Description

    Imports in Turkey decreased to 28532.57 USD Million in February from 28702.07 USD Million in January of 2025. This dataset provides the latest reported value for - Turkey Imports - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
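
For scale, the month-over-month move quoted above can be expressed as a percentage. The snippet below is plain arithmetic on the two figures from the description; the function name is hypothetical.

```python
def percent_change(previous: float, current: float) -> float:
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100


# January 2025 and February 2025 imports, in USD million, as quoted above.
change = percent_change(28702.07, 28532.57)
print(f"{change:.2f}%")  # -0.59%
```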

  17. Turkish Speecon database

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Feb 22, 2007
    Cite
    ELRA (European Language Resources Association) (2007). Turkish Speecon database [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0178/
    Dataset updated
    Feb 22, 2007
    Dataset provided by
    ELRA (European Language Resources Association)
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    The Turkish Speecon database is divided into 2 sets:
    1) The first set comprises the recordings of 550 adult Turkish speakers (280 males, 270 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place).
    2) The second set comprises the recordings of 50 child Turkish speakers (25 boys, 25 girls), recorded over 4 microphone channels in 1 recording environment (children room).

    This database is partitioned into 28 DVDs (first set) and 4 DVDs (second set).

    The speech databases made within the Speecon project were validated by SPEX, the Netherlands, to assess their compliance with the Speecon format and content specifications.

    Each of the four speech channels is recorded at 16 kHz, 16 bit, uncompressed unsigned integers in Intel format (lo-hi byte order). To each signal file corresponds an ASCII SAM label file which contains the relevant descriptive information.

    Each speaker uttered the following items:
    Calibration data: 6 noise recordings
    The “silence word” recording
    Free spontaneous items (adults only):
        3 minutes (session time) of free spontaneous, rich context items (story telling) (an open number of spontaneous topics out of a set of 30 topics)
    17 Elicited spontaneous items (adults only):
        3 dates, 2 times, 3 proper names, 2 city names, 1 letter sequence, 2 answers to questions, 3 telephone numbers, 1 language
    Read speech:
        30 phonetically rich sentences uttered by adults and 60 uttered by children
        5 phonetically rich words (adults only)
        4 isolated digits
        1 isolated digit sequence
        4 connected digit sequences
        1 telephone number
        3 natural numbers
        1 money amount
        2 time phrases (T1: analogue, T2: digital)
        3 dates (D1: analogue, D2: relative and general date, D3: digital)
        3 letter sequences
        1 proper name
        2 city or street names
        2 questions
        2 special keyboard characters
        1 Web address
        1 email address
    222 application specific words and phrases per session (adults)
    74 toy commands, 14 general commands, 31 phone commands and 4 application word synonyms (children)

    The following age distribution has been obtained:
    Adults: 244 speakers are between 15 and 30, 235 speakers are between 31 and 45, and 71 speakers are over 46.
    Children: 25 speakers are between 8 and 10, 25 speakers are between 11 and 15.

    A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
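
The description above gives the signal encoding as 16-bit uncompressed unsigned integers in Intel (lo-hi, that is, little-endian) byte order. A minimal sketch of decoding such a byte stream with Python's standard `struct` module follows. The function names are hypothetical, and the recentering step assumes an offset-binary convention with a midpoint of 32768, which the listing does not confirm.

```python
import struct


def decode_unsigned16_le(raw: bytes) -> list:
    """Decode headerless 16-bit unsigned little-endian samples into integers."""
    if len(raw) % 2:
        raise ValueError("expected an even number of bytes (2 bytes per sample)")
    count = len(raw) // 2
    # "<" = little-endian (lo-hi byte order), "H" = unsigned 16-bit integer
    return list(struct.unpack(f"<{count}H", raw))


def recenter(samples: list) -> list:
    """Shift unsigned samples so that silence sits at 0 (assumes midpoint 32768)."""
    return [sample - 32768 for sample in samples]


raw = b"\x01\x00\xff\xff\x00\x80"
samples = decode_unsigned16_le(raw)
print(samples)            # [1, 65535, 32768]
print(recenter(samples))  # [-32767, 32767, 0]
```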

  18. F

    Turkish Open Ended Classification Prompt & Response Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    Turkish Open Ended Classification Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/turkish-open-ended-classification-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    Welcome to the Turkish Open Ended Classification Prompt-Response Dataset—an extensive collection of 3000 meticulously curated prompt and response pairs. This dataset is a valuable resource for training Language Models (LMs) to classify input text accurately, a crucial aspect in advancing generative AI.

    Dataset Content: This open-ended classification dataset comprises a diverse set of prompts and responses. Each prompt contains the input text to be classified and may also contain a task instruction, context, constraints, and restrictions; the completion contains the best classification category as the response. Both prompts and completions are in Turkish. Because the dataset is open-ended, prompts do not list candidate categories to choose from.

    These prompt and completion pairs cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more. Each prompt is accompanied by a response, providing valuable information and insights to enhance the language model training process. Both the prompt and response were manually curated by native Turkish speakers, with references drawn from diverse sources such as books, news articles, and websites.

    This open-ended classification prompt and completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains prompts and responses with different types of rich text, including tables, code, JSON, etc., with proper markdown.

    Prompt Diversity: To ensure diversity, this open-ended classification dataset includes prompts with varying complexity levels, ranging from easy to medium and hard. Additionally, prompts are diverse in terms of length from short to medium and long, creating a comprehensive variety. The classification dataset also contains prompts with constraints and persona restrictions, which makes it even more useful for LLM training.

    Response Formats: To accommodate diverse learning experiences, our dataset incorporates different types of responses depending on the prompt. These formats include single-word, short phrase, and single sentence type of response. These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.

    Data Format and Annotation Details: This fully labeled Turkish Open Ended Classification Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt length, prompt complexity, domain, response, response type, and rich text presence.

    Quality and Accuracy: Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.

    The Turkish version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.

    Continuous Updates and Customization: The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom open-ended classification prompt and completion data tailored to specific needs, providing flexibility and customization options.

    License: The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Turkish Open Ended Classification Prompt-Completion Dataset to enhance the classification abilities and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
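The annotation fields named in the description (unique ID, prompt, prompt type, prompt length, prompt complexity, domain, response, response type, rich text presence) suggest a simple completeness check for the JSON release. The sketch below is only illustrative: the key names and the top-level array layout are assumptions, since the listing does not give the actual schema, and the real files may differ.

```python
import json

# Hypothetical key names derived from the annotation fields in the
# description; the published files may spell them differently.
EXPECTED_FIELDS = (
    "id",
    "prompt",
    "prompt_type",
    "prompt_length",
    "prompt_complexity",
    "domain",
    "response",
    "response_type",
    "rich_text",
)


def load_records(json_text: str) -> list[dict]:
    """Parse the dataset, assuming it is a JSON array of record objects."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records


def missing_fields(record: dict) -> list[str]:
    """Return the expected annotation fields that a record does not carry."""
    return [name for name in EXPECTED_FIELDS if name not in record]
```

Running `missing_fields` over every record from `load_records` gives a quick way to spot incomplete rows before training; the same field list could be checked against the header row of the CSV release.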

  19. h

    data-turkish-class

    • huggingface.co
    Updated Feb 25, 2023
    Cite
    savc (2023). data-turkish-class [Dataset]. https://huggingface.co/datasets/pnrr/data-turkish-class
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 25, 2023
    Authors
    savc
    License

    https://choosealicense.com/licenses/other/

    Description

    pnrr/data-turkish-class dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. o

    The Turkish Political Economy Database, from the 1800s to Date

    • osf.io
    Updated Jan 22, 2025
    Cite
    Altug Yalcintas; Özgür Kızılyurt; Kardelen Kaya Erman; Vural Başaran; Alaaddin Tok; Ekin Bal; Şebnem Gelmedi (2025). The Turkish Political Economy Database, from the 1800s to Date [Dataset]. http://doi.org/10.17605/OSF.IO/AY7U6
    Explore at:
    Dataset updated
    Jan 22, 2025
    Dataset provided by
    Center For Open Science
    Authors
    Altug Yalcintas; Özgür Kızılyurt; Kardelen Kaya Erman; Vural Başaran; Alaaddin Tok; Ekin Bal; Şebnem Gelmedi
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Türkiye
    Description

    The Turkish Political Economy Database, from the 1800s to Date is a digital history of economics project (ongoing). Results are preliminary.

Cite
Emrecan Çelik (2021). stsb-mt-turkish [Dataset]. https://huggingface.co/datasets/emrecan/stsb-mt-turkish

stsb-mt-turkish

emrecan/stsb-mt-turkish

Explore at:
4 scholarly articles cite this dataset (View in Google Scholar)
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 25, 2021
Authors
Emrecan Çelik
Description

STSb Turkish

Semantic textual similarity dataset for the Turkish language. It is a machine translation (Azure) of the STSb English dataset. This dataset is not reviewed by expert human translators. Uploaded from this repository.