19 datasets found
  1. face-celeb-vietnamese

    • huggingface.co
    Updated May 4, 2023
    Cite
    FPTU DSC (2023). face-celeb-vietnamese [Dataset]. https://huggingface.co/datasets/fptudsc/face-celeb-vietnamese
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 4, 2023
    Dataset authored and provided by
    FPTU DSC
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for "face-celeb-vietnamese"

      Dataset Summary
    

    This dataset contains information on over 8,000 samples of well-known Vietnamese individuals, categorized into three professions: singers, actors, and beauty queens. The dataset includes data on more than 100 celebrities in each of the three job categories.

      Languages
    

    Vietnamese: The label is used to indicate the name of celebrities in Vietnamese.

      Dataset Structure
    

    The image and Vietnamese… See the full description on the dataset page: https://huggingface.co/datasets/fptudsc/face-celeb-vietnamese.

  2. Vietnam GDP

    • tradingeconomics.com
    • pt.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    Updated Jun 15, 2025
    Cite
    TRADING ECONOMICS (2025). Vietnam GDP [Dataset]. https://tradingeconomics.com/vietnam/gdp
    Explore at:
    Available download formats: csv, excel, json, xml
    Dataset updated
    Jun 15, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 31, 1985 - Dec 31, 2024
    Area covered
    Vietnam
    Description

    The Gross Domestic Product (GDP) in Vietnam was worth 476.39 billion US dollars in 2024, according to official data from the World Bank. The GDP value of Vietnam represents 0.45 percent of the world economy. This dataset provides the latest reported value for - Vietnam GDP - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.

  3. Vietnam Exports to United States

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Jun 8, 2017
    Cite
    TRADING ECONOMICS (2017). Vietnam Exports to United States [Dataset]. https://tradingeconomics.com/vietnam/exports/united-states
    Explore at:
    Available download formats: csv, xml, excel, json
    Dataset updated
    Jun 8, 2017
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1990 - Dec 31, 2025
    Area covered
    Vietnam
    Description

    Vietnam Exports to United States was US$97.07 Billion during 2023, according to the United Nations COMTRADE database on international trade. Vietnam Exports to United States - data, historical chart and statistics - was last updated in July 2025.

  4. Vietnam Imports from United States

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Jun 7, 2017
    Cite
    TRADING ECONOMICS (2017). Vietnam Imports from United States [Dataset]. https://tradingeconomics.com/vietnam/imports/united-states
    Explore at:
    Available download formats: csv, json, excel, xml
    Dataset updated
    Jun 7, 2017
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1990 - Dec 31, 2025
    Area covered
    Vietnam
    Description

    Vietnam Imports from United States was US$13.83 Billion during 2023, according to the United Nations COMTRADE database on international trade. Vietnam Imports from United States - data, historical chart and statistics - was last updated in August 2025.

  5. vietvault

    • huggingface.co
    Updated Jul 9, 2024
    Cite
    Nam Pham (2024). vietvault [Dataset]. http://doi.org/10.57967/hf/2210
    Dataset updated
    Jul 9, 2024
    Authors
    Nam Pham
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    VietVault

    VietVault is a large-scale Vietnamese language corpus, carefully filtered and curated from Common Crawl dataset dumps prior to 2023. This dataset is designed to serve as a high-quality resource for Vietnamese language model pretraining and various natural language processing tasks.

      Dataset Statistics
    

    Size: 80GB of raw text
    Language: Vietnamese
    Source: Common Crawl dataset (all dumps in 2013-2023)
    Preprocessing: Cleaned, deduplicated, filtered for Vietnamese… See the full description on the dataset page: https://huggingface.co/datasets/nampdn-ai/vietvault.

  6. Estimated number of Vietnamese individuals aged 50+ years who are eligible...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 16, 2021
    Cite
    Hoang, Duy K.; Ho-Pham, Lan T.; Doan, Minh C.; Ho-Le, Thao P.; D. Mai, Linh (2021). Estimated number of Vietnamese individuals aged 50+ years who are eligible for treatment by the US National Osteoporosis Foundation Guidelines. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000885032
    Dataset updated
    Jun 16, 2021
    Authors
    Hoang, Duy K.; Ho-Pham, Lan T.; Doan, Minh C.; Ho-Le, Thao P.; D. Mai, Linh
    Area covered
    United States
    Description

    Estimated number of Vietnamese individuals aged 50+ years who are eligible for treatment by the US National Osteoporosis Foundation Guidelines.

  7. Primary Language of Newly Medi-Cal Eligible Individuals

    • data.amerigeoss.org
    • data.ca.gov
    • +3 more
    csv, zip
    Updated Jun 16, 2022
    + more versions
    Cite
    United States (2022). Primary Language of Newly Medi-Cal Eligible Individuals [Dataset]. https://data.amerigeoss.org/he/dataset/primary-language-of-newly-medi-cal-eligible-individuals-c6b13
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Jun 16, 2022
    Dataset provided by
    United States
    Description

    This dataset includes the primary language of newly Medi-Cal eligible individuals who identified their primary language as English, Spanish, Vietnamese, Mandarin, Cantonese, Arabic, Other Non-English, Armenian, Russian, Farsi, Korean, Tagalog, Other Chinese Languages, Hmong, Cambodian, Portuguese, Lao, French, Thai, Japanese, Samoan, Other Sign Language, American Sign Language (ASL), Turkish, Ilocano, Mien, Italian, Hebrew, and Polish, by reporting period. The primary language data is from the Medi-Cal Eligibility Data System (MEDS) and includes eligible individuals without prior Medi-Cal eligibility. This dataset is part of the public reporting requirements set forth in California Welfare and Institutions Code 14102.5.

  8. VietnameseMedBench

    • huggingface.co
    Updated May 29, 2025
    + more versions
    Cite
    Venera AI (2025). VietnameseMedBench [Dataset]. https://huggingface.co/datasets/venera-ai/VietnameseMedBench
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    Venera AI
    Description

    Data platform

      Performances
    
      Citation
    

    Please use the following citation if you intend to use our dataset for training or evaluation:

    @misc{VietnameseMedBench,
      title={VM14K: First Vietnamese Medical Benchmark},
      author={Anonymus},
      year={2025},
      howpublished = {\url{https://huggingface.co/datasets/venera-ai/VietnameseMedBench}}
    }

  9. Vietnam Balance of Trade

    • tradingeconomics.com
    • pt.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    Updated Aug 6, 2025
    Cite
    TRADING ECONOMICS (2025). Vietnam Balance of Trade [Dataset]. https://tradingeconomics.com/vietnam/balance-of-trade
    Explore at:
    Available download formats: xml, csv, json, excel
    Dataset updated
    Aug 6, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 31, 1990 - Jul 31, 2025
    Area covered
    Vietnam
    Description

    Vietnam recorded a trade surplus of 2.83 USD Billion in June of 2025. This dataset provides the latest reported value for - Vietnam Balance of Trade - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.

  10. Vietnamese-Intel-orca_dpo_pairs-gg-translated

    • huggingface.co
    Updated Nov 25, 2024
    Cite
    Fifth Civil Defender - 5CD (2024). Vietnamese-Intel-orca_dpo_pairs-gg-translated [Dataset]. https://huggingface.co/datasets/5CD-AI/Vietnamese-Intel-orca_dpo_pairs-gg-translated
    Explore at:
    Croissant
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Fifth Civil Defender - 5CD
    Description

    5CD-AI/Vietnamese-Intel-orca_dpo_pairs-gg-translated dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. wikipedia_vi

    • huggingface.co
    Updated Mar 31, 2023
    Cite
    VietGPT (2023). wikipedia_vi [Dataset]. https://huggingface.co/datasets/vietgpt/wikipedia_vi
    Explore at:
    Croissant
    Dataset updated
    Mar 31, 2023
    Dataset authored and provided by
    VietGPT
    Description

    Wikipedia

    Source: https://huggingface.co/datasets/wikipedia
    Num examples: 1,281,412
    Language: Vietnamese

    from datasets import load_dataset

    load_dataset("tdtunlp/wikipedia_vi")
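
The string passed to load_dataset is the owner/name part of a dataset page URL. As a minimal sketch, assuming the https://huggingface.co/datasets/<owner>/<name> layout seen in the citations in this listing, the hypothetical helper hf_repo_id below extracts that identifier. It is an illustration only and is not part of the datasets library.

```python
from urllib.parse import urlparse


def hf_repo_id(url: str) -> str:
    """Return 'owner/name' from a Hugging Face dataset page URL.

    Hypothetical helper: assumes the
    https://huggingface.co/datasets/<owner>/<name> layout used by the
    citations in this listing.
    """
    parsed = urlparse(url)
    # Drop empty segments produced by the leading slash.
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc != "huggingface.co" or len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset page URL: {url}")
    return f"{parts[1]}/{parts[2]}"


if __name__ == "__main__":
    # URL taken from the wikipedia_vi citation in this listing.
    print(hf_repo_id("https://huggingface.co/datasets/vietgpt/wikipedia_vi"))
```

For the wikipedia_vi citation URL this yields vietgpt/wikipedia_vi, whose owner differs from the tdtunlp/wikipedia_vi identifier in the snippet above; which of the two currently resolves is not checked here.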

  12. mt_eng_vietnamese

    • huggingface.co
    Updated May 23, 2024
    + more versions
    Cite
    International Conference on Spoken Language Translation (2024). mt_eng_vietnamese [Dataset]. https://huggingface.co/datasets/IWSLT/mt_eng_vietnamese
    Dataset updated
    May 23, 2024
    Dataset authored and provided by
    International Conference on Spoken Language Translation
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Preprocessed Dataset from IWSLT'15 English-Vietnamese machine translation: English-Vietnamese.

  13. vietnamese_ultrachat_200k

    • huggingface.co
    Updated Dec 1, 2023
    Cite
    Thien Phu Nguyen (2023). vietnamese_ultrachat_200k [Dataset]. https://huggingface.co/datasets/nguyenphuthien/vietnamese_ultrachat_200k
    Dataset updated
    Dec 1, 2023
    Authors
    Thien Phu Nguyen
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for Vietnamese UltraChat 200k

      Dataset Description
    

    This is a heavily filtered version of the UltraChat dataset and was used to train Zephyr-7B-β, a state-of-the-art 7B chat model. The original dataset consists of 1.4M dialogues generated by ChatGPT and spanning a wide range of topics. To create UltraChat 200k, we applied the following logic:

    Selection of a subset of data for faster supervised fine tuning.
    Truecasing of the dataset, as we observed around 5%… See the full description on the dataset page: https://huggingface.co/datasets/nguyenphuthien/vietnamese_ultrachat_200k.

  14. asr-vi

    • huggingface.co
    Updated May 27, 2025
    Cite
    SMEW Techlonogy (2025). asr-vi [Dataset]. https://huggingface.co/datasets/SMEW-TECH/asr-vi
    Dataset updated
    May 27, 2025
    Authors
    SMEW Techlonogy
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    MCP Cloudwords VIVOS Processed ASR Dataset

    This dataset contains Vietnamese speech data processed and prepared by MCP Cloudwords for ASR tasks. It includes audio files and corresponding transcriptions, divided into train and test sets.

    from datasets import load_dataset

    # Replace YOUR_USERNAME/YOUR_DATASET_NAME with your actual Hugging Face username and dataset name
    dataset = load_dataset("YOUR_USERNAME/YOUR_DATASET_NAME", trust_remote_code=True)

    # Display dataset information
    print(dataset)

    … See the full description on the dataset page: https://huggingface.co/datasets/SMEW-TECH/asr-vi.

  15. Vietnamese-openbmb-RLAIF-V-Dataset-gg-translated

    • huggingface.co
    Updated Jan 27, 2025
    Cite
    Fifth Civil Defender - 5CD (2025). Vietnamese-openbmb-RLAIF-V-Dataset-gg-translated [Dataset]. https://huggingface.co/datasets/5CD-AI/Vietnamese-openbmb-RLAIF-V-Dataset-gg-translated
    Explore at:
    Croissant
    Dataset updated
    Jan 27, 2025
    Dataset authored and provided by
    Fifth Civil Defender - 5CD
    Description

    5CD-AI/Vietnamese-openbmb-RLAIF-V-Dataset-gg-translated dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. alpaca_gpt4_dialogue_en

    • huggingface.co
    Updated Aug 18, 2024
    + more versions
    Cite
    Hieu Lam (2024). alpaca_gpt4_dialogue_en [Dataset]. https://huggingface.co/datasets/lamhieu/alpaca_gpt4_dialogue_en
    Explore at:
    Croissant
    Dataset updated
    Aug 18, 2024
    Authors
    Hieu Lam
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Description

    The dataset is from 5CD-AI/Vietnamese-c-s-ale-alpaca-gpt4-data-gg-translated, formatted as dialogues for speed and ease of use. Many thanks to 5CD-AI for releasing it. Importantly, this format is easy to use via the default chat template of transformers, meaning you can use it immediately with huggingface/alignment-handbook or unsloth.

      Structure
    

    View online through viewer.

      Note
    

    We advise you to reconsider before use, thank you. If you find it useful… See the full description on the dataset page: https://huggingface.co/datasets/lamhieu/alpaca_gpt4_dialogue_en.

  17. mabrycodes_dialogue_en

    • huggingface.co
    Updated Aug 18, 2024
    + more versions
    Cite
    Hieu Lam (2024). mabrycodes_dialogue_en [Dataset]. https://huggingface.co/datasets/lamhieu/mabrycodes_dialogue_en
    Explore at:
    Croissant
    Dataset updated
    Aug 18, 2024
    Authors
    Hieu Lam
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Description

    The dataset is from 5CD-AI/Vietnamese-mabryCodes-tiny-cot-alpaca-gg-translated, formatted as dialogues for speed and ease of use. Many thanks to the author for releasing it. Importantly, this format is easy to use via the default chat template of transformers, meaning you can use it immediately with huggingface/alignment-handbook or unsloth.

      Structure
    

    View online through viewer.

      Note
    

    We advise you to reconsider before use, thank you. If you find it useful… See the full description on the dataset page: https://huggingface.co/datasets/lamhieu/mabrycodes_dialogue_en.

  18. fleurs

    • huggingface.co
    • opendatalab.com
    Updated Jun 4, 2022
    + more versions
    Cite
    Google (2022). fleurs [Dataset]. https://huggingface.co/datasets/google/fleurs
    Dataset updated
    Jun 4, 2022
    Dataset authored and provided by
    Google (http://google.com/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    FLEURS

    Fleurs is the speech version of the FLoRes machine translation benchmark. We use 2009 n-way parallel sentences from the FLoRes dev and devtest publicly available sets, in 102 languages. Training sets have around 10 hours of supervision. Speakers of the train sets are different than speakers from the dev/test sets. Multilingual fine-tuning is used and "unit error rate" (characters, signs) of all languages is averaged. Languages and results are also grouped into seven… See the full description on the dataset page: https://huggingface.co/datasets/google/fleurs.

  19. Vietnamese-Function-Calling-Test

    • huggingface.co
    Cite
    phamhai, Vietnamese-Function-Calling-Test [Dataset]. https://huggingface.co/datasets/phamhai/Vietnamese-Function-Calling-Test
    Explore at:
    Croissant
    Authors
    phamhai
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Vietnamese Function Calling Benchmark

    RAG applications for Vietnamese chatbot systems are becoming increasingly popular. Many LLM models already support FC for Vietnamese, but there is no common and comprehensive benchmark yet. Today, I am releasing a benchmark for the Vietnamese Function Calling task. I hope this will serve as a standard for product teams to choose models in a reasonable and appropriate way. Dataset Details:

    Data size: 2899 single-turn function calling samples
    Domains:… See the full description on the dataset page: https://huggingface.co/datasets/phamhai/Vietnamese-Function-Calling-Test.

