19 datasets found

h
face-celeb-vietnamese
huggingface.co
Updated May 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FPTU DSC (2023). face-celeb-vietnamese [Dataset]. https://huggingface.co/datasets/fptudsc/face-celeb-vietnamese
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 4, 2023
Dataset authored and provided by
FPTU DSC
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card for "face-celeb-vietnamese"

Dataset Summary

This dataset contains information on over 8,000 samples of well-known Vietnamese individuals, categorized into three professions: singers, actors, and beauty queens. The dataset includes data on more than 100 celebrities in each of the three job categories.

Languages

Vietnamese: The label is used to indicate the name of celebrities in Vietnamese.

Dataset Structure

The image and Vietnamese… See the full description on the dataset page: https://huggingface.co/datasets/fptudsc/face-celeb-vietnamese.
T
Vietnam GDP
tradingeconomics.com
pt.tradingeconomics.com
+13more
csv, excel, json, xml
Updated Jun 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TRADING ECONOMICS (2025). Vietnam GDP [Dataset]. https://tradingeconomics.com/vietnam/gdp
Explore at:
csv, excel, json, xmlAvailable download formats
Dataset updated
Jun 15, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 31, 1985 - Dec 31, 2024
Area covered
Vietnam
Description
The Gross Domestic Product (GDP) in Vietnam was worth 476.39 billion US dollars in 2024, according to official data from the World Bank. The GDP value of Vietnam represents 0.45 percent of the world economy. This dataset provides the latest reported value for - Vietnam GDP - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
T
Vietnam Exports to United States
tradingeconomics.com
csv, excel, json, xml
Updated Jun 8, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TRADING ECONOMICS (2017). Vietnam Exports to United States [Dataset]. https://tradingeconomics.com/vietnam/exports/united-states
Explore at:
csv, xml, excel, jsonAvailable download formats
Dataset updated
Jun 8, 2017
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1990 - Dec 31, 2025
Area covered
Vietnam
Description
Vietnam Exports to United States was US$97.07 Billion during 2023, according to the United Nations COMTRADE database on international trade. Vietnam Exports to United States - data, historical chart and statistics - was last updated on July of 2025.
T
Vietnam Imports from United States
tradingeconomics.com
csv, excel, json, xml
Updated Jun 7, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TRADING ECONOMICS (2017). Vietnam Imports from United States [Dataset]. https://tradingeconomics.com/vietnam/imports/united-states
Explore at:
csv, json, excel, xmlAvailable download formats
Dataset updated
Jun 7, 2017
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1990 - Dec 31, 2025
Area covered
Vietnam
Description
Vietnam Imports from United States was US$13.83 Billion during 2023, according to the United Nations COMTRADE database on international trade. Vietnam Imports from United States - data, historical chart and statistics - was last updated on August of 2025.
h
vietvault
huggingface.co
Updated Jul 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nam Pham (2024). vietvault [Dataset]. http://doi.org/10.57967/hf/2210
Explore at:
Unique identifier
https://doi.org/10.57967/hf/2210
Dataset updated
Jul 9, 2024
Authors
Nam Pham
License
https://choosealicense.com/licenses/odc-by/https://choosealicense.com/licenses/odc-by/
Description
VietVault

VietVault is a large-scale Vietnamese language corpus, carefully filtered and curated from Common Crawl dataset dumps prior to 2023. This dataset is designed to serve as a high-quality resource for Vietnamese language model pretraining and various natural language processing tasks.

Dataset Statistics

Size: 80GB of raw text Language: Vietnamese Source: Common Crawl dataset (all dumps in 2013-2023) Preprocessing: Cleaned, deduplicated, filtered for Vietnamese… See the full description on the dataset page: https://huggingface.co/datasets/nampdn-ai/vietvault.
f
Estimated number of Vietnamese individuals aged 50+ years who are eligible...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Jun 16, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hoang, Duy K.; Ho-Pham, Lan T.; Doan, Minh C.; Ho-Le, Thao P.; D. Mai, Linh (2021). Estimated number of Vietnamese individuals aged 50+ years who are eligible for treatment by the US National Osteoporosis Foundation Guidelines. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000885032
Explore at:
Dataset updated
Jun 16, 2021
Authors
Hoang, Duy K.; Ho-Pham, Lan T.; Doan, Minh C.; Ho-Le, Thao P.; D. Mai, Linh
Area covered
United States
Description
Estimated number of Vietnamese individuals aged 50+ years who are eligible for treatment by the US National Osteoporosis Foundation Guidelines.
A
Primary Language of Newly Medi-Cal Eligible Individuals
data.amerigeoss.org
data.ca.gov
+3more
csv, zip
Updated Jun 16, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
United States (2022). Primary Language of Newly Medi-Cal Eligible Individuals [Dataset]. https://data.amerigeoss.org/he/dataset/primary-language-of-newly-medi-cal-eligible-individuals-c6b13
Explore at:
zip, csvAvailable download formats
Dataset updated
Jun 16, 2022
Dataset provided by
United States
Description
This dataset includes the primary language of newly Medi-Cal eligible individuals who identified their primary language as English, Spanish, Vietnamese, Mandarin, Cantonese, Arabic, Other Non-English, Armenian, Russian, Farsi, Korean, Tagalog, Other Chinese Languages, Hmong, Cambodian, Portuguese, Lao, French, Thai, Japanese, Samoan, Other Sign Language, American Sign Language (ASL), Turkish, Ilacano, Mien, Italian, Hebrew, and Polish, by reporting period. The primary language data is from the Medi-Cal Eligibility Data System (MEDS) and includes eligible individuals without prior Medi-Cal eligibility. This dataset is part of the public reporting requirements set forth in California Welfare and Institutions Code 14102.5.
h
VietnameseMedBench
huggingface.co
Updated May 29, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Venera AI (2025). VietnameseMedBench [Dataset]. https://huggingface.co/datasets/venera-ai/VietnameseMedBench
Explore at:
Dataset updated
May 29, 2025
Dataset authored and provided by
Venera AI
Description
Data platform

Performances Citation

Please use the following citation if you intend to use our dataset for training or evaluation: @misc{VietnameseMedBench, title={VM14K: First Vietnamese Medical Benchmark}, author={Anonymus}, year={2025}, howpublished = {\url{https://huggingface.co/datasets/venera-ai/VietnameseMedBench}} }
T
Vietnam Balance of Trade
tradingeconomics.com
pt.tradingeconomics.com
+13more
csv, excel, json, xml
Updated Aug 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TRADING ECONOMICS (2025). Vietnam Balance of Trade [Dataset]. https://tradingeconomics.com/vietnam/balance-of-trade
Explore at:
xml, csv, json, excelAvailable download formats
Dataset updated
Aug 6, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 31, 1990 - Jul 31, 2025
Area covered
Vietnam
Description
Vietnam recorded a trade surplus of 2.83 USD Billion in June of 2025. This dataset provides the latest reported value for - Vietnam Balance of Trade - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
h
Vietnamese-Intel-orca_dpo_pairs-gg-translated
huggingface.co
Updated Nov 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fifth Civil Defender - 5CD (2024). Vietnamese-Intel-orca_dpo_pairs-gg-translated [Dataset]. https://huggingface.co/datasets/5CD-AI/Vietnamese-Intel-orca_dpo_pairs-gg-translated
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 25, 2024
Dataset authored and provided by
Fifth Civil Defender - 5CD
Description
5CD-AI/Vietnamese-Intel-orca_dpo_pairs-gg-translated dataset hosted on Hugging Face and contributed by the HF Datasets community
h
wikipedia_vi
huggingface.co
Updated Mar 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
VietGPT (2023). wikipedia_vi [Dataset]. https://huggingface.co/datasets/vietgpt/wikipedia_vi
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 31, 2023
Dataset authored and provided by
VietGPT
Description
Wikipedia

Source: https://huggingface.co/datasets/wikipedia Num examples: 1,281,412 Language: Vietnamese

from datasets import load_dataset

load_dataset("tdtunlp/wikipedia_vi")
h
mt_eng_vietnamese
huggingface.co
Updated May 23, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
International Conference on Spoken Language Translation (2024). mt_eng_vietnamese [Dataset]. https://huggingface.co/datasets/IWSLT/mt_eng_vietnamese
Explore at:
Dataset updated
May 23, 2024
Dataset authored and provided by
International Conference on Spoken Language Translation
License
https://choosealicense.com/licenses/unknown/https://choosealicense.com/licenses/unknown/
Description
Preprocessed Dataset from IWSLT'15 English-Vietnamese machine translation: English-Vietnamese.
h
vietnamese_ultrachat_200k
huggingface.co
Updated Dec 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Thien Phu Nguyen (2023). vietnamese_ultrachat_200k [Dataset]. https://huggingface.co/datasets/nguyenphuthien/vietnamese_ultrachat_200k
Explore at:
Dataset updated
Dec 1, 2023
Authors
Thien Phu Nguyen
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for Vietnamese UltraChat 200k

Dataset Description

This is a heavily filtered version of the UltraChat dataset and was used to train Zephyr-7B-β, a state of the art 7b chat model. The original datasets consists of 1.4M dialogues generated by ChatGPT and spanning a wide range of topics. To create UltraChat 200k, we applied the following logic:

Selection of a subset of data for faster supervised fine tuning. Truecasing of the dataset, as we observed around 5%… See the full description on the dataset page: https://huggingface.co/datasets/nguyenphuthien/vietnamese_ultrachat_200k.
h
asr-vi
huggingface.co
Updated May 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SMEW Techlonogy (2025). asr-vi [Dataset]. https://huggingface.co/datasets/SMEW-TECH/asr-vi
Explore at:
Dataset updated
May 27, 2025
Authors
SMEW Techlonogy
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
MCP Cloudwords VIVOS Processed ASR Dataset

This dataset contains Vietnamese speech data processed and prepared by MCP Cloudwords for ASR tasks. It includes audio files and corresponding transcriptions, divided into train and test sets.

Replace YOUR_USERNAME/YOUR_DATASET_NAME with your actual Hugging Face username and dataset name

dataset = load_dataset("YOUR_USERNAME/YOUR_DATASET_NAME", trust_remote_code=True)

Display dataset information

print(dataset)… See the full description on the dataset page: https://huggingface.co/datasets/SMEW-TECH/asr-vi.
h
Vietnamese-openbmb-RLAIF-V-Dataset-gg-translated
huggingface.co
Updated Jan 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fifth Civil Defender - 5CD (2025). Vietnamese-openbmb-RLAIF-V-Dataset-gg-translated [Dataset]. https://huggingface.co/datasets/5CD-AI/Vietnamese-openbmb-RLAIF-V-Dataset-gg-translated
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 27, 2025
Dataset authored and provided by
Fifth Civil Defender - 5CD
Description
5CD-AI/Vietnamese-openbmb-RLAIF-V-Dataset-gg-translated dataset hosted on Hugging Face and contributed by the HF Datasets community
h
alpaca_gpt4_dialogue_en
huggingface.co
Updated Aug 18, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hieu Lam (2024). alpaca_gpt4_dialogue_en [Dataset]. https://huggingface.co/datasets/lamhieu/alpaca_gpt4_dialogue_en
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 18, 2024
Authors
Hieu Lam
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Description

The dataset is from 5CD-AI/Vietnamese-c-s-ale-alpaca-gpt4-data-gg-translated, formatted as dialogues for speed and ease of use. Many thanks to 5CD-AI for releasing it. Importantly, this format is easy to use via the default chat template of transformers, meaning you can use huggingface/alignment-handbook immediately, unsloth.

Structure

View online through viewer.

Note

We advise you to reconsider before use, thank you. If you find it useful… See the full description on the dataset page: https://huggingface.co/datasets/lamhieu/alpaca_gpt4_dialogue_en.
h
mabrycodes_dialogue_en
huggingface.co
Updated Aug 18, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hieu Lam (2024). mabrycodes_dialogue_en [Dataset]. https://huggingface.co/datasets/lamhieu/mabrycodes_dialogue_en
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 18, 2024
Authors
Hieu Lam
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Description

The dataset is from 5CD-AI/Vietnamese-mabryCodes-tiny-cot-alpaca-gg-translated, formatted as dialogues for speed and ease of use. Many thanks to author for releasing it. Importantly, this format is easy to use via the default chat template of transformers, meaning you can use huggingface/alignment-handbook immediately, unsloth.

Structure

View online through viewer.

Note

We advise you to reconsider before use, thank you. If you find it useful… See the full description on the dataset page: https://huggingface.co/datasets/lamhieu/mabrycodes_dialogue_en.
fleurs
huggingface.co
opendatalab.com
Updated Jun 4, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google (2022). fleurs [Dataset]. https://huggingface.co/datasets/google/fleurs
Explore at:
Dataset updated
Jun 4, 2022
Dataset authored and provided by
Googlehttp://google.com/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
FLEURS

Fleurs is the speech version of the FLoRes machine translation benchmark. We use 2009 n-way parallel sentences from the FLoRes dev and devtest publicly available sets, in 102 languages. Training sets have around 10 hours of supervision. Speakers of the train sets are different than speakers from the dev/test sets. Multilingual fine-tuning is used and ”unit error rate” (characters, signs) of all languages is averaged. Languages and results are also grouped into seven… See the full description on the dataset page: https://huggingface.co/datasets/google/fleurs.
h
Vietnamese-Function-Calling-Test
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
phamhai, Vietnamese-Function-Calling-Test [Dataset]. https://huggingface.co/datasets/phamhai/Vietnamese-Function-Calling-Test
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Authors
phamhai
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Vietnamese Function Calling Benchmark

RAG applications for Vietnamese chatbot systems are becoming increasingly popular. Many LLM models already support FC for Vietnamese, but there is no common and comprehensive benchmark yet. Today, I am releasing a benchmark for the Vietnamese Function Calling task. I hope this will serve as a standard for product teams to choose models in a reasonable and appropriate way. Dataset Details:

Data size: 2899 single-turn funcation calling samples Domains:… See the full description on the dataset page: https://huggingface.co/datasets/phamhai/Vietnamese-Function-Calling-Test.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

FPTU DSC (2023). face-celeb-vietnamese [Dataset]. https://huggingface.co/datasets/fptudsc/face-celeb-vietnamese

face-celeb-vietnamese

fptudsc/face-celeb-vietnamese

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

May 4, 2023

Dataset authored and provided by

FPTU DSC

License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

Dataset Card for "face-celeb-vietnamese"

  Dataset Summary

This dataset contains information on over 8,000 samples of well-known Vietnamese individuals, categorized into three professions: singers, actors, and beauty queens. The dataset includes data on more than 100 celebrities in each of the three job categories.

  Languages

Vietnamese: The label is used to indicate the name of celebrities in Vietnamese.

  Dataset Structure

The image and Vietnamese… See the full description on the dataset page: https://huggingface.co/datasets/fptudsc/face-celeb-vietnamese.

Clear search

Close search

Google apps

Main menu

face-celeb-vietnamese

Vietnam GDP

Vietnam Exports to United States

Vietnam Imports from United States

vietvault

Estimated number of Vietnamese individuals aged 50+ years who are eligible...

Primary Language of Newly Medi-Cal Eligible Individuals

VietnameseMedBench

Vietnam Balance of Trade

Vietnamese-Intel-orca_dpo_pairs-gg-translated

wikipedia_vi

mt_eng_vietnamese

vietnamese_ultrachat_200k

asr-vi

Vietnamese-openbmb-RLAIF-V-Dataset-gg-translated

alpaca_gpt4_dialogue_en

mabrycodes_dialogue_en

fleurs

Vietnamese-Function-Calling-Test

face-celeb-vietnamese

fptudsc/face-celeb-vietnamese