2 datasets found

pii-masking-200k
huggingface.co
Updated Apr 22, 2024
Cite
Ai4Privacy (2024). pii-masking-200k [Dataset]. http://doi.org/10.57967/hf/1532
Unique identifier
https://doi.org/10.57967/hf/1532
Dataset updated
Apr 22, 2024
Dataset authored and provided by
Ai4Privacy
Description
Ai4Privacy Community

Join our community at https://discord.gg/FmzWshaaQT to help build open datasets for privacy masking.

Purpose and Features

Formerly the world's largest open dataset for privacy; that title now belongs to pii-masking-300k. The purpose of the dataset is to train models to remove personally identifiable information (PII) from text, especially in the context of AI assistants and LLMs. The example texts have 54 PII classes (types of sensitive data), targeting 229 discussion… See the full description on the dataset page: https://huggingface.co/datasets/ai4privacy/pii-masking-200k.
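To illustrate what "removing PII from text" means in practice, here is a minimal Python sketch of span-based masking. The span format (start, end, label), the function name mask_pii, and the labels NAME and EMAIL are hypothetical choices made for this illustration; they are not taken from the dataset's schema or its 54 PII classes.

```python
from typing import List, Tuple

# Hypothetical span format for this sketch: (start offset, end offset, label).
Span = Tuple[int, int, str]


def mask_pii(text: str, spans: List[Span]) -> str:
    """Replace each labelled character span with a [LABEL] placeholder."""
    masked = text
    # Work from the rightmost span to the leftmost so that earlier
    # offsets remain valid after each replacement.
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        masked = masked[:start] + f"[{label}]" + masked[end:]
    return masked


example = "Contact Jane Doe at jane@example.com."

# Locate the sensitive substrings by search rather than hand-counted offsets.
name, email = "Jane Doe", "jane@example.com"
name_start = example.index(name)
email_start = example.index(email)
spans = [
    (name_start, name_start + len(name), "NAME"),      # illustrative label
    (email_start, email_start + len(email), "EMAIL"),  # illustrative label
]

print(mask_pii(example, spans))  # Contact [NAME] at [EMAIL].
```

A model trained on a PII-masking dataset would be responsible for predicting the spans; the replacement step itself is plain string manipulation, as shown here.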
pii-masking-english-5k
huggingface.co
Updated Aug 22, 2007
Cite
Aniket Kulkarni (2007). pii-masking-english-5k [Dataset]. https://huggingface.co/datasets/aniket-curlscape/pii-masking-english-5k
Dataset updated
Aug 22, 2007
Authors
Aniket Kulkarni
License
https://choosealicense.com/licenses/other/
Description
Important

This repository contains the English-only subset of the Ai4Privacy PII-Masking-300k Dataset. The dataset is curated to provide English texts only, while retaining the structure, labeling schema, and licensing of the original dataset.

Licensing

Academic use is encouraged with proper citation, provided it follows similar license terms. Commercial entities should contact us at licensing@ai4privacy.com for licensing inquiries and additional data access.

Terms… See the full description on the dataset page: https://huggingface.co/datasets/aniket-curlscape/pii-masking-english-5k.
pii-masking-200k

Ai4Privacy PII200k Dataset

ai4privacy/pii-masking-200k


14 scholarly articles cite this dataset
