2 datasets found

pii-masking-43k
huggingface.co
Updated Jul 1, 2023
Cite
Ai4Privacy (2023). pii-masking-43k [Dataset]. http://doi.org/10.57967/hf/0824
Unique identifier
https://doi.org/10.57967/hf/0824
Dataset updated
Jul 1, 2023
Dataset authored and provided by
Ai4Privacy
Description
Purpose and Features

The model and dataset are intended to remove personally identifiable information (PII) from text, especially in the context of AI assistants and LLMs. The model is a fine-tuned version of "Distilled BERT", a smaller and faster version of BERT. It was adapted for token classification using what is, to our knowledge, the largest open-source PII masking dataset, which we are releasing simultaneously. The model has 62 million parameters. The… See the full description on the dataset page: https://huggingface.co/datasets/ai4privacy/pii-masking-43k.
pii-masking-65k
huggingface.co
Updated Apr 5, 2024
Cite
Ai4Privacy (2024). pii-masking-65k [Dataset]. http://doi.org/10.57967/hf/2012
Unique identifier
https://doi.org/10.57967/hf/2012
Dataset updated
Apr 5, 2024
Dataset authored and provided by
Ai4Privacy
Description
Purpose and Features

The model and dataset are intended to remove personally identifiable information (PII) from text, especially in the context of AI assistants and LLMs. The model is a fine-tuned version of "Distilled BERT", a smaller and faster version of BERT. It was adapted for token classification using what is, to our knowledge, the largest open-source PII masking dataset, which we are releasing simultaneously. The model has 62 million parameters. The… See the full description on the dataset page: https://huggingface.co/datasets/ai4privacy/pii-masking-65k.
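
Both descriptions frame PII removal as token classification: each token receives a label, and tokens labeled as PII are replaced by a placeholder. The sketch below illustrates only that masking step in plain Python. It is not the Ai4Privacy model or its actual label set. The function name mask_pii, the BIO-style tags (B-NAME, I-NAME, B-EMAIL), and the hand-written labels are all assumptions made for illustration, standing in for what a trained token classifier would predict.

```python
from typing import List, Optional


def mask_pii(tokens: List[str], labels: List[str]) -> str:
    """Replace PII-tagged tokens with one placeholder per entity span.

    Assumes hypothetical BIO-style labels: "O" for non-PII tokens,
    "B-<TYPE>" for the first token of an entity, "I-<TYPE>" for the rest.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    out: List[str] = []
    prev_entity: Optional[str] = None  # entity type of the previous token, if PII

    for token, label in zip(tokens, labels):
        if label == "O":
            # Non-PII token: keep it unchanged and close any open span.
            out.append(token)
            prev_entity = None
            continue

        prefix, sep, entity = label.partition("-")
        if not sep:
            # Label without a B-/I- prefix: treat it as the start of a span.
            prefix, entity = "B", label

        # Emit a new placeholder unless this token continues the current span.
        if not (prefix == "I" and entity == prev_entity):
            out.append(f"[{entity}]")
        prev_entity = entity

    return " ".join(out)


# Hand-labeled example; in practice the labels would come from a classifier.
tokens = ["My", "name", "is", "Jane", "Doe", "and", "my", "email", "is", "jane@example.com"]
labels = ["O", "O", "O", "B-NAME", "I-NAME", "O", "O", "O", "O", "B-EMAIL"]
print(mask_pii(tokens, labels))
```

Collapsing each multi-token entity into a single placeholder keeps the masked text readable while hiding how many tokens the original value contained.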