3 datasets found
  1. Parler archived dataset 2021

    • kaggle.com
    Updated Jan 10, 2021
    Cite
    dasika gayathry (2021). Parler archived dataset 2021 [Dataset]. https://www.kaggle.com/dasikag/parler-dataset/discussion
    Dataset updated
    Jan 10, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    dasika gayathry
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    After Trump was banned from mainstream social media following the Capitol riots, Parler and other fringe platforms gained a right-leaning follower base. The free-speech advocates on Parler reportedly perpetuated hatred and inflamed conspiracy theories. A few security researchers on Twitter have been paying attention to this so-called "right-wing" network and, with great effort, archived around 3 billion of its posts on archive.org over the past few months.

    As of 11 Jan 2021, Parler had been removed from the Google and Apple app stores, and the site had been taken down by AWS.

    Content

    There are several txt files, each containing URLs to individual posts. The posts include images, text, and links to video files; deleted posts and videos are also included. Example: https://web.archive.org/web/20210110202718/https://parler.com/post/d18e8fedcaf147649f160267e57bde41 Pulling all of the information individually for analysis is beyond the scope here: at roughly 100 TB, it is too big and slow to process on one computer :)
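As a rough illustration of working with these URL lists, the sketch below splits one archived link into its capture time, original Parler URL, and post ID. It assumes every line follows the same `web.archive.org/web/<14-digit timestamp>/<original URL>` pattern as the example above and that post IDs are hexadecimal strings; the names `ArchivedPost` and `parse_archived_post` are hypothetical and not part of the dataset.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional

# Assumed layout: Wayback Machine prefix, 14-digit capture timestamp, original URL.
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")
# Assumed post URL shape, based only on the single example in the description.
POST_RE = re.compile(r"^https?://parler\.com/post/([0-9a-fA-F]+)$")


class ArchivedPost(NamedTuple):
    captured_at: datetime  # capture time (Wayback timestamps are in UTC)
    original_url: str      # the parler.com URL that was archived
    post_id: str           # trailing identifier of the post


def parse_archived_post(url: str) -> Optional[ArchivedPost]:
    """Return the parts of an archived post URL, or None if it does not match."""
    wayback = WAYBACK_RE.match(url.strip())
    if wayback is None:
        return None
    timestamp, original = wayback.groups()
    post = POST_RE.match(original)
    if post is None:
        return None
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return ArchivedPost(captured, original, post.group(1))


example = (
    "https://web.archive.org/web/20210110202718/"
    "https://parler.com/post/d18e8fedcaf147649f160267e57bde41"
)
print(parse_archived_post(example))
```

Applying the function line by line keeps memory use flat, which matters given how many URLs the txt files are said to contain.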

    Acknowledgements

    Twitter : @donk_enby

    Inspiration

    Sentiment analysis on the text data. Analysis of hate speech and profiling. DeepMoji analysis. Ideas on how to moderate a platform like this in the future.

  2. mls_eng_10k

    • huggingface.co
    Updated Mar 16, 2024
    Cite
    Parler TTS (2024). mls_eng_10k [Dataset]. https://huggingface.co/datasets/parler-tts/mls_eng_10k
    Dataset updated
    Mar 16, 2024
    Dataset authored and provided by
    Parler TTS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Summary

    This is a 10K-hour subset of the English version of the Multilingual LibriSpeech (MLS) dataset. The data archives were restructured from the original OpenSLR archives to make them easier to stream. The MLS dataset is a large multilingual corpus suitable for speech research. It is derived from read audiobooks from LibriVox and covers 8 languages: English, German, Dutch, Spanish, French, Italian, Portuguese, and Polish. It includes about 44.5K hours of English and… See the full description on the dataset page: https://huggingface.co/datasets/parler-tts/mls_eng_10k.
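Because the archives were restructured for streaming, a few examples can in principle be read without downloading the whole subset. The sketch below is one way to try that with the Hugging Face `datasets` library (`load_dataset` with `streaming=True`). The split name `"train"` is an assumption not stated in the summary above, and `open_stream` and `first_n` are hypothetical helper names.

```python
from itertools import islice
from typing import Any, Iterable, List

DATASET_ID = "parler-tts/mls_eng_10k"  # repository ID from the dataset URL


def open_stream(split: str = "train") -> Iterable[Any]:
    """Open the dataset lazily; the split name is an assumption, check the dataset page."""
    # Imported here so the rest of the module works without the third-party package.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split, streaming=True)


def first_n(stream: Iterable[Any], n: int) -> List[Any]:
    """Materialise only the first n items of a (possibly very large) stream."""
    return list(islice(stream, n))


if __name__ == "__main__":
    # Needs network access and `pip install datasets`.
    for example in first_n(open_stream(), 3):
        print(sorted(example.keys()))
```

Printing the keys rather than the values avoids dumping decoded audio arrays to the terminal while still showing which fields each example carries.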

  3. mls_eng

    • huggingface.co
    Updated May 21, 2024
    Cite
    Parler TTS (2024). mls_eng [Dataset]. https://huggingface.co/datasets/parler-tts/mls_eng
    Dataset updated
    May 21, 2024
    Dataset authored and provided by
    Parler TTS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for English MLS

    Dataset Summary

    This is a streamable version of the English portion of the Multilingual LibriSpeech (MLS) dataset. The data archives were restructured from the original OpenSLR archives to make them easier to stream. The MLS dataset is a large multilingual corpus suitable for speech research. It is derived from read audiobooks from LibriVox and covers 8 languages: English, German, Dutch, Spanish, French, Italian, Portuguese… See the full description on the dataset page: https://huggingface.co/datasets/parler-tts/mls_eng.


Parler archived dataset 2021

Social media data URLs, to analyse how right the right-wingers were :)

