3 datasets found
  1. e

    Credibility Corpus with several datasets (Twitter, Web database) in French...

    • data.europa.eu
    • data.wu.ac.at
    rar
    Updated Dec 1, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    DELETED DELETED (2016). Credibility Corpus with several datasets (Twitter, Web database) in French and English [Dataset]. https://data.europa.eu/data/datasets/5840066288ee38426dc65bb3?locale=fr
    Explore at:
    rar(33261), rar(680351), rar(40693), rar(102374), rar(77120), rar(212274)Available download formats
    Dataset updated
    Dec 1, 2016
    Dataset authored and provided by
    DELETED DELETED
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    Description of the corpora

    The set of these datasets are made to analyze ifnormation credibility in general (rumor and disinformation for English and French documents), and occuring on the social web. Target databases about rumor, hoax and disinformation helped to collect obviously misinformation. Some topic (with keywords) helps us to made corpora from the micrroblogging platform Twitter, great provider of rumors and disinformation.

    1 corpus describes Texts from the web database about rumors and disinformation. 4 corpora from Social Media Twitter about specific rumors (2 in English, 2 in French). 4 corpora from Social Media Twitter randomly built (2 in English, 2 in French). 4 corpora from Social Media Twitter about specific rumors (2 in English, 2 in French).

    Size of different corpora :

    Social Web Rumorous corpus: 1,612

    French Hollande Rumorous corpus (Twitter): 371 French Lemon Rumorous corpus (Twitter): 270 English Pin Rumorous corpus (Twitter): 679 English Swine Rumorous corpus (Twitter): 1024

    French 1st Random corpus (Twitter): 1000 French 2st Random corpus (Twitter): 1000 English 3st Random corpus (Twitter): 1000 English 4st Random corpus (Twitter): 1000

    French Rihanna Event corpus (Twitter): 543 English Rihanna Event corpus (Twitter): 1000 French Euro2016 Event corpus (Twitter): 1000 English Euro2016 Event corpus (Twitter): 1000

    A matrix links tweets with most 50 frequent words

    Text data :

    _id : message id body text : string text data

    Matrix data :

    52 columns (first column is id, second column is rumor indicator 1 or -1, other columns are words value is 1 contain or 0 does not contain) 11,102 lines (each line is a message)

    Hidalgo corpus: lines range 1:75 Lemon corpus : lines range 76:467 Pin rumor : lines range 468:656 swine : lines range 657:1311

    random messages : lines range 1312:11103

    Sample contains : French Pin Rumorous corpus (Twitter): 679 Matrix data :

    52 columns (first column is id, second column is rumor indicator 1 or -1, other columns are words value is 1 contain or 0 does not contain) 189 lines (each line is a message)

  2. E

    Credibility Corpus with several datasets (Twitter, Web database) in French...

    • live.european-language-grid.eu
    txt
    Updated Apr 10, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Credibility Corpus with several datasets (Twitter, Web database) in French and English [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7468
    Explore at:
    txtAvailable download formats
    Dataset updated
    Apr 10, 2024
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    The set of these datasets are made to analyze information credibility in general (rumor and disinformation for English and French documents), and occuring on the social web. Target databases about rumor, hoax and disinformation helped to collect obviously misinformation. Some topic (with keywords) helps us to made corpora from the micrroblogging platform Twitter, great provider of rumors and disinformation.1 corpus describes Texts from the web database about rumors and disinformation. 4 corpora from Social Media Twitter about specific rumors (2 in English, 2 in French). 4 corpora from Social Media Twitter randomly built (2 in English, 2 in French). 4 corpora from Social Media Twitter about specific rumors (2 in English, 2 in French).Size of different corpora :Social Web Rumorous corpus: 1,612French Hollande Rumorous corpus (Twitter): 371 French Lemon Rumorous corpus (Twitter): 270 English Pin Rumorous corpus (Twitter): 679 English Swine Rumorous corpus (Twitter): 1024French 1st Random corpus (Twitter): 1000 French 2st Random corpus (Twitter): 1000 English 3st Random corpus (Twitter): 1000 English 4st Random corpus (Twitter): 1000French Rihanna Event corpus (Twitter): 543 English Rihanna Event corpus (Twitter): 1000 French Euro2016 Event corpus (Twitter): 1000 English Euro2016 Event corpus (Twitter): 1000A matrix links tweets with most 50 frequent wordsText data :_id : message id body text : string text dataMatrix data :52 columns (first column is id, second column is rumor indicator 1 or -1, other columns are words value is 1 contain or 0 does not contain) 11,102 lines (each line is a message)Hidalgo corpus: lines range 1:75 Lemon corpus : lines range 76:467 Pin rumor : lines range 468:656 swine : lines range 657:1311random messages : lines range 1312:11103Sample contains : French Pin Rumorous corpus (Twitter): 679 Matrix data :52 columns (first column is id, second column is rumor indicator 1 or -1, other columns are words value is 1 contain or 0 does not contain) 189 lines (each line is a message)

  3. e

    Creedibility Corpus with several datasets (Twitter, Web database) in French...

    • data.europa.eu
    rar
    Updated Dec 1, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    nicolas turenne (2016). Creedibility Corpus with several datasets (Twitter, Web database) in French and English [Dataset]. https://data.europa.eu/data/datasets/5840066288ee38426dc65bb3/embed
    Explore at:
    rar(33261), rar(77120), rar(212274), rar(680351), rar(40693), rar(102374)Available download formats
    Dataset updated
    Dec 1, 2016
    Dataset authored and provided by
    nicolas turenne
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    Description of the corpora

    The set of these datasets are made to analyse ifnormation credibility in general (rumor and disinformation for English and French documents), and occuring on the social web. Target databases about rumor, hoax and disinformation helped to collection obviously misinformation. Some topic (with keywords) helps us to made corpora from the micrroblogging platform Twitter, great provider of rumors and disinformation.

    1 corpus describes Texts from the web database about rumors and disinformation. 4 corpora from Social Media Twitter about specific rumors (2 in English, 2 in French). 4 corpora from Social Media Twitter randomly built (2 in English, 2 in French). 4 corpora from Social Media Twitter about specific rumors (2 in English, 2 in French).

    Size of different corpora:

    Social Web Rumorous corpus: 1,612

    French Hollande Rumorous corpus (Twitter): 371 French Lemon Rumorous corpus (Twitter): 270 English Pin Rumorous corpus (Twitter): 679 English Swine Rumorous corpus (Twitter): 1024

    French 1st Random corpus (Twitter): 1000 French 2nd Random corpus (Twitter): 1000 English 3rd Random corpus (Twitter): 1000 English 4th Random corpus (Twitter): 1000

    French Rihanna Event corpus (Twitter): 543 English Rihanna Event corpus (Twitter): 1000 French Euro2016 Event corpus (Twitter): 1000 English Euro2016 Event corpus (Twitter): 1000

    A matrix links tweets with most 50 frequent words

    Text data:

    _id: message id body text: string text data

    Matrix data:

    52 columns (first column is id, second column is rumor indicator 1 or -1, other columns are words value is 1 contain or 0 does not contain) 11,102 lines (each line is a message)

    Hidalgo corpus: lines range 1:75 Lemon corpus: lines range 76:467 Pin rumor: lines range 468:656 Swine: lines range 657:1311

    random Messages: lines range 1312:11103

    Sample contains: French Pin Rumorous corpus (Twitter): 679 Matrix data:

    52 columns (first column is id, second column is rumor indicator 1 or -1, other columns are words value is 1 contain or 0 does not contain) 189 lines (each line is a message)

  4. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
DELETED DELETED (2016). Credibility Corpus with several datasets (Twitter, Web database) in French and English [Dataset]. https://data.europa.eu/data/datasets/5840066288ee38426dc65bb3?locale=fr

Credibility Corpus with several datasets (Twitter, Web database) in French and English

Explore at:
rar(33261), rar(680351), rar(40693), rar(102374), rar(77120), rar(212274)Available download formats
Dataset updated
Dec 1, 2016
Dataset authored and provided by
DELETED DELETED
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered
French
Description

Description of the corpora

The set of these datasets are made to analyze ifnormation credibility in general (rumor and disinformation for English and French documents), and occuring on the social web. Target databases about rumor, hoax and disinformation helped to collect obviously misinformation. Some topic (with keywords) helps us to made corpora from the micrroblogging platform Twitter, great provider of rumors and disinformation.

1 corpus describes Texts from the web database about rumors and disinformation. 4 corpora from Social Media Twitter about specific rumors (2 in English, 2 in French). 4 corpora from Social Media Twitter randomly built (2 in English, 2 in French). 4 corpora from Social Media Twitter about specific rumors (2 in English, 2 in French).

Size of different corpora :

Social Web Rumorous corpus: 1,612

French Hollande Rumorous corpus (Twitter): 371 French Lemon Rumorous corpus (Twitter): 270 English Pin Rumorous corpus (Twitter): 679 English Swine Rumorous corpus (Twitter): 1024

French 1st Random corpus (Twitter): 1000 French 2st Random corpus (Twitter): 1000 English 3st Random corpus (Twitter): 1000 English 4st Random corpus (Twitter): 1000

French Rihanna Event corpus (Twitter): 543 English Rihanna Event corpus (Twitter): 1000 French Euro2016 Event corpus (Twitter): 1000 English Euro2016 Event corpus (Twitter): 1000

A matrix links tweets with most 50 frequent words

Text data :

_id : message id body text : string text data

Matrix data :

52 columns (first column is id, second column is rumor indicator 1 or -1, other columns are words value is 1 contain or 0 does not contain) 11,102 lines (each line is a message)

Hidalgo corpus: lines range 1:75 Lemon corpus : lines range 76:467 Pin rumor : lines range 468:656 swine : lines range 657:1311

random messages : lines range 1312:11103

Sample contains : French Pin Rumorous corpus (Twitter): 679 Matrix data :

52 columns (first column is id, second column is rumor indicator 1 or -1, other columns are words value is 1 contain or 0 does not contain) 189 lines (each line is a message)

Search
Clear search
Close search
Google apps
Main menu