57 datasets found

f
Recent work related to propaganda.
plos.figshare.com
xls
Updated Jul 10, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Akib Mohi Ud Din Khanday; Mudasir Ahmad Wani; Syed Tanzeel Rabani; Qamar Rayees Khan; Ahmed A. Abd El-Latif (2024). Recent work related to propaganda. [Dataset]. http://doi.org/10.1371/journal.pone.0302583.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0302583.t001
Dataset updated
Jul 10, 2024
Dataset provided by
PLOS ONE
Authors
Akib Mohi Ud Din Khanday; Mudasir Ahmad Wani; Syed Tanzeel Rabani; Qamar Rayees Khan; Ahmed A. Abd El-Latif
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Social media platforms serve as communication tools where users freely share information regardless of its accuracy. Propaganda on these platforms refers to the dissemination of biased or deceptive information aimed at influencing public opinion, encompassing various forms such as political campaigns, fake news, and conspiracy theories. This study introduces a Hybrid Feature Engineering Approach for Propaganda Identification (HAPI), designed to detect propaganda in text-based content like news articles and social media posts. HAPI combines conventional feature engineering methods with machine learning techniques to achieve high accuracy in propaganda detection. This study is conducted on data collected from Twitter via its API, and an annotation scheme is proposed to categorize tweets into binary classes (propaganda and non-propaganda). Hybrid feature engineering entails the amalgamation of various features, including Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BoW), Sentimental features, and tweet length, among others. Multiple Machine Learning classifiers undergo training and evaluation utilizing the proposed methodology, leveraging a selection of 40 pertinent features identified through the hybrid feature selection technique. All the selected algorithms including Multinomial Naive Bayes (MNB), Support Vector Machine (SVM), Decision Tree (DT), and Logistic Regression (LR) achieved promising results. The SVM-based HaPi (SVM-HaPi) exhibits superior performance among traditional algorithms, achieving precision, recall, F-Measure, and overall accuracy of 0.69, 0.69, 0.69, and 69.2%, respectively. Furthermore, the proposed approach is compared to well-known existing approaches where it overperformed most of the studies on several evaluation metrics. This research contributes to the development of a comprehensive system tailored for propaganda identification in textual content. Nonetheless, the purview of propaganda detection transcends textual data alone. Deep learning algorithms like Artificial Neural Networks (ANN) offer the capability to manage multimodal data, incorporating text, images, audio, and video, thereby considering not only the content itself but also its presentation and contextual nuances during dissemination.
D
DummyNVSSMBEEAnnual
data.cdc.gov
application/rdfxml +5
Updated Jan 16, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Center for Health Statistics (2025). DummyNVSSMBEEAnnual [Dataset]. https://data.cdc.gov/NCHS/DummyNVSSMBEEAnnual/cjrp-m89a
Explore at:
tsv, json, application/rdfxml, application/rssxml, csv, xmlAvailable download formats
Dataset updated
Jan 16, 2025
Dataset authored and provided by
National Center for Health Statisticshttps://www.cdc.gov/nchs/
Description
This is a fake data set used for testing various data visualization tools and displays.
H
Synthetic (fake) youth mental health datasets and data dictionaries
dataverse.harvard.edu
search.dataone.org
Updated Feb 8, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matthew P Hamilton (2024). Synthetic (fake) youth mental health datasets and data dictionaries [Dataset]. http://doi.org/10.7910/DVN/HJXYKQ
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/HJXYKQ
Dataset updated
Feb 8, 2024
Dataset provided by
Harvard Dataverse
Authors
Matthew P Hamilton
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/5.0/customlicense?persistentId=doi:10.7910/DVN/HJXYKQhttps://dataverse.harvard.edu/api/datasets/:persistentId/versions/5.0/customlicense?persistentId=doi:10.7910/DVN/HJXYKQ
Description
The datasets in this collection are entirely fake. They were developed principally to demonstrate the workings of a number of utility scoring and mapping algorithms. However, they may be of more general use to others. In some limited cases, some of the included files could be used in exploratory simulation based analyses. However, you should read the metadata descriptors for each file to inform yourself of the validity and limitations of each fake dataset. To open the RDS format files included in this dataset, the R package ready4use needs to be installed (see https://ready4-dev.github.io/ready4use/ ). It is also recommended that you install the youthvars package ( https://ready4-dev.github.io/youthvars/) as this provides useful tools for inspecting and validating each dataset.
F
Fake Email Address Generator Report
datainsightsmarket.com
doc, pdf, ppt
Updated Feb 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data Insights Market (2025). Fake Email Address Generator Report [Dataset]. https://www.datainsightsmarket.com/reports/fake-email-address-generator-1405019
Explore at:
doc, pdf, pptAvailable download formats
Dataset updated
Feb 12, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Fake Email Address Generator Market Analysis The global market for Fake Email Address Generators is expected to reach a value of XXX million by 2033, growing at a CAGR of XX% from 2025 to 2033. Key drivers of this growth include the increasing demand for privacy and anonymity online, the growing prevalence of spam and phishing attacks, and the proliferation of digital marketing campaigns. Additionally, the adoption of cloud-based solutions and the emergence of new technologies, such as artificial intelligence (AI), are further fueling market expansion. Key trends in the Fake Email Address Generator market include the growing popularity of enterprise-grade solutions, the emergence of disposable email services, and the increasing integration with other online tools. Restraints to market growth include concerns over security and data protection, as well as the availability of free or low-cost alternatives. The market is dominated by a few major players, including Burnermail, TrashMail, and Guerrilla Mail, but a growing number of smaller vendors are emerging with innovative solutions. Geographically, North America and Europe are the largest markets, followed by the Asia Pacific region.
f
Data Sheet 1_OLTW-TEC: online learning with sliding windows for text...
figshare.com
docx
Updated Sep 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Khrystyna Lipianina-Honcharenko; Yevgeniy Bodyanskiy; Nataliia Kustra; Andrii Ivasechkо (2024). Data Sheet 1_OLTW-TEC: online learning with sliding windows for text classifier ensembles.docx [Dataset]. http://doi.org/10.3389/frai.2024.1401126.s001
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.3389/frai.2024.1401126.s001
Dataset updated
Sep 11, 2024
Dataset provided by
Frontiers
Authors
Khrystyna Lipianina-Honcharenko; Yevgeniy Bodyanskiy; Nataliia Kustra; Andrii Ivasechkо
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In the digital age, rapid dissemination of information has elevated the challenge of distinguishing between authentic news and disinformation. This challenge is particularly acute in regions experiencing geopolitical tensions, where information plays a pivotal role in shaping public perception and policy. The prevalence of disinformation in the Ukrainian-language information space, intensified by the hybrid war with russia, necessitates the development of sophisticated tools for its detection and mitigation. Our study introduces the “Online Learning with Sliding Windows for Text Classifier Ensembles” (OLTW-TEC) method, designed to address this urgent need. This research aims to develop and validate an advanced machine learning method capable of dynamically adapting to evolving disinformation tactics. The focus is on creating a highly accurate, flexible, and efficient system for detecting disinformation in Ukrainian-language texts. The OLTW-TEC method leverages an ensemble of classifiers combined with a sliding window technique to continuously update the model with the most recent data, enhancing its adaptability and accuracy over time. A unique dataset comprising both authentic and fake news items was used to evaluate the method’s performance. Advanced metrics, including precision, recall, and F1-score, facilitated a comprehensive analysis of its effectiveness. The OLTW-TEC method demonstrated exceptional performance, achieving a classification accuracy of 93%. The integration of the sliding window technique with a classifier ensemble significantly contributed to the system’s ability to accurately identify disinformation, making it a robust tool in the ongoing battle against fake news in the Ukrainian context. The application of the OLTW-TEC method highlights its potential as a versatile and effective solution for disinformation detection. Its adaptability to the specifics of the Ukrainian language and the dynamic nature of information warfare offers valuable insights into the development of similar tools for other languages and regions. OLTW-TEC represents a significant advancement in the detection of disinformation within the Ukrainian-language information space. Its development and successful implementation underscore the importance of innovative machine learning techniques in combating fake news, paving the way for further research and application in the field of digital information integrity.
Data from: Supersharers of fake news on Twitter
data.niaid.nih.gov
search.dataone.org
+2more
zip
Updated May 24, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sahar Baribi-Bartov; Briony Swire-Thompson; Nir Grinberg (2024). Supersharers of fake news on Twitter [Dataset]. http://doi.org/10.5061/dryad.44j0zpcmq
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.44j0zpcmq
Dataset updated
May 24, 2024
Dataset provided by
Ben-Gurion University of the Negev
Northeastern University
Authors
Sahar Baribi-Bartov; Briony Swire-Thompson; Nir Grinberg
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
Governments may have the capacity to flood social media with fake news, but little is known about the use of flooding by ordinary voters. In this work, we identify 2107 registered US voters that account for 80% of the fake news shared on Twitter during the 2020 US presidential election by an entire panel of 664,391 voters. We find that supersharers are important members of the network, reaching a sizable 5.2% of registered voters on the platform. Supersharers have a significant overrepresentation of women, older adults, and registered Republicans. Supersharers' massive volume does not seem automated but is rather generated through manual and persistent retweeting. These findings highlight a vulnerability of social media for democracy, where a small group of people distort the political reality for many. Methods This dataset contains aggregated information necessary to replicate the results reported in our work on Supersharers of Fake News on Twitter while respecting and preserving the privacy expectations of individuals included in the analysis. No individual-level data is provided as part of this dataset. The data collection process that enabled the creation of this dataset leveraged a large-scale panel of registered U.S. voters matched to Twitter accounts. We examined the activity of 664,391 panel members who were active on Twitter during the months of the 2020 U.S. presidential election (August to November 2020, inclusive), and identified a subset of 2,107 supersharers, which are the most prolific sharers of fake news in the panel that together account for 80% of fake news content shared on the platform. We rely on a source-level definition of fake news, that uses the manually-labeled list of fake news sites by Grinberg et al. 2019 and an updated list based on NewsGuard ratings (commercially available, but not provided as part of this dataset), although the results were robust to different operationalizations of fake news sources. We restrict the analysis to tweets with external links that were identified as political by a machine learning classifier that we trained and validated against human coders, similar to the approach used in prior work. We address our research questions by contrasting supersharers with three reference groups: people who are the most prolific sharers of non-fake political tweets (supersharers non-fake group; SS-NF), a group of average fake news sharers, and a random sample of panel members. In particular, we identify the distinct sociodemographic characteristics of supersharers using a series of multilevel regressions, examine their use of Twitter through existing tools and additional statistical analysis, and study supersharers' reach by examining the consumption patterns of voters that follow supersharers.
m
Dataset of development of business during the COVID-19 crisis
data.mendeley.com
narcis.nl
Updated Nov 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
Explore at:
Unique identifier
https://doi.org/10.17632/9vvrd34f8t.1
Dataset updated
Nov 9, 2020
Authors
Tatiana N. Litvinova
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second full of pandemics), which are presented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages were calculated and the change (increase) in indicators such as profitability and profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table. Dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. The dataset is flexible data that can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Due to the fact that the data in the dataset are not ready-made numbers, but formulas, when adding and / or changing the values in the original table at the beginning of the dataset, most of the subsequent tables will be automatically recalculated and the graphs will be updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in risk assessment tables and obtaining automatically calculated consequences (changes) on the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified in the process and following the results of the second wave of the pandemic to check the reliability of pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of a pandemic and COVID-19 crisis for international entrepreneurship.
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/datasets/bigquery/google-analytics-sample
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Sep 19, 2019
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
Data Masking Market - Size & Share
mordorintelligence.com
pdf,excel,csv,ppt
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mordor Intelligence, Data Masking Market - Size & Share [Dataset]. https://www.mordorintelligence.com/industry-reports/data-masking-market
Explore at:
pdf,excel,csv,pptAvailable download formats
Dataset authored and provided by
Mordor Intelligence
License
https://www.mordorintelligence.com/privacy-policyhttps://www.mordorintelligence.com/privacy-policy
Time period covered
2019 - 2030
Area covered
Global
Description
The report covers Global Data Masking Tools and Technology Market and it is segmented by Type (Static, Dynamic), Deployment (Cloud, On-premise), End User Industry (BFSI, Healthcare, IT and Telecom, Retail, Government and Defense, Manufacturing, and Media and Entertainment), and by Geography. The market size and forecasts are provided in terms of value (USD million) for all the above segments.
H
TTU (Transfer to Utility) R package - EQ-5D vignette output
dataverse.harvard.edu
Updated May 24, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matthew Hamilton (2022). TTU (Transfer to Utility) R package - EQ-5D vignette output [Dataset]. http://doi.org/10.7910/DVN/612HDC
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/612HDC
Dataset updated
May 24, 2022
Dataset provided by
Harvard Dataverse
Authors
Matthew Hamilton
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/16.2/customlicense?persistentId=doi:10.7910/DVN/612HDChttps://dataverse.harvard.edu/api/datasets/:persistentId/versions/16.2/customlicense?persistentId=doi:10.7910/DVN/612HDC
Description
This dataset is provided as a teaching aid. It is the output of tools from the TTU R package, applied to a synthetic dataset (Fake Data) of psychological distress and psychological wellbeing. It is not to be used to support decision-making.
c
Tweets used to explore the potential role of social media data in responding...
datacatalogue.cessda.eu
Updated Mar 14, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Edwards, P; Markovic, M; Petrunova, N; Chenghua, L; Corsar, D (2025). Tweets used to explore the potential role of social media data in responding to new and emerging forms of food fraud 2018 [Dataset]. http://doi.org/10.5255/UKDA-SN-853377
Explore at:
Unique identifier
https://doi.org/10.5255/UKDA-SN-853377
Dataset updated
Mar 14, 2025
Dataset provided by
University of Aberdeen
Authors
Edwards, P; Markovic, M; Petrunova, N; Chenghua, L; Corsar, D
Time period covered
May 16, 2018
Area covered
United Kingdom
Variables measured
Individual, Organization, Event/process, Geographic Unit
Measurement technique
The search for relevant data content was performed using a custom built data collection module within the Observatory platform (https://sites.google.com/view/foobs/the-observatory). A public API provided by Twitter was utilised to gather all social media messages (Tweets) matching a specific set of keywords. Each line in the food-keywords.txt file (group 1) and in the in the outrage-keywords.txt file (group 2) contains a search keyword/phrase. A list of search keywords was then created from all possible combinations of individual keywords/phrases form group 1 and group 2. A matching Tweet, returned by the search had to include at least one combination of such search keywords/phrases. Therefore, the search string used by the API was constructed as follows: (<keyword1 from group1> <keyword1 from group 2>) OR (<keyword1 from group1> <keyword2 from group 2>) OR ... *Note: the space between <> <> represents a logical AND in terms of the Twitter API service. The Twitter API allows historical searches to be restricted to Tweets associated with a specific location, however, this can be only specified as a specific radius from a given latitude and longitude geo-point. We used Twitter's geo-resticted search by defining a Lat/Long point and radius (in kilometres). In order to cover major areas in the UK we used the following four geo-restrictions: Latitude =57.334942 Longitude=-4.395858 Radius = 253 km; Latitude =55.288000 Longitude=-2.374374 Radius = 282 km; Latitude =52.250808 Longitude=-0.660507 Radius = 198 km; Latitude =51.953880 Longitude=-2.989608 Radius = 198 km.
Description
Data collected from Twitter social media platform (6 May 2018 - 16 May 2018) to explore the potential role of social media data in responding to new and emerging forms of food fraud reported on social media from posts originating in the UK. The dataset contains Tweet IDs and keywords used to search for Tweets using a programatic access via the public Twitter API. Keywords used in this search were generated using a machine learning tool and consisted of a combinations of keywords describing terms related to food and outrage.
Social media and other forms of online content have enormous potential as a way to understand people's opinions and attitudes, and as a means to observe emerging phenomena - such as disease outbreaks. How might policy makers use such new forms of data to better assess existing policies and help formulate new ones? This one year demonstrator project is a partnership between computer science academics at the University of Aberdeen and officers from Food Standards Scotland which aims to answer this question. Food Standards Scotland is the public-sector food body for Scotland created by the Food (Scotland) Act 2015. It regularly provides policy guidance to ministers in areas such as food hygiene monitoring and reporting, food-related health risks, and food fraud. The project will develop a software tool (the Food Sentiment Observatory) that will be used to explore the role of data from sources such as Twitter, Facebook, and TripAdvisor in three policy areas selected by Food Standards Scotland: - attitudes to the differing food hygiene information systems used in Scotland and the other UK nations; - study of an historical E.coli outbreak to understand effectiveness of monitoring and decision making protocols; - understanding the potential role of social media data in responding to new and emerging forms of food fraud. The Observatory will integrate a number of existing software tools (developed in our recent research) to allow us to mine large volumes of data to identify important textual signals, extract opinions held by individuals or groups, and crucially, to document these data processing operations - to aid transparency of policy decision-making. Given the amount of noise appearing in user-generated online content (such as fake restaurant reviews) it is our intention to investigate methods to extract meaningful and reliable knowledge, to better support policy making.
E
A Corpus of Online Drug Usage Guideline Documents Annotated with Type of...
live.european-language-grid.eu
data.niaid.nih.gov
tsv
Updated Sep 8, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). A Corpus of Online Drug Usage Guideline Documents Annotated with Type of Advice [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7399
Explore at:
tsvAvailable download formats
Dataset updated
Sep 8, 2022
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: The goal of this dataset is to aid NLP research on recognizing safety critical information from drug usage guideline or patient handout data. This dataset contains annotated advice statements from 90 online DUG documents that corresponds to 90 drugs or medications that are used in the prescriptions of patients suffering from one or more chronic diseases. The advice statements are annotated in eight safety-critical categories: activity or lifestyle related, disease or symptom related, drug administration related, exercise related, food or beverage related, other drug related, pregnancy related, and temporal.
Data Collection: The data was collected from MedScape. It is one of the most widely used reference for health care providers. At first, 34 real anonymized prescriptions of patients suffering from one or more chronic diseases are collected. These prescriptions contains 165 drugs that are used to treat chronic diseases. Then, MedScape was crawled to collect the drug user guideline (DUG) / patient handout for these 165 drugs. But, MedScape does not have DUG document for all drugs. We found DUG document for 90 drugs in MedScape.
Data Annotation tool: The data annotation tool is developed to ease the annotation process. It allows the user to select a DUG document and select a position from the document in terms of line number. It stores the user log from the annotator and loads the most recent position from the log when the application is launched. It supports annotating multiple files for the same drug, as often there are multiple overlapping sources of drug usage guidelines for a single drug. Often DUG documents contain formatted text. This tool aids annotation of the formatted text as well. The annotation tool is also available upon request.
Annotated Data Description: The annotated data contains the annotation tag(s) of each advice extracted from the 90 online DUG documents. It also contains the phrases or topics in the advice statement that triggers the annotation tag, such as, activity, exercise, medication name, food or beverage name, disease name, pregnancy condition (gestational, postpartum). Sometimes disease names are not directly mentioned rather mentioned as a condition (e.g., stomach bleeding, alcohol abuse) or state of a parameter (e.g., low blood sugar, low blood pressure). The annotated data is formatted as following:
drug name, drug number, line number of the first sentence of the advice in the DUG document, advice Text, advice tag(s), medication, food, activity, exercise, and disease names mentioned in the advice.

Unannotated Data Description:
The unannotated data contains the raw DUG document for 90 drugs. It also contains the drug interaction information for the 165 drugs. The drug interaction information is categorized in 4 classes, contraindicated, serious, monitor closely, and minor. This information can be utilized to automatically detect potential interaction and effect of interaction among multiple drugs.
Citation: If you use this dataset in your work, please cite the following reference in any publication:
@inproceedings{preum2018DUG,
title={A Corpus of Drug Usage Guidelines Annotated with Type of Advice},
author={Sarah Masud Preum, Md. Rizwan Parvez, Kai-Wei Chang, and John A. Stankovic},
booktitle={ Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
publisher = {European Language Resources Association (ELRA)},
year={2018}
}
u
Data from: Data for Multiple Linear Regresion social media on Cost of...
portaldelainvestigacion.uma.es
Updated 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Castillo Esparcia, Antonio; Almansa Martinez, Ana; Gorostiza Cervino, Aritz; Castillo Esparcia, Antonio; Almansa Martinez, Ana; Gorostiza Cervino, Aritz (2023). Data for Multiple Linear Regresion social media on Cost of Representation of the European Groups of Interest [Dataset]. https://portaldelainvestigacion.uma.es/documentos/67a9c7ce19544708f8c731fe
Explore at:
Dataset updated
2023
Authors
Castillo Esparcia, Antonio; Almansa Martinez, Ana; Gorostiza Cervino, Aritz; Castillo Esparcia, Antonio; Almansa Martinez, Ana; Gorostiza Cervino, Aritz
Description
This dataset provides a critical nexus between social media engagement and the financial dimension of European interest groups' representation costs. Designed to facilitate multiple linear regression analysis, this dataset is a valuable tool for researchers, statisticians, and analysts seeking to unravel the intricate relationships between digital engagement and the financial commitments of these interest groups.

The dataset offers a robust collection of data points that enables the exploration of potential correlations, dependencies, and predictive insights. By delving into the varying levels of social media presence across different platforms and their potential influence on the cost of representation, researchers can gain a deeper understanding of the interplay between virtual engagement and real-world financial investment.

Accessible to the academic and research community, this dataset holds the promise of shedding light on the dynamic and evolving landscape of interest groups' communication strategies and their financial implications. With the potential to inform policy decisions and strategic planning, this dataset represents a stepping stone toward a more comprehensive understanding of the intricate web of relationships that shape the world of European interest groups. The variables included:

1. Twitter_link (Dummy):

1.1. Twitter_YES

1.2. Twitter_NO

2. Facebook_link (Dummy):

2.1. Facebook_YES

2.2. Facebook_NO

3. Instagram_link (Dummy):

3.1. Instagram_YES

3.2. Instagram_NO

4. Linkedin_link (Dummy):

4.1. Linkedin_YES

4.2. Linkedin_NO

5. Youtube_link (Dummy):

5.1. Youtube_YES

5.2. Youtube_NO

6. mean_cost (continus): The mean of the range of estimated cost of representation.
metabarcoding data for: Benchmark of bioinformatics tools for fast and...
zenodo.org
txt, zip
Updated Jun 4, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Laetitia Mathon; Laetitia Mathon (2022). metabarcoding data for: Benchmark of bioinformatics tools for fast and accurate species identification from environmental DNA metabarcoding [Dataset]. http://doi.org/10.5061/dryad.15dv41nx6
Explore at:
zip, txtAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.15dv41nx6
Dataset updated
Jun 4, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Laetitia Mathon; Laetitia Mathon
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains fish DNA sequences samples, simulated with Grinder, to build a mock community, as well as real fish eDNA metabarcoding data from the Mediterranean sea.

These data have been used to compare the efficiency of different bioinformatic tools in retrieving the species composition of real and simulated samples.
Online privacy attitudes in the UK Q3 2023
statista.com
Updated Oct 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Online privacy attitudes in the UK Q3 2023 [Dataset]. https://www.statista.com/statistics/1386118/online-adults-attitudes-data-privacy-uk/
Explore at:
Dataset updated
Oct 2, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
United Kingdom
Description
As of the third quarter of 2023, more than 44 percent of online users in the United Kingdom (UK) declined cookies on websites at least some of the time. Another 38.4 percent worried about how companies might use their online data. Furthermore, around 28 percent reported using a tool to block advertisements on the internet at least some of the time.
d
Data from: Eclipse CDT code analysis and unit testing
datadryad.org
figshare.com
+1more
zip
Updated May 14, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shaun C. D'Souza (2019). Eclipse CDT code analysis and unit testing [Dataset]. http://doi.org/10.5061/dryad.5q0g45h
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.5q0g45h
Dataset updated
May 14, 2019
Dataset provided by
Dryad
Authors
Shaun C. D'Souza
Time period covered
2019
Description
Eclipse CDT code analysis and unit testing
d
Tweets used to study reports of food fraud related to fish products 2018 -...
b2find.dkrz.de
Updated Oct 22, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2023). Tweets used to study reports of food fraud related to fish products 2018 - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/76dbb18f-8f5a-53db-95cf-d1a2354ddfde
Explore at:
Dataset updated
Oct 22, 2023
Description
Data collected from Twitter social media platform (8 June 2018 - 22 June 2018) to study reports of food fraud related to fish products on social media from posts originating in the UK. The dataset contains Tweet IDs and keywords used to search for Tweets using a programatic access via the public Twitter API. Keywords used in this search were generated using a machine learning tool and consisted of combinations of keywords describing terms related to fish and fake.Social media and other forms of online content have enormous potential as a way to understand people's opinions and attitudes, and as a means to observe emerging phenomena - such as disease outbreaks. How might policy makers use such new forms of data to better assess existing policies and help formulate new ones? This one year demonstrator project is a partnership between computer science academics at the University of Aberdeen and officers from Food Standards Scotland which aims to answer this question. Food Standards Scotland is the public-sector food body for Scotland created by the Food (Scotland) Act 2015. It regularly provides policy guidance to ministers in areas such as food hygiene monitoring and reporting, food-related health risks, and food fraud. The project will develop a software tool (the Food Sentiment Observatory) that will be used to explore the role of data from sources such as Twitter, Facebook, and TripAdvisor in three policy areas selected by Food Standards Scotland: - attitudes to the differing food hygiene information systems used in Scotland and the other UK nations; - study of an historical E.coli outbreak to understand effectiveness of monitoring and decision making protocols; - understanding the potential role of social media data in responding to new and emerging forms of food fraud. The Observatory will integrate a number of existing software tools (developed in our recent research) to allow us to mine large volumes of data to identify important textual signals, extract opinions held by individuals or groups, and crucially, to document these data processing operations - to aid transparency of policy decision-making. Given the amount of noise appearing in user-generated online content (such as fake restaurant reviews) it is our intention to investigate methods to extract meaningful and reliable knowledge, to better support policy making. The search for relevant data content was performed using a custom built data collection module within the Observatory platform (see Related Resources). A public API provided by Twitter was utilised to gather all social media messages (Tweets) matching a specific set of keywords. Each line in the fish-keywords.txt file (group 1) and in the fake-keywords.txt file (group 2) contains a search keyword/phrase. A list of search keywords was then created from all possible combinations of individual keywords/phrases form group 1 and group 2. A matching Tweet, returned by the search had to include at least one combination of such search keywords/phrases. Therefore, the search string used by the API was constructed as follows: ( ) OR ( ) OR ... *Note: the space between represents a logical AND in terms of the Twitter API service. The Twitter API allows historical searches to be restricted to Tweets associated with a specific location, however, this can be only specified as a specific radius from a given latitude and longitude geo-point. We used Twitter's geo-resticted search by defining a Lat/Long point and radius (in kilometres). In order to cover major areas in the UK we used the following four geo-restrictions: Latitude =57.334942 Longitude=-4.395858 Radius = 253 km; Latitude =55.288000 Longitude=-2.374374 Radius = 282 km; Latitude =52.250808 Longitude=-0.660507 Radius = 198 km; Latitude =51.953880 Longitude=-2.989608 Radius = 198 km.
Z
TIMIT-TTS: a Text-to-Speech Dataset for Synthetic Speech Detection
data.niaid.nih.gov
zenodo.org
Updated Sep 21, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Brian Hosler (2022). TIMIT-TTS: a Text-to-Speech Dataset for Synthetic Speech Detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6560158
Explore at:
Dataset updated
Sep 21, 2022
Dataset provided by
Paolo Bestagini
Davide Salvi
Brian Hosler
Stefano Tubaro
Matthew C. Stamm
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
With the rapid development of deep learning techniques, the generation and counterfeiting of multimedia material are becoming increasingly straightforward to perform. At the same time, sharing fake content on the web has become so simple that malicious users can create unpleasant situations with minimal effort. Also, forged media are getting more and more complex, with manipulated videos (e.g., deepfakes where both the visual and audio contents can be counterfeited) that are taking the scene over still images. The multimedia forensic community has addressed the possible threats that this situation could imply by developing detectors that verify the authenticity of multimedia objects. However, the vast majority of these tools only analyze one modality at a time. This was not a problem as long as still images were considered the most widely edited media, but now, since manipulated videos are becoming customary, performing monomodal analyses could be reductive. Nonetheless, there is a lack in the literature regarding multimodal detectors (systems that consider both audio and video components). This is due to the difficulty of developing them but also to the scarsity of datasets containing forged multimodal data to train and test the designed algorithms.

In this paper we focus on the generation of an audio-visual deepfake dataset. First, we present a general pipeline for synthesizing speech deepfake content from a given real or fake video, facilitating the creation of counterfeit multimodal material. The proposed method uses Text-to-Speech (TTS) and Dynamic Time Warping (DTW) techniques to achieve realistic speech tracks. Then, we use the pipeline to generate and release TIMIT-TTS, a synthetic speech dataset containing the most cutting-edge methods in the TTS field. This can be used as a standalone audio dataset, or combined with DeepfakeTIMIT and VidTIMIT video datasets to perform multimodal research. Finally, we present numerous experiments to benchmark the proposed dataset in both monomodal (i.e., audio) and multimodal (i.e., audio and video) conditions. This highlights the need for multimodal forensic detectors and more multimodal deepfake data.

For the initial version of TIMIT-TTS v1.0

Arxiv: https://arxiv.org/abs/2209.08000

TIMIT-TTS Database v1.0: https://zenodo.org/record/6560159
Digital privacy concerns among adult internet users in Italy Q3 2023
statista.com
Updated Mar 5, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2024). Digital privacy concerns among adult internet users in Italy Q3 2023 [Dataset]. https://www.statista.com/statistics/1234940/italy-digital-privacy-concerns-and-actions/
Explore at:
Dataset updated
Mar 5, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
Italy
Description
According to a survey of adult internet users in Italy, around 53.1 percent of respondents expressed concern about what is real or fake on the internet. Almost 54 percent stated that they declined cookies on websites, while 34 percent of respondents expressed concern about how companies might use their personal data, Furthermore, over 27 percent of respondents said they used a tool to block advertisements on the internet at least some of the time
Digital privacy concerns among adult internet users in Mexico 2023
statista.com
Updated Mar 13, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Digital privacy concerns among adult internet users in Mexico 2023 [Dataset]. https://www.statista.com/statistics/1310175/mexico-digital-privacy-concers-and-actions/
Explore at:
Dataset updated
Mar 13, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
Mexico
Description
As of the third quarter of 2023, a survey among adult internet users in Mexico found that about 57.8 percent of the respondents expressed concern about what was real or fake on the internet. Around 42.1 of Mexican online users felt doubt about how companies used their data. Regarding possible measures, 34.5 percent of them stated that they declined or deleted cookies from their browsers, and 22 percent used tools to block advertisements at least some of the time. Meanwhile, 19.6 percent of the respondents used virtual private networks when accessing the internet at least sometimes.

Facebook

Twitter

Click to copy link

Link copied

Cite

Akib Mohi Ud Din Khanday; Mudasir Ahmad Wani; Syed Tanzeel Rabani; Qamar Rayees Khan; Ahmed A. Abd El-Latif (2024). Recent work related to propaganda. [Dataset]. http://doi.org/10.1371/journal.pone.0302583.t001

Recent work related to propaganda.

Explore at:

xlsAvailable download formats

Unique identifier

https://doi.org/10.1371/journal.pone.0302583.t001

Dataset updated

Jul 10, 2024

Dataset provided by

PLOS ONE

Authors

Akib Mohi Ud Din Khanday; Mudasir Ahmad Wani; Syed Tanzeel Rabani; Qamar Rayees Khan; Ahmed A. Abd El-Latif

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Social media platforms serve as communication tools where users freely share information regardless of its accuracy. Propaganda on these platforms refers to the dissemination of biased or deceptive information aimed at influencing public opinion, encompassing various forms such as political campaigns, fake news, and conspiracy theories. This study introduces a Hybrid Feature Engineering Approach for Propaganda Identification (HAPI), designed to detect propaganda in text-based content like news articles and social media posts. HAPI combines conventional feature engineering methods with machine learning techniques to achieve high accuracy in propaganda detection. This study is conducted on data collected from Twitter via its API, and an annotation scheme is proposed to categorize tweets into binary classes (propaganda and non-propaganda). Hybrid feature engineering entails the amalgamation of various features, including Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BoW), Sentimental features, and tweet length, among others. Multiple Machine Learning classifiers undergo training and evaluation utilizing the proposed methodology, leveraging a selection of 40 pertinent features identified through the hybrid feature selection technique. All the selected algorithms including Multinomial Naive Bayes (MNB), Support Vector Machine (SVM), Decision Tree (DT), and Logistic Regression (LR) achieved promising results. The SVM-based HaPi (SVM-HaPi) exhibits superior performance among traditional algorithms, achieving precision, recall, F-Measure, and overall accuracy of 0.69, 0.69, 0.69, and 69.2%, respectively. Furthermore, the proposed approach is compared to well-known existing approaches where it overperformed most of the studies on several evaluation metrics. This research contributes to the development of a comprehensive system tailored for propaganda identification in textual content. Nonetheless, the purview of propaganda detection transcends textual data alone. Deep learning algorithms like Artificial Neural Networks (ANN) offer the capability to manage multimodal data, incorporating text, images, audio, and video, thereby considering not only the content itself but also its presentation and contextual nuances during dissemination.

Clear search

Close search

Google apps

Main menu

Recent work related to propaganda.

DummyNVSSMBEEAnnual

Synthetic (fake) youth mental health datasets and data dictionaries

Fake Email Address Generator Report

Data Sheet 1_OLTW-TEC: online learning with sliding windows for text...

Data from: Supersharers of fake news on Twitter

Dataset of development of business during the COVID-19 crisis

Google Analytics Sample

Context

Content

Acknowledgements

Inspiration

Data Masking Market - Size & Share

TTU (Transfer to Utility) R package - EQ-5D vignette output

Tweets used to explore the potential role of social media data in responding...

A Corpus of Online Drug Usage Guideline Documents Annotated with Type of...

Data from: Data for Multiple Linear Regresion social media on Cost of...

metabarcoding data for: Benchmark of bioinformatics tools for fast and...

Online privacy attitudes in the UK Q3 2023

Data from: Eclipse CDT code analysis and unit testing

Tweets used to study reports of food fraud related to fish products 2018 -...

TIMIT-TTS: a Text-to-Speech Dataset for Synthetic Speech Detection

Digital privacy concerns among adult internet users in Italy Q3 2023

Digital privacy concerns among adult internet users in Mexico 2023

Recent work related to propaganda.See More Versions

Recent work related to propaganda.