82 datasets found
  1. NewsUnravel Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 11, 2024
    Cite
    anon (2024). NewsUnravel Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8344890
    Dataset authored and provided by
    anon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    About the NUDA Dataset

    Media bias is a multifaceted problem, leading to one-sided views and impacting decision-making. A way to address bias in news articles is to automatically detect and indicate it through machine-learning methods. However, such detection is limited due to the difficulty of obtaining reliable training data. To facilitate the data-gathering process, we introduce NewsUnravel, a news-reading web application leveraging an initially tested feedback mechanism to collect reader feedback on machine-generated bias highlights within news articles. Our approach augments dataset quality by significantly increasing inter-annotator agreement by 26.31% and improving classifier performance by 2.49%. As the first human-in-the-loop application for media bias, NewsUnravel shows that a user-centric approach to media bias data collection can return reliable data while being scalable and evaluated as easy to use. NewsUnravel demonstrates that feedback mechanisms are a promising strategy to reduce data collection expenses, fluidly adapt to changes in language, and enhance evaluators' diversity.

    General

    This dataset was created through user feedback on automatically generated bias highlights on news articles on the website NewsUnravel made by ANON. Its goal is to improve the detection of linguistic media bias for analysis and to indicate it to the public. Support came from ANON. None of the funders played any role in the dataset creation process or publication-related decisions.

    The dataset consists of text, namely biased sentences with binary bias labels (processed: biased or not biased), as well as metadata about the articles. It includes all feedback that was given. The individual (unprocessed) ratings used to create the labels, with their corresponding user IDs, are also included.

    For training, this dataset was combined with the BABE dataset. All data is completely anonymous. Some sentences might be offensive or triggering, as they were taken from biased or more extreme news sources. The dataset neither identifies sub-populations nor can be considered sensitive to them, and it is not possible to identify individuals.

    Description of the Data Files

    This repository contains the datasets for the anonymous NewsUnravel submission. The tables contain the following data:

    NUDAdataset.csv: the NUDA dataset with 310 new sentences with bias labels
    Statistics.png: contains all Umami statistics for NewsUnravel's usage data
    Feedback.csv: holds the participant ID of a single feedback entry with the sentence ID (contentId), the bias rating, and provided reasons
    Content.csv: holds the participant ID of a rating with the sentence ID (contentId) of the rated sentence, the bias rating, and the reason, if given
    Article.csv: holds the article ID, title, source, article metadata, article topic, and bias amount in %
    Participant.csv: holds the participant IDs and data processing consent
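As a minimal sketch of working with the per-rating file layout described above, the snippet below parses a tiny invented sample in the Feedback.csv shape and tallies ratings per sentence. The exact column headers (participantID, contentId, rating, reason) are assumptions based on the description, not taken from the actual files:

```python
import csv
import io
from collections import Counter

# Hypothetical snippet in the described Feedback.csv layout;
# headers and values are invented for illustration.
sample = """participantID,contentId,rating,reason
p1,s1,biased,loaded wording
p2,s1,biased,
p3,s1,not biased,
p1,s2,not biased,
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Count how many individual ratings each sentence received,
# a basic sanity check before aggregating labels.
ratings_per_sentence = Counter(row["contentId"] for row in rows)
print(ratings_per_sentence)  # Counter({'s1': 3, 's2': 1})
```

With the real files, `io.StringIO(sample)` would simply be replaced by an open file handle for Feedback.csv.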

    Collection Process

    Data was collected through interactions with the Feedback Mechanism on NewsUnravel. A news article was displayed with automatically generated bias highlights. Each highlight could be selected, and readers were able to agree or disagree with the automatic label. Through a majority vote, labels were generated from those feedback interactions. Spammers were excluded through a spam detection approach.
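The majority-vote step described above can be sketched as follows. This is an illustrative reduction of individual reader ratings to one label per sentence; the tie-breaking behavior is an assumption, as the dataset description does not specify one:

```python
from collections import Counter

def majority_label(ratings):
    """Reduce individual reader ratings to a single label by majority vote.

    Ties are reported as None (inconclusive); the actual tie-breaking
    rule used for the dataset is not specified here.
    """
    counts = Counter(ratings)
    (top, n), *rest = counts.most_common()
    if rest and rest[0][1] == n:
        return None  # tie between the two most frequent labels
    return top

print(majority_label(["biased", "biased", "not biased"]))  # biased
print(majority_label(["biased", "not biased"]))            # None
```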

    Readers came to our website voluntarily through posts on LinkedIn and social media as well as posts on university boards. The data collection period lasted for one week, from March 4th to March 11th (2023). The landing page informed them about the goal and the data processing. After being informed, they could proceed to the article overview.

    So far, the dataset has been used on top of BABE to train a linguistic bias classifier, adopting hyperparameter configurations from BABE with a pre-trained model from Hugging Face. The dataset will be open source. Upon acceptance, a link with all details and contact information will be provided. No third parties are involved.

    The dataset will not be maintained as it captures the first test of NewsUnravel at a specific point in time. However, new datasets will arise from further iterations. Those will be linked in the repository. Please cite the NewsUnravel paper if you use the dataset and contact us if you're interested in more information or joining the project.

  2. Unveiling Engagement and Platform Algorithmic Biases in Social Media Data...

    • osf.io
    Updated Feb 16, 2025
    Cite
    Wei Zhong (2025). Unveiling Engagement and Platform Algorithmic Biases in Social Media Data Collection and Analysis: An Experimental Study [Dataset]. https://osf.io/ys7a8
    Dataset provided by
    Center for Open Science (https://cos.io/)
    Authors
    Wei Zhong
    Description

    No description was included in this dataset collected from the OSF.

  3. NewsUnravel Dataset

    • zenodo.org
    csv
    Updated Sep 14, 2023
    + more versions
    Cite
    anonymous; anonymous (2023). NewsUnravel Dataset [Dataset]. http://doi.org/10.5281/zenodo.8344882
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    anonymous; anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    About the Dataset
    Media bias is a multifaceted problem, leading to one-sided views and impacting decision-making. A way to address bias in news articles is to automatically detect and indicate it through machine-learning methods. However, such detection is limited due to the difficulty of obtaining reliable training data. To facilitate the data-gathering process, we introduce NewsUnravel, a news-reading web application leveraging an initially tested feedback mechanism to collect reader feedback on machine-generated bias highlights within news articles. Our approach augments dataset quality by significantly increasing inter-annotator agreement by 26.31% and improving classifier performance by 2.49%. As the first human-in-the-loop application for media bias, NewsUnravel shows that a user-centric approach to media bias data collection can return reliable data while being scalable and evaluated as easy to use. NewsUnravel demonstrates that feedback mechanisms are a promising strategy to reduce data collection expenses, fluidly adapt to changes in language, and enhance evaluators' diversity.

    Description of the data files
    This repository contains the datasets for the anonymous NewsUnravel submission. The tables contain the following data:

    NUDAdataset.csv: the NUDA dataset with 310 new sentences with bias labels
    Statistics.png: contains all Umami statistics for NewsUnravel's usage data
    Feedback.csv: holds the participantID of a single feedback with the sentence ID (contentId), the bias rating, and provided reasons
    Content.csv: holds the participant ID of a rating with the sentence ID (contentId) of the rated sentence, the bias rating, and the reason, if given
    Article.csv: holds the article ID, title, source, article metadata, article topic, and bias amount in %
    Participant.csv: holds the participant IDs and data processing consent

  4. Replication Data for: Reducing Political Bias in Political Science Estimates...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Zigerell, Lawrence (2023). Replication Data for: Reducing Political Bias in Political Science Estimates [Dataset]. http://doi.org/10.7910/DVN/PZLCJM
    Dataset provided by
    Harvard Dataverse
    Authors
    Zigerell, Lawrence
    Description

    Political science researchers have flexibility in how to analyze data, how to report data, and whether to report on data. Review of examples of reporting flexibility from the race and sex discrimination literature illustrates how research design choices can influence estimates and inferences. This reporting flexibility—coupled with the political imbalance among political scientists—creates the potential for political bias in reported political science estimates, but this potential for political bias can be reduced or eliminated through preregistration and preacceptance, in which researchers commit to a research design before completing data collection. Removing the potential for reporting flexibility can raise the credibility of political science research.

  5. News Ninja Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv
    Updated Feb 20, 2024
    Cite
    anon; anon (2024). News Ninja Dataset [Dataset]. http://doi.org/10.5281/zenodo.10683029
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    anon; anon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    About
    Recent research shows that visualizing linguistic media bias mitigates its negative effects. However, reliable automatic detection methods to generate such visualizations require costly, knowledge-intensive training data. To facilitate data collection for media bias datasets, we present News Ninja, a game employing data-collecting game mechanics to generate a crowdsourced dataset. Before annotating sentences, players are educated on media bias via a tutorial. Our findings show that datasets gathered with crowdsourced workers trained on News Ninja can reach significantly higher inter-annotator agreements than expert and crowdsourced datasets. As News Ninja encourages continuous play, it allows datasets to adapt to the reception and contextualization of news over time, presenting a promising strategy to reduce data collection expenses, educate players, and promote long-term bias mitigation.

    General
    This dataset was created through player annotations in the News Ninja Game made by ANON. Its goal is to improve the detection of linguistic media bias. Support came from ANON. None of the funders played any role in the dataset creation process or publication-related decisions.

    The dataset includes sentences with binary bias labels (processed: biased or not biased) as well as the annotations of single players used for the majority vote. It includes all game-collected data. All data is completely anonymous. The dataset neither identifies sub-populations nor can be considered sensitive to them, and it is not possible to identify individuals.

    Some sentences might be offensive or triggering as they were taken from biased or more extreme news sources. The dataset contains topics such as violence, abortion, and hate against specific races, genders, religions, or sexual orientations.

    Description of the Data Files
    This repository contains the datasets for the anonymous News Ninja submission. The tables contain the following data:

    ExportNewsNinja.csv: Contains 370 BABE sentences and 150 new sentences with their text (sentence), words labeled as biased (words), BABE ground truth (ground_Truth), and the sentence bias label from the player annotations (majority_vote). The first 370 sentences are re-annotated BABE sentences, and the following 150 sentences are new sentences.

    AnalysisNewsNinja.xlsx: Contains 370 BABE sentences and 150 new sentences. The first 370 sentences are re-annotated BABE sentences, and the following 150 sentences are new sentences. The table includes the full sentence (Sentence), the sentence bias label from player annotations (isBiased Game), the new expert label (isBiased Expert), whether the game label and expert label match (Game VS Expert), whether differing labels are false positives or false negatives (false negative, false positive), the ground truth label from BABE (isBiasedBABE), whether Expert and BABE labels match (Expert VS BABE), and whether the game label and BABE label match (Game VS BABE). It also includes the analysis of the agreement between the three rater categories (Game, Expert, BABE).

    demographics.csv: Contains demographic information of News Ninja players, including gender, age, education, English proficiency, political orientation, news consumption, and consumed outlets.
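The pairwise comparison columns described above (Game VS Expert, Expert VS BABE, Game VS BABE) amount to checking where two label sets agree. A rough illustration with invented labels, standing in for isBiased Game and isBiased Expert:

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of sentences on which two raters assign the same label."""
    assert len(labels_a) == len(labels_b), "label lists must align by sentence"
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Invented example labels, not values from the dataset.
game = [True, True, False, False, True]
expert = [True, False, False, False, True]
print(agreement_rate(game, expert))  # 0.8
```

Note this is raw percent agreement; the paper's inter-annotator agreement figures likely use a chance-corrected statistic instead.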

    Collection Process
    Data was collected through interactions with the NewsNinja game. All participants went through a tutorial before annotating 2x10 BABE sentences and 2x10 new sentences. For this first test, players were recruited using Prolific. The game was hosted on a custom-built responsive website. The collection period was from 20.02.2023 to 28.02.2023. Before starting the game, players were informed about the goal and the data processing. After consenting, they could proceed to the tutorial.

    The dataset will be open source. A link with all details and contact information will be provided upon acceptance. No third parties are involved.

    The dataset will not be maintained as it captures the first test of NewsNinja at a specific point in time. However, new datasets will arise from further iterations. Those will be linked in the repository. Please cite the NewsNinja paper if you use the dataset and contact us if you're interested in more information or joining the project.

  6. Data from: Questioning Bias: Validating a Bias Crime Assessment Tool in...

    • catalog.data.gov
    • icpsr.umich.edu
    • +1more
    Updated Mar 12, 2025
    + more versions
    Cite
    National Institute of Justice (2025). Questioning Bias: Validating a Bias Crime Assessment Tool in California and New Jersey, 2016-2017 [Dataset]. https://catalog.data.gov/dataset/questioning-bias-validating-a-bias-crime-assessment-tool-in-california-and-new-jersey-2016-a062f
    Dataset provided by
    National Institute of Justice
    Area covered
    New Jersey, California
    Description

    These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. This study investigates experiences surrounding hate and bias crimes and incidents and reasons and factors affecting reporting and under-reporting among youth and adults in LGBT, immigrant, Hispanic, Black, and Muslim communities in New Jersey and Los Angeles County, California. The collection includes 1 SPSS data file (QB_FinalDataset-Revised.sav (n=1,326; 513 variables)). The collection also contains 24 qualitative data files of transcripts from focus groups and interviews with key informants, which are not included in this release.

  7. Biased cognition in East Asian and Western Cultures: Behavioural data...

    • datacatalogue.cessda.eu
    Updated Mar 25, 2025
    Cite
    Yiend, J (2025). Biased cognition in East Asian and Western Cultures: Behavioural data 2016-2018 [Dataset]. http://doi.org/10.5255/UKDA-SN-853644
    Dataset provided by
    King
    Authors
    Yiend, J
    Time period covered
    Nov 30, 2013 - Aug 29, 2017
    Area covered
    Hong Kong, United Kingdom
    Variables measured
    Individual
    Measurement technique
    Participants: Local Hong Kong and UK natives; short term and long term migrants in each country, aged 16-65 with no current major physical illness or psychological disorder, who were not receiving psychological therapy or medication for psychological conditions. Sampling procedure: Participants were recruited using circular emails, public flyers and other advertisements in local venues, universities and clubs. Data collection: Participants completed four previously developed and validated cognitive bias tasks (emotional Stroop, attention probe, similarity ratings task and scrambled sentence task) in their native language. They also completed socio-demographic information and questionnaires.
    Description

    This data collection consists of behavioural task data for measures of attention and interpretation bias, specifically: emotional Stroop, attention probe (both measuring attention bias) and similarity ratings task and scrambled sentence task (both measuring interpretation bias). Data on the following 6 participant groups are included in the dataset: native UK (n=36), native HK (n=39), UK migrants to HK (short term = 31, long term = 28) and HK migrants to UK (short term = 37, long term = 31). Also included are personal characteristics and questionnaire measures.

    The way in which we process information in the world around us has a significant effect on our health and well-being. For example, some people are more prone than others to notice potential dangers, to remember bad things from the past, and to assume the worst when the meaning of an event or comment is uncertain. These tendencies are called negative cognitive biases and can lead to low mood and poor quality of life. They also make people vulnerable to mental illnesses. In contrast, those with positive cognitive biases tend to function well and remain healthy. To date, most of this work has been conducted on white, Western populations, and we do not know whether similar cognitive biases exist in Eastern cultures. This project will examine cognitive biases in Eastern (Hong Kong nationals) and Western (UK nationals) people to see whether there are any differences between the two. It will also examine what happens to cognitive biases when someone migrates to a different culture. This will tell us whether influences from the society and culture around us have any effect on our cognitive biases. Finally, the project will consider how much our own cognitive biases are inherited from our parents. Together these results will tell us whether the known good and bad effects of cognitive biases apply to non-Western cultural groups as well, and how much cognitive biases are decided by our genes or our environment.

  8. Data from: Relative bee abundance varies by collection method and flowering...

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Apr 28, 2021
    Cite
    Philip Hahn; Marirose Kuhlman; Skylar Burrows; Dan Mummey; Philip Ramsey (2021). Relative bee abundance varies by collection method and flowering richness: implications for understanding patterns in bee community data [Dataset]. http://doi.org/10.5061/dryad.2z34tmpmd
    Dataset provided by
    Dryad
    Authors
    Philip Hahn; Marirose Kuhlman; Skylar Burrows; Dan Mummey; Philip Ramsey
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2021
    Description

    See metadata file for additional details.

  9. Data from: Mapping species richness using opportunistic samples: a case...

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Dec 6, 2019
    Cite
    Thomas Neyens; Peter Diggle; Christel Faes; Natalie Beenaerts; Tom Artois; Emanuele Giorgi (2019). Mapping species richness using opportunistic samples: a case study on ground-floor bryophyte species richness in the Belgian province of Limburg [Dataset]. http://doi.org/10.5061/dryad.brv15dv5r
    Dataset provided by
    Dryad
    Authors
    Thomas Neyens; Peter Diggle; Christel Faes; Natalie Beenaerts; Tom Artois; Emanuele Giorgi
    Time period covered
    2019
    Area covered
    Belgium
    Description

    In species richness studies, citizen-science surveys where participants make individual decisions regarding sampling strategies provide a cost-effective approach to collect a large amount of data. However, it is unclear to what extent the bias inherent to opportunistically collected samples may invalidate our inferences. Here, we compare spatial predictions of forest ground-floor bryophyte species richness in Limburg (Belgium), based on crowd- and expert-sourced data, where the latter are collected by adhering to a rigorous geographical randomisation and data collection protocol. We develop a log-Gaussian Cox process model to analyse the opportunistic sampling process of the crowd-sourced data and assess its sampling bias. We then fit two geostatistical Poisson models to both data-sets and compare the parameter estimates and species richness predictions. We find that the citizens had a higher propensity for locations that were close to their homes and environmentally more valuable. The ...

  10. Biases and mitigation strategies in Classical and Digital Epidemiology.

    • plos.figshare.com
    xls
    Updated Jan 13, 2025
    Cite
    Sara Mesquita; Lília Perfeito; Daniela Paolotti; Joana Gonçalves-Sá (2025). Biases and mitigation strategies in Classical and Digital Epidemiology. [Dataset]. http://doi.org/10.1371/journal.pdig.0000670.t001
    Dataset provided by
    PLOS Digital Health
    Authors
    Sara Mesquita; Lília Perfeito; Daniela Paolotti; Joana Gonçalves-Sá
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Biases and mitigation strategies in Classical and Digital Epidemiology.

  11. Data from: Approach-induced biases in human information sampling

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    Updated Jul 19, 2024
    Cite
    Dolan, Raymond J. (2024). Data from: Approach-induced biases in human information sampling [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_4946669
    Dataset provided by
    Malalasekera, W. M. Nishantha
    Dolan, Raymond J.
    Rutledge, Robb B.
    Hunt, Laurence T.
    Kennerley, Steven W.
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Information sampling is often biased towards seeking evidence that confirms one's prior beliefs. Despite such biases being a pervasive feature of human behavior, their underlying causes remain unclear. Many accounts of these biases appeal to limitations of human hypothesis testing and cognition, de facto evoking notions of bounded rationality, but neglect more basic aspects of behavioral control. Here, we investigated a potential role for Pavlovian approach in biasing which information humans will choose to sample. We collected a large novel dataset from 32,445 human subjects, making over 3 million decisions, who played a gambling task designed to measure the latent causes and extent of information-sampling biases. We identified three novel approach-related biases, formalized by comparing subject behavior to a dynamic programming model of optimal information gathering. These biases reflected the amount of information sampled ("positive evidence approach"), the selection of which information to sample ("sampling the favorite"), and the interaction between information sampling and subsequent choices ("rejecting unsampled options"). The prevalence of all three biases was related to a Pavlovian approach-avoid parameter quantified within an entirely independent economic decision task. Our large dataset also revealed that individual differences in the amount of information gathered are a stable trait across multiple gameplays and can be related to demographic measures, including age and educational attainment. As well as revealing limitations in cognitive processing, our findings suggest information sampling biases reflect the expression of primitive, yet potentially ecologically adaptive, behavioral repertoires. One such behavior is sampling from options that will eventually be chosen, even when other sources of information are more pertinent for guiding future action.

  12. Data from: Sampling biases shape our view of the natural world

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    Updated Jun 4, 2022
    Cite
    Qiao, Huijie (2022). Sampling biases shape our view of the natural world [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4950129
    Dataset provided by
    Qiao, Huijie
    Hughes, Alice
    Orr, Michael
    Yang, Qinmin
    Zhu, Chaodong
    Ma, Keping
    Costello, Mark
    Waller, John
    Provoost, Pieter
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Spatial patterns of biodiversity are inextricably linked to their collection methods, yet no synthesis of bias patterns or their consequences exists. As such, views of organismal distribution and the ecosystems they make up may be incorrect, undermining countless ecological and evolutionary studies. Using 742 million records of 374,900 species, we explore the global patterns and impacts of biases related to taxonomy, accessibility, ecotype, and data type across terrestrial and marine systems. Pervasive sampling and observation biases exist across animals, with only 6.74% of the globe sampled, and disproportionately poor tropical sampling. High elevations and deep seas are particularly unknown. Over 50% of records in most groups account for under 2% of species, and citizen-science only exacerbates biases. Additional data will be needed to overcome many of these biases, but we must increasingly value data publication to bridge this gap and better represent species' distributions from more distant and inaccessible areas, and provide the necessary basis for conservation and management.

  13. Logistic regressions for each selection variable.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Arthur A. Stone; Stefan Schneider; Joshua M. Smyth; Doerte U. Junghaenel; Cheng Wen; Mick P. Couper; Sarah Goldstein (2023). Logistic regressions for each selection variable. [Dataset]. http://doi.org/10.1371/journal.pone.0282591.t001
    Dataset provided by
    PLOS ONE
    Authors
    Arthur A. Stone; Stefan Schneider; Joshua M. Smyth; Doerte U. Junghaenel; Cheng Wen; Mick P. Couper; Sarah Goldstein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Although the potential for participant selection bias is readily acknowledged in the momentary data collection literature, very little is known about uptake rates in these studies or about differences in the people that participate versus those who do not. This study analyzed data from an existing Internet panel of older people (age 50 and greater) who were offered participation into a momentary study (n = 3,169), which made it possible to compute uptake and to compare many characteristics of participation status. Momentary studies present participants with brief surveys multiple times a day over several days; these surveys ask about immediate or recent experiences. A 29.1% uptake rate was observed when all respondents were considered, whereas a 39.2% uptake rate was found when individuals who did not have eligible smartphones (necessary for ambulatory data collection) were eliminated from the analyses. Taking into account the participation rate for being in this Internet panel, we estimate uptake rates for the general population to be about 5%. A consistent pattern of differences emerged between those who accepted the invitation to participate versus those who did not (in univariate analyses): participants were more likely to be female, younger, have higher income, have higher levels of education, rate their health as better, be employed, not be retired, not be disabled, have better self-rated computer skills, and to have participated in more prior Internet surveys (all p < .0026). Many variables were not associated with uptake including race, big five personality scores, and subjective well-being. For several of the predictors, the magnitude of the effects on uptake was substantial. These results indicate the possibility that, depending upon the associations being investigated, person selection bias could be present in momentary data collection studies.

  14. Distribution of survey modes in analyzed dataset by the United Nations world...

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Adam Rybak (2023). Distribution of survey modes in analyzed dataset by the United Nations world regions. [Dataset]. http://doi.org/10.1371/journal.pone.0283092.t001
    Dataset provided by
    PLOS ONE
    Authors
    Adam Rybak
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World, United Nations, United States
    Description

    Distribution of survey modes in analyzed dataset by the United Nations world regions.

  15. Data from: Distorted views of biodiversity: spatial and temporal bias in...

    • data.niaid.nih.gov
    Updated May 31, 2022
    + more versions
    Cite
    Fuller, Richard A. (2022). Data from: Distorted views of biodiversity: spatial and temporal bias in species occurrence data [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_4950232
    Explore at:
    Dataset updated
    May 31, 2022
    Dataset provided by
    Clark, Natalie E.
    Mace, Georgina M.
    Fuller, Richard A.
    Chang-qing, Ding
    McGowan, Philip J. K.
    O'Connor, Kim
    Boakes, Elizabeth H.
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Historical as well as current data on species distributions are needed to track changes in biodiversity. Species distribution data are found in a variety of sources, but it is likely that these sources carry different biases towards certain time periods or places. By collating a large historical database of ~170,000 records of species in the avian order Galliformes, dating back over two centuries and covering Europe and Asia, we investigate patterns of spatial and temporal bias in five sources of species distribution data: museum collections, the scientific literature, ringing records, ornithological atlases and website reports from 'citizen scientists'. Museum data were found to provide the most comprehensive historical coverage of species' ranges but often proved extremely time-expensive to collect. Literature records have increased in their number and coverage through time, whereas ringing, atlas and website data are almost exclusively restricted to the last few decades. Geographically, our data were biased towards Western Europe and Southeast Asia. Museums were the only data source to provide reasonably even spatial coverage across the entire study region. In the last three decades, literature data have become increasingly focussed on threatened species and protected areas, and currently no source is providing reliable baseline information, a role once filled by museum collections. As well as securing historical data for the future and making it available to users, the sampling biases will need to be understood and addressed if we are to obtain a true picture of biodiversity change.

  16.

    Bias estimation for seven precipitation datasets for the eastern MENA region...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Bias estimation for seven precipitation datasets for the eastern MENA region [Dataset]. https://catalog.data.gov/dataset/bias-estimation-for-seven-precipitation-datasets-for-the-eastern-mena-region
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Middle East and North Africa
    Description

    Information on the spatio-temporal distribution of rainfall is critical for addressing water-related disasters, especially in the Middle East and North Africa's (MENA) arid to semi-arid regions. However, the availability of reliable rainfall datasets for most river basins is limited. In this study, we utilized observations from satellite-based rainfall data, in situ rain gauge observations, and rainfall climatology to determine the most suitable precipitation dataset in the MENA region. This dataset includes the supporting data and graphics for the analysis. The collection includes a spreadsheet containing all the data for the tables and charts, as well as the text file for the in situ data collected and used for the analysis.

  17. Data from: XBT and CTD pairs dataset Version 1

    • data.csiro.au
    • data.gov.au
    Updated Oct 16, 2014
    + more versions
    Cite
    Rebecca Cowley; Tim Boyer; Shoichi Kizu; Kimio Hanawa; Gustavo Goni; Esmee Van Wijk; Steve Rintoul; Mark Rosenberg (2014). XBT and CTD pairs dataset Version 1 [Dataset]. http://doi.org/10.4225/08/52AE99A4663B1
    Explore at:
    Dataset updated
    Oct 16, 2014
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Rebecca Cowley; Tim Boyer; Shoichi Kizu; Kimio Hanawa; Gustavo Goni; Esmee Van Wijk; Steve Rintoul; Mark Rosenberg
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1967 - Dec 31, 2011
    Area covered
    Dataset funded by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    University of Miami
    ACE/CRC
    Tohoku University
    Bundesamt für Seeschifffahrt und Hydrographie (BSH)
    CSIRO (http://www.csiro.au/)
    Description

    The XBT/CTD pairs dataset (Version 1) is the dataset used to calculate the historical XBT fall rate and temperature corrections presented in Cowley, R., Wijffels, S., Cheng, L., Boyer, T., and Kizu, S. (2013). Biases in Expendable Bathythermograph Data: A New View Based on Historical Side-by-Side Comparisons. Journal of Atmospheric and Oceanic Technology, 30, 1195–1225, doi:10.1175/JTECH-D-12-00127.1.
    http://journals.ametsoc.org/doi/abs/10.1175/JTECH-D-12-00127.1

    4,115 pairs from 114 datasets were used to derive the fall rate and temperature corrections. Each dataset contains the scientifically quality-controlled version and (where available) the originator's data. The XBT/CTD pairs are identified in the document 'XBT_CTDpairs_metadata_V1.csv'. Lineage: Data is sourced from the World Ocean Database, NOAA, CSIRO Marine and Atmospheric Research, Bundesamt für Seeschifffahrt und Hydrographie (BSH), Hamburg, Germany, and the Australian Antarctic Division. Original and raw data files are included where available. Quality-controlled datasets follow the procedure of Bailey, R., Gronell, A., Phillips, H., Tanner, E., and Meyers, G. (1994). Quality control cookbook for XBT data, Version 1.1. CSIRO Marine Laboratories Reports, 221. Quality-controlled data is in the 'MQNC' format used at CSIRO Marine and Atmospheric Research, which is described in the document 'XBT_CTDpairs_descriptionV1.pdf'. Note that future versions of the XBT/CTD pairs database may supersede this version; please check more recent versions for updates to individual datasets.

  18.

    Replication data for: Looking Beyond Demographics: Panel Attrition in the...

    • dataverse.harvard.edu
    Updated Oct 1, 2014
    Cite
    Harvard Dataverse (2014). Replication data for: Looking Beyond Demographics: Panel Attrition in the ANES and GSS [Dataset]. http://doi.org/10.7910/DVN/RRDHGR
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Oct 1, 2014
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2006 - 2010
    Area covered
    United States
    Description

    Longitudinal or panel surveys offer unique benefits for social science research, but they typically suffer from attrition, which reduces sample size and can result in biased inferences. Previous research tends to focus on the demographic predictors of attrition, conceptualizing attrition propensity as a stable, individual-level characteristic: some individuals (e.g., young, poor, residentially mobile) are more likely to drop out of a study than others. We argue that panel attrition reflects both the characteristics of the individual respondent and her survey experience, a factor shaped by the design and implementation features of the study. In this paper, we examine and compare the predictors of panel attrition in the 2008-2009 American National Election Study, an online panel, and the 2006-2010 General Social Survey, a face-to-face panel. In both cases, survey experience variables are predictive of panel attrition above and beyond the standard demographic predictors, but the particular measures of relevance differ across the two surveys. The findings inform statistical corrections for panel attrition bias and provide study design insights for future panel data collections.

  19.

    Understanding Society: Interviewer Survey, 2014

    • datacatalogue.cessda.eu
    • beta.ukdataservice.ac.uk
    Updated Nov 28, 2024
    + more versions
    Cite
    University of Essex (2024). Understanding Society: Interviewer Survey, 2014 [Dataset]. http://doi.org/10.5255/UKDA-SN-7615-2
    Explore at:
    Dataset updated
    Nov 28, 2024
    Dataset provided by
    Institute for Social and Economic Research
    Authors
    University of Essex
    Time period covered
    May 1, 2014
    Area covered
    Great Britain
    Variables measured
    Individuals, Survey interviewers, National
    Measurement technique
    Self-completion
    Description

    Abstract copyright UK Data Service and data collection copyright owner.


    The Understanding Society: Interviewer Survey 2014 data file is the output of a research project that collected information from interviewers who worked on the first round of Understanding Society data collection for Great Britain, which took place during 2009 and 2010.

    One of the specific aims of the data collection was to better understand the nature of bias in respondent consent to data linkage and, in particular, the role of interviewers in obtaining consent and in affecting consent bias. Hence, the interviewer survey includes a number of questions eliciting interviewer attitudes to consent to data linkage and their experiences with asking for consent during fieldwork. It also contains many questions of interest to a wider audience of researchers, such as interviewer attitudes to persuasion, measures of their personality traits and markers of general trust.


    Main Topics:

    The dataset included the following topics:
    • interviewers' opinions on their job
    • attitudes to persuasion, surveys in general
    • personality traits and trust
    • data privacy and administration of Understanding Society
    • experience asking for consent to link to administrative data
    • usefulness of linked data and their own likelihood of consent to linked data in different domains

  20. Data from: Accounting for imperfect detection in data from museums and...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv
    Updated Jun 4, 2022
    Cite
    Kelley D. Erickson; Kelley D. Erickson; Adam B. Smith; Adam B. Smith (2022). Accounting for imperfect detection in data from museums and herbaria when modeling species distributions: Combining and contrasting data-level versus model-level bias correction [Dataset]. http://doi.org/10.5061/dryad.51c59zw8b
    Explore at:
    bin, csv (available download formats)
    Dataset updated
    Jun 4, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kelley D. Erickson; Kelley D. Erickson; Adam B. Smith; Adam B. Smith
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The digitization of museum collections, as well as an explosion in citizen science initiatives, has resulted in a wealth of data that can be useful for understanding the global distribution of biodiversity, provided that the well-documented biases inherent in unstructured opportunistic data are accounted for. While traditionally used to model imperfect detection with structured data from systematic surveys of wildlife, occupancy models provide a framework for modeling the imperfect collection process that results in digital specimen data. In this study, we explore methods for adapting occupancy models for use with biased opportunistic occurrence data from museum specimens and citizen science platforms, using seven species of Anacardiaceae in Florida as a case study. We explored two methods of incorporating information about collection effort to inform our uncertainty around species presence: (1) filtering the data to exclude collectors unlikely to collect the focal species and (2) incorporating collection covariates (collection type, time of collection, and history of previous detections) into a model of collection probability. We found that the best models incorporated both the background-data filtration step and the collector covariates. Month, method of collection, and whether a collector had previously collected the focal species were important predictors of collection probability. Efforts to standardize metadata associated with data collection will improve efforts to model the spatial distribution of a variety of species.
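As a rough illustration of the first correction method (data-level filtration), the sketch below keeps only records made by collectors who have collected the focal family at least once. The record layout, collector IDs, and the family-level filtering rule are hypothetical simplifications, not the study's actual procedure.

```python
from collections import defaultdict

# Hypothetical occurrence records: (collector, family, species).
records = [
    ("c1", "Anacardiaceae", "Rhus copallinum"),
    ("c1", "Fagaceae",      "Quercus virginiana"),
    ("c2", "Fagaceae",      "Quercus laurifolia"),
    ("c3", "Anacardiaceae", "Toxicodendron radicans"),
]

def filter_background(records, focal_family="Anacardiaceae"):
    """Keep only records from collectors who have collected the focal family
    at least once -- i.e., collectors who plausibly could have detected the
    focal species (a data-level filtration step)."""
    by_collector = defaultdict(list)
    for collector, family, species in records:
        by_collector[collector].append(family)
    keep = {c for c, families in by_collector.items()
            if focal_family in families}
    return [r for r in records if r[0] in keep]

filtered = filter_background(records)
# c2 never collected Anacardiaceae, so all of c2's records are dropped.
```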

NewsUnravel Dataset


The dataset consists of text: biased sentences with processed binary bias labels (biased or not biased) as well as metadata about the article. It includes all feedback that was given. The individual unprocessed ratings used to create the labels, together with the corresponding user IDs, are also included.

For training, this dataset was combined with the BABE dataset. All data is completely anonymous. Some sentences might be offensive or triggering, as they were taken from biased or more extreme news sources. The dataset neither identifies sub-populations nor can be considered sensitive to them, and it is not possible to identify individuals.

Description of the Data Files

This repository contains the datasets for the anonymous NewsUnravel submission. The tables contain the following data:

• NUDAdataset.csv: the NUDA dataset with 310 new sentences with bias labels
• Statistics.png: contains all Umami statistics for NewsUnravel's usage data
• Feedback.csv: holds the participant ID of a single feedback with the sentence ID (contentId), the bias rating, and provided reasons
• Content.csv: holds the participant ID of a rating with the sentence ID (contentId) of a rated sentence and the bias rating, and reason, if given
• Article.csv: holds the article ID, title, source, article metadata, article topic, and bias amount in %
• Participant.csv: holds the participant IDs and data processing consent
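As an illustration of how these tables relate, the sketch below joins feedback rows to sentence text on contentId. The column headers and the sentence column are assumptions inferred from the file descriptions above, not the actual file schema; inline strings stand in for the real CSV files.

```python
import csv
import io

# Hypothetical stand-ins for Feedback.csv and Content.csv; headers assumed.
feedback_csv = io.StringIO(
    "participantID,contentId,biasRating,reasons\n"
    "p1,s1,biased,loaded wording\n"
    "p2,s1,not biased,\n"
)
content_csv = io.StringIO(
    "contentId,sentence\n"
    "s1,The senator's reckless plan stunned experts.\n"
)

# Index sentences by contentId, then attach the text to each feedback row.
sentences = {row["contentId"]: row["sentence"]
             for row in csv.DictReader(content_csv)}
feedback = [{**row, "sentence": sentences[row["contentId"]]}
            for row in csv.DictReader(feedback_csv)]
```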

Collection Process

Data was collected through interactions with the Feedback Mechanism on NewsUnravel. A news article was displayed with automatically generated bias highlights. Each highlight could be selected, and readers were able to agree or disagree with the automatic label. Through a majority vote, labels were generated from those feedback interactions. Spammers were excluded through a spam detection approach.
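The majority-vote step described above can be sketched as follows: each sentence's final label is the most common reader rating. The field names and the tie-breaking behavior are assumptions for illustration, not taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical (contentId, rating) pairs from reader feedback.
ratings = [
    ("s1", "biased"), ("s1", "biased"), ("s1", "not biased"),
    ("s2", "not biased"), ("s2", "not biased"),
]

def majority_labels(ratings):
    """Aggregate per-sentence ratings into a single label by plurality vote."""
    votes = defaultdict(Counter)
    for content_id, rating in ratings:
        votes[content_id][rating] += 1
    # most_common(1) picks the plurality label; ties resolve arbitrarily here.
    return {cid: counter.most_common(1)[0][0] for cid, counter in votes.items()}

labels = majority_labels(ratings)
```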

Readers came to our website voluntarily through posts on LinkedIn and social media as well as posts on university boards. The data collection period lasted for one week, from March 4th to March 11th (2023). The landing page informed them about the goal and the data processing. After being informed, they could proceed to the article overview.

So far, the dataset has been used on top of BABE to train a linguistic bias classifier, adopting hyperparameter configurations from BABE with a pre-trained model from Hugging Face. The dataset will be open source; on acceptance, a link with all details and contact information will be provided. No third parties are involved.

The dataset will not be maintained as it captures the first test of NewsUnravel at a specific point in time. However, new datasets will arise from further iterations. Those will be linked in the repository. Please cite the NewsUnravel paper if you use the dataset and contact us if you're interested in more information or joining the project.
