Audio-visual data is ubiquitous in politics. Campaign advertisements, political debates, and the news cycle all constantly generate sound bites and imagery, which in turn inform and affect voters. Though these sources of information have been a topic of research in political science for decades, their study has been limited by the cost of human coding. To name but one example, to answer questions about the effects of negative campaign advertisements, humans must watch tens of thousands of advertisements and manually label them. And even if the necessary resources can be mustered for such a study, future researchers may be interested in a different set of labels, and so must either recode every advertisement or discard the exercise entirely. Through three separate models, this dissertation resolves this limitation by developing automated methods to study the most common types of audio-video data in political science. The first two models are neural networks, the third a hierarchical hidden Markov model. In Chapter 1, I introduce neural networks and their complications to political science, building up from familiar statistical methods. I then develop a novel neural network for classifying newspaper articles, using both the text of the article and the imagery as data. The model is applied to an original data set of articles about fake news, which I collected by developing and deploying bots to concurrently crawl the online pages of newspapers and download news text and images. This is a novel engineering effort that future researchers can leverage to collect effectively limitless amounts of data about the news. Building on the methodological foundations established in Chapter 1, in Chapter 2 I develop a second neural network for classifying political video and demonstrate that the model can automate classification of campaign advertisements, using both the visual and the audio information. 
In Chapter 3 (joint with Dean Knox), I develop a hierarchical hidden Markov model for speech classification and demonstrate it with an application to speech on the Supreme Court. Finally, in Chapter 4 (joint with Volha Charnysh and Prerna Singh), I demonstrate the behavioral effects of imagery through a dictator game in which a visual image reduces out-group bias. In sum, this dissertation introduces a new type of data to political science, validates its substantive importance, and develops models for its study in the substantive context of politics.
Political scientists rely on complex software to conduct research, and much of the software they use is written and distributed for free by other researchers. We argue that creating and maintaining these public goods is very costly for individual software developers, but that it is not adequately incentivized by the academic community. We demonstrate that statistical software is widely used but rarely cited in political science, and we highlight a partial solution to this problem: software bibliographies. To facilitate their creation, we introduce an \texttt{R} package which scans analysis scripts, detects the software used in those scripts, and creates bibliographies automatically. We hope that recognizing the contribution of software developers to science will encourage more academics to create public goods, which could yield important downstream benefits.
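To illustrate what "detects the software used in those scripts" can involve, here is a minimal sketch of a scanner for R scripts. It is a Python illustration of the general idea, not the package's actual implementation. It assumes packages are loaded with `library()`, `require()`, or `requireNamespace()`, or referenced with the `pkg::fn` syntax. The function name `detect_r_packages` is hypothetical.

```python
import re

# Hypothetical patterns: library(pkg), require(pkg), requireNamespace("pkg"),
# and namespace-qualified calls such as pkg::fn or pkg:::fn.
LOAD_CALL = re.compile(r'\b(?:library|require|requireNamespace)\(\s*["\']?([A-Za-z][A-Za-z0-9.]*)')
NAMESPACE = re.compile(r'\b([A-Za-z][A-Za-z0-9.]*):::?[A-Za-z.]')


def detect_r_packages(script_text):
    """Return the sorted names of packages referenced in an R script."""
    found = set()
    for line in script_text.splitlines():
        # Ignore comments (naive: also truncates lines with '#' inside strings).
        code = line.split("#", 1)[0]
        found.update(LOAD_CALL.findall(code))
        found.update(NAMESPACE.findall(code))
    return sorted(found)


script = '''
library(ggplot2)
require("dplyr")  # data wrangling
# library(notloaded)
fit <- stats::lm(y ~ x, data = df)
if (requireNamespace("sf", quietly = TRUE)) NULL
dt <- data.table::fread("file.csv")
'''
print(detect_r_packages(script))
```

A bibliography tool would then map each detected name to a citation entry. Regular expressions keep the sketch short, but a production tool would more robustly parse the R syntax tree.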
In Italy, the number of new books and editions published in the category ‘political science, economics and finance’ generally increased from 2007 to 2019. In 2018, more than 1.8 thousand books about political science, economics and finance were released. This number dropped to around 1.6 thousand books by 2019.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Each R script replicates all of the example code from one chapter of the book. All data required by each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
The Cross-National Time-Series Data Archive provides more than 200 years of annual data for nations and empires of the world, including those that no longer exist. It covers demographic, social, political, and economic topics. Selected data go back to 1815. Not all indicators are available for all countries or in all years. For data definitions and the lists of variables and countries covered, consult the accompanying codebook and user manuals. More information on topics, variables, and country coverage is also available on the CNTS website. DATA AVAILABLE FOR YEARS: 1815-2023
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Benford's test statistics based on polling centers.
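The description does not say which digit test was used. Election forensics commonly applies Benford's law to the second digit of vote counts (the "2BL" test), so the sketch below assumes that variant. It computes a Pearson chi-square statistic for a list of per-polling-center vote counts. The function names are hypothetical and the sketch is not the dataset's code.

```python
import math
from collections import Counter


def second_digit_benford_probs():
    """Expected frequency of each second digit 0-9 under Benford's law."""
    return [
        sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    ]


def second_digit_chi2(vote_counts):
    """Pearson chi-square statistic comparing observed second digits of vote
    counts with Benford's law. Counts below 10 have no second digit and are
    skipped; under the null the statistic is approximately chi-square with 9 df."""
    digits = [int(str(v)[1]) for v in vote_counts if v >= 10]
    if not digits:
        raise ValueError("need at least one count with two or more digits")
    n = len(digits)
    observed = Counter(digits)
    expected = second_digit_benford_probs()
    return sum((observed[d] - n * p) ** 2 / (n * p) for d, p in enumerate(expected))
```

Large values of the statistic indicate a departure from the Benford distribution. Such a departure is not by itself proof of fraud, because vote counts need not follow Benford's law.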
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Vote shares of the smaller parties and of a few selected coalitions of interest. These shares are calculated from a pooled survey: at each date, the pooled sample aggregates polls from different German pollsters. For each date, this data set provides a subset of 500 simulations from a Dirichlet distribution, which is the (Bayesian) posterior distribution of a multinomial likelihood (the pooled survey) combined with an uninformative (flat) Dirichlet prior. Each simulation represents a possible election outcome. The vote shares provided here are based on "redistributed votes": the vote shares of parties below the 5% threshold are redistributed proportionally to the parties above it. Data cover October 2016 through September 22nd, 2017.
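The simulation scheme can be sketched with the Python standard library. A flat Dirichlet(1, ..., 1) prior combined with multinomial poll counts n_i gives a Dirichlet(n_i + 1) posterior. A Dirichlet draw can be generated by normalizing independent Gamma(alpha_i, 1) draws. This is a minimal sketch, not the authors' code, and it rests on three assumptions. First, the party labels and counts are hypothetical. Second, the 5% redistribution is applied to each simulated draw. Third, the coalition check compares redistributed vote shares with 50 percent rather than allocating seats.

```python
import random


def posterior_draws(counts, n_draws=500, seed=None):
    """Draw vote-share vectors from Dirichlet(counts + 1), the posterior of a
    multinomial likelihood under a flat Dirichlet(1, ..., 1) prior."""
    rng = random.Random(seed)
    parties = list(counts)
    draws = []
    for _ in range(n_draws):
        # A Dirichlet draw is a vector of independent Gamma(alpha_i, 1) draws, normalized.
        gammas = [rng.gammavariate(counts[p] + 1, 1.0) for p in parties]
        total = sum(gammas)
        draws.append({p: g / total for p, g in zip(parties, gammas)})
    return draws


def redistribute(shares, threshold=0.05):
    """Drop parties below the threshold and rescale the rest to sum to one."""
    kept = {p: s for p, s in shares.items() if s >= threshold}
    total = sum(kept.values())
    return {p: s / total for p, s in kept.items()}


def coalition_majority_prob(draws, coalition, threshold=0.05):
    """Share of simulated outcomes in which the coalition's redistributed vote
    share exceeds 50 percent (a simplification of a seat majority)."""
    wins = sum(
        sum(redistribute(d, threshold).get(p, 0.0) for p in coalition) > 0.5
        for d in draws
    )
    return wins / len(draws)


# Hypothetical pooled poll counts, for illustration only.
pooled = {"A": 400, "B": 350, "C": 150, "D": 60, "E": 40}
draws = posterior_draws(pooled, seed=1)
print(coalition_majority_prob(draws, ["A", "B"]))
```

Because each draw is a full possible outcome, quantities such as a coalition's chance of a majority are simply proportions over the simulations.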
Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0
# Replication Package for 'Political Expression of Academics on Social Media' by Prashant Garg and Thiemo Fetzer.
## Overview
This replication package contains all necessary scripts and data to replicate the main figures and tables presented in the paper.
## Folder Structure
### 1. `1_scripts`
This folder contains all scripts required to replicate the main figures and tables of the paper. The scripts are numbered with a prefix (e.g. "1_") indicating the order in which they should be run. Output is also written to this folder.
- `0_init.Rmd`: An R Markdown file that installs and loads all packages necessary for the subsequent scripts.
- `1_fig_1.Rmd`: Primarily produces Figure 1 (Zipf plots) and conducts the statistical tests supporting the claims made in the figure.
- `2_fig_2_to_4.Rmd`: Primarily produces Figures 2 to 4 (average levels of expression) and conducts the statistical tests supporting the claims made in these figures, including t-tests establishing subgroup differences.
  The script also includes the regressions reported in the file `table_controlling_how.csv`, which contains the full set of regression results for the analysis of subgroup differences in political stances, controlling for emotionality, egocentrism, and toxicity. This file includes effect sizes, standard errors, confidence intervals, and p-values for each stance, group variable, and confounder.
- `3_fig_5_to_6.Rmd`: Primarily produces Figures 5 and 6 (trends in expression) and conducts the statistical tests supporting the claims made in these figures, including t-tests establishing subgroup differences.
- `4_tab_1_to_2.Rmd`: Produces Tables 1 and 2, and shows the code for Table A5 (descriptive tables).
Each script is expected to run in under 3 minutes and to require around 4GB of RAM. The exception is `3_fig_5_to_6.Rmd`, which can take 3-4 minutes and require up to 6GB of RAM. For first-time users, installing each package may take around 2 minutes, except 'tidyverse', which may take around 4 minutes.
We have not provided a demo, since the dataset used for the analysis is small and the computations are efficient enough to run on most systems.
Each script starts with a plain-language overview of what the code does and pseudocode describing the procedure in detail, followed by the actual code.
### 2. `2_data`
This folder contains all data used to replicate the main results. The data are loaded automatically by the respective scripts using relative paths.
- `data_dictionary.txt`: Describes all variables as they are coded in the various datasets, especially the main author-by-time dataset, `repl_df.csv`.
- Processed data are provided as measures aggregated at the individual author-by-time (year-month) level, because the raw data containing the tweets cannot be shared.
## Installation Instructions
### Prerequisites
This project uses R and RStudio. Make sure you have the following installed:
- [R](https://cran.r-project.org/) (version 4.0.0 or later)
- [RStudio](https://www.rstudio.com/products/rstudio/download/)
Once these are installed, run the R Markdown script `0_init.Rmd` to ensure that the correct versions of the required packages are installed. This script installs the `remotes` package (if not already installed) and then installs the specified versions of the required packages.
## Running the Scripts
1. Open `0_init.Rmd` in RStudio and run all chunks to install and load the required packages.
2. Run the remaining scripts (`1_fig_1.Rmd`, `2_fig_2_to_4.Rmd`, `3_fig_5_to_6.Rmd`, and `4_tab_1_to_2.Rmd`) in the order listed to reproduce the figures and tables from the paper.
## Contact
For any questions, feel free to contact Prashant Garg at prashant.garg@imperial.ac.uk.
## License
This project is licensed under the Apache License 2.0 - see the license.txt file for details.
If the publication decisions of journals are a function of the statistical significance of research findings, the published literature may suffer from “publication bias.” This paper describes a method for detecting publication bias. We point out that to achieve statistical significance, the effect size must be larger in small samples. If publications tend to be biased against statistically insignificant results, we should observe that the effect size diminishes as sample sizes increase. This proposition is tested and confirmed using the experimental literature on voter mobilization.
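The core intuition can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's procedure. Each hypothetical study is a two-arm experiment with a unit-variance outcome. Its estimate is normally distributed around a true effect of 0.1 standard deviations. Only two-sided significant estimates (|z| > 1.96) get published. Under that filter, the smallest publishable effect shrinks with sample size. As a result, the average published effect is inflated in small studies and approaches the true effect as samples grow.

```python
import math
import random
import statistics


def min_significant_effect(n_per_arm, sd=1.0, z_crit=1.96):
    """Smallest difference in means that is significant (two-sided, alpha = 0.05)
    in a two-arm experiment with n_per_arm subjects per arm and outcome SD sd."""
    se = sd * math.sqrt(2.0 / n_per_arm)
    return z_crit * se


def mean_published_effect(true_effect, n_per_arm, n_studies=20000, sd=1.0, seed=0):
    """Average estimate among simulated studies that reach significance, i.e.
    the studies that would appear if journals published only significant results."""
    rng = random.Random(seed)
    se = sd * math.sqrt(2.0 / n_per_arm)
    published = []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, se)  # sampling distribution of the estimate
        if abs(estimate / se) > 1.96:
            published.append(estimate)
    return statistics.mean(published)


for n in (50, 500, 5000):
    print(n, round(min_significant_effect(n), 3), round(mean_published_effect(0.1, n), 3))
```

Plotting published effect sizes against sample size and finding a downward slope is the pattern the paper tests for in the voter mobilization literature.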
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
National Center for Education Statistics (NCES) dataset of 127 political science MA programs from the College Navigator tool with program website information added.
More than half of surveyed Finnish respondents agreed in 2021 that scientists should be involved in political debates in order to ensure decisions are made in accordance with scientific evidence. Only two percent of the respondents totally disagreed with this idea.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Benford's test statistics based on electoral units with 100 or more votes for Chávez.
The module was administered as a post-election interview. The resulting data are provided along with voting, demographic, district and macro variables in a single dataset.
CSES Variable List: The list of variables is provided on the CSES website to help users understand what content is available from CSES and to compare the content available in each module.
Themes: MICRO-LEVEL DATA:
Identification and study administration variables: mode of interview; gender of interviewer; date questionnaire administered; election type; weighting factors; if multiple rounds: percent of vote selected parties received in first round; selection of head of state; direct election of head of state and process of direct election; threshold for first-round victory; selection of candidates for the final round; simple majority or absolute majority for 2nd round victory; primary electoral district of respondent; number of days the interview was conducted after the election
Demography: age; gender; education; marital status; union membership; union membership of others in household; business association membership; farmers' association membership; professional association membership; current employment status; main occupation; socioeconomic status; employment type - public or private; industrial sector; current employment status, occupation, socioeconomic status, employment type - public or private and industrial sector of spouse; household income; number of persons in household; number of children in household under the age of 18; attendance at religious services; race; ethnicity; religiosity; religious denomination; language usually spoken at home; region of residence; rural or urban residence
Survey variables: political participation during the recent election campaign (persuade others, campaign activities) and frequency of political participation; contacted by candidate or party during the campaign; respondent cast a ballot at the current and the previous election; vote choice (presidential, lower house and upper house elections) at the current and the previous election; respondent cast candidate preference vote at the current election; most important issue; evaluation of the government's performance concerning the most important issue and in general; satisfaction with the democratic process in the country; attitude towards selected statements: it makes a difference who is in power and who people vote for; democracy is better than any other form of government; respondent cast candidate preference vote at the previous election; judgement of the performance of the party the respondent voted for in the previous election; judgement of how well voters' views are represented in elections; party and leader that represent the respondent's view best; form of questionnaire (long or short); party identification; intensity of party identification; sympathy scale for selected parties; assessment of parties and political leaders on a left-right scale; political participation during the last 5 years: contacted a politician or government, protest or demonstration, work with others who share the same concern; respect for individual freedom and human rights; assessment of how widespread corruption is in the country; self-placement on a left-right scale; political information items
DISTRICT-LEVEL DATA:
number of seats contested in electoral district, number of candidates, number of party lists, percent vote of different parties, official voter turnout in electoral district
MACRO-LEVEL DATA:
percent of popular vote received by parties in current (lower house/upper house) legislative election; percent of seats in lower house received by parties in current lower house/upper house election; percentage of official voter turnout; number of portfolios held by each party in cabinet, prior to and after the most recent election; year of party foundation; ideological family the parties are closest to; European Parliament political group and international organization the parties belong to; significant parties not represented before and after the election; left-right position of parties; general consensus on these left-right placements among informed observers in the country; alternative dimension placements; consensus on the alternative dimension placements; most salient factors in the election; consensus on the salience ranking; electoral alliances permitted during the election campaign; name of alliance and participant parties; number of elected legislative chambers; for the lower house and upper house: number of electoral segments; number of primary districts; number of seats; district magnitude (number of members elected from each district); number of secondary and tertiary electoral districts; compulsory voting; votes cast; voting procedure; transferable votes; cumulated votes if more than one can be cast; party threshold; electoral formula used; party lists closed, open, or flexible; parties can run joint lists; possibility of...
Teaching undergraduate political methodology courses is a challenging task, yet has garnered little pedagogical discussion within the discipline. With the growing use of technology in the classroom, as well as the growing demand for data science and data literacy in our society, better understanding how we use statistical software in these courses is warranted. In this short paper, we shed light on current practices in teaching political methodology courses, with a particular emphasis on the use of statistical software. Combining an analysis of 93 course syllabi with a quantitative survey of research method instructors, we provide key information on the structure of these courses and how they incorporate statistical software. Our results reflect the growing importance of data literacy within the discipline, and suggest that more intentional discussions of research method pedagogy are needed in the future.
We propose a new methodology for inferring political actors' latent memberships in communities of collective activity that drive their observable interactions. Unlike existing methods, the proposed Bipartite Link Community Model (biLCM) (1) applies to two groups of actors, (2) takes into account that actors may be members of more than one community, and (3) allows a pair of actors to interact in more than one way. We apply this method to characterize legislative communities of special interest groups and politicians in the 113th U.S. Congress. Previous empirical studies of interest group politics have been limited by the difficulty of observing the ties between interest groups and politicians directly. We therefore first construct an original dataset that connects the politicians who sponsor congressional bills with the interest groups that lobby on those bills based on more than two million textual descriptions of lobbying activities. We then use the biLCM to make quantitative measurements of actors' community memberships ranging from narrow targeted interactions according to industry interests and jurisdictional committee membership to broad multifaceted connections across multiple policy domains.
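The dataset construction step links sponsors and lobbying interest groups through shared bills, which can be sketched as a weighted bipartite edge list. This sketch assumes the lobbying descriptions have already been parsed into bill identifiers. The function name, bill numbers, and actor names below are hypothetical. The sketch covers only the kind of input a bipartite community model consumes, not the biLCM itself.

```python
from collections import Counter
from itertools import product


def build_bipartite_edges(bill_sponsors, bill_lobbyists):
    """Count, for each (interest group, politician) pair, the number of bills
    that the politician sponsored and the group lobbied on."""
    edges = Counter()
    for bill, sponsors in bill_sponsors.items():
        groups = bill_lobbyists.get(bill, set())
        # Every group lobbying on the bill is linked to every sponsor of the bill.
        for group, politician in product(groups, sponsors):
            edges[(group, politician)] += 1
    return edges


# Hypothetical records, for illustration only.
sponsors = {"HR1": {"Rep. A"}, "HR2": {"Rep. A", "Sen. B"}, "S3": {"Sen. B"}}
lobbying = {"HR1": {"Group X"}, "HR2": {"Group X", "Group Y"}}
print(build_bipartite_edges(sponsors, lobbying))
```

Edge weights count shared bills. A model that allows several kinds of interaction per pair could instead keep one edge per bill, with the bill's topic attached.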
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Predicting votes for Mr. Obama (1) versus Mr. McCain (0) from explicit and implicit prejudice toward Blacks and their interactions with confidence, controlling for the date on which the implicit attitude measure was administered. Model 1 examines explicit prejudice separately (N = 2,056). Model 2 examines implicit prejudice separately (N = 2,024). Model 3 examines both prejudice measures simultaneously (N = 2,024). CCC: correctly classified cases; B: regression weight B (log odds); SE: standard error of the regression weight B; Wald: Wald test statistic; OR: odds ratio, the relative amount by which the odds increase (OR > 1.0) or decrease (OR < 1.0)
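The table columns are related by simple formulas. Because B is on the log-odds scale, the odds ratio is OR = e^B. The Wald statistic is assumed here to be reported in its squared form, (B / SE)^2, as in SPSS output. Some software reports z = B / SE instead. The coefficient in the usage line is hypothetical and not taken from the table.

```python
import math


def odds_ratio(b):
    """Odds ratio implied by a logistic regression weight B (log odds)."""
    return math.exp(b)


def wald_statistic(b, se):
    """Wald statistic in the squared form (B / SE) ** 2, compared with a
    chi-square distribution with 1 degree of freedom."""
    return (b / se) ** 2


# Hypothetical coefficient, for illustration only (not from the table).
b, se = -0.5, 0.1
print(round(odds_ratio(b), 3), wald_statistic(b, se))
```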
Based on the academic reputation scores awarded on the QS World University Rankings by Subject in 2020, the Universidade de São Paulo in Brazil was ranked first by academic reputation for political science and international relations, with a score of 66.4 out of 100. It was followed by the Universidad de los Andes in Colombia, with 63.9 points.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
State capacity is a core concept in political science research, and it is widely recognized that state institutions exert considerable influence on outcomes such as economic development, civil conflict, democratic consolidation, and international security. Yet, researchers across these fields of inquiry face common problems involved in conceptualizing and measuring state capacity. In this article, we examine these conceptual issues, identify three core dimensions of state capacity, and develop the expectation that they are mutually supporting and interlinked. We then use Bayesian latent variable analysis to estimate state capacity at the conjunction of indicators related to these dimensions. We find strong interrelationships between the three dimensions and produce a new, general-purpose measure of state capacity with demonstrated validity for use in a wide range of empirical inquiries. It is hoped that this project will provide effective guidance and tools for researchers studying the causes and consequences of state capacity.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Note to researchers: for publicly available ideology estimates for municipalities, federal electoral districts, and 2019 Canadian Election Study respondents, see this link: https://doi.org/10.5683/SP2/BLYP7X. These files contain data and code for Jack Lucas and David A. Armstrong II, "Policy Ideology and Local Ideological Representation in Canada." Canadian Journal of Political Science.
The Swedish National Election Study 2014 was conducted in collaboration between the Department of Political Science in Gothenburg and Statistics Sweden (SCB). This collaboration has covered all of the parliamentary elections, referendums and elections to the European Parliament since 1956. The Department of Political Science is responsible for the questionnaires, interview instructions, processing of data, coding and analyses. SCB is responsible for sampling, field work and reporting to the official statistics.
The study of the 2014 parliamentary election follows the same design as all previous election studies since the 1973 election. The study includes a representative sample of 3971 Swedish citizens aged 18–80 who are eligible to vote. Swedish citizens living abroad were excluded from the sample. Of the 3971 individuals in the sample, 2230 (56 percent) participated in an interview. The remaining 1741 individuals did not respond: 845 declined to participate, 800 were not available, and 91 could not participate because of illness or similar reasons (information is missing for 5 individuals).
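The sample accounting can be checked with a few lines of Python. All figures are taken from the text above.

```python
sample = 3971
interviewed = 2230
declined, unavailable, ill, missing_info = 845, 800, 91, 5

non_response = sample - interviewed
response_rate = 100 * interviewed / sample

# The four non-response categories should add up to the total non-response.
assert declined + unavailable + ill + missing_info == non_response
print(non_response, round(response_rate))
```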
The study is divided into two stages: one before the election and one after it. Half of the sample is selected for interviews before the election, and the other half for interviews after the election. Individuals interviewed before the election also receive a shorter questionnaire after the election, including questions about their vote choice in the 2014 elections (parliament, county council, and local municipality). Individuals who, for various reasons, could not be interviewed by SCB before the election are contacted again after the election and then answer the post-election questionnaire.
The sample is structured as a two-step panel. Half of the individuals in the sample participated in the 2010 election study and are interviewed again in the 2014 study. The other half are interviewed for the first time in 2014 and will be contacted again for an interview in 2018.
Election studies, as with all survey studies, are affected by increasingly large numbers of drop-outs (non-responses). All of the individuals who hesitate/decline to participate in an interview are therefore asked to participate in a shortened version of the interview in order to mitigate the effect of this problem. Individuals who hesitate to participate in the shortened version are asked to participate in an extremely shortened version of the questionnaire.
Fieldwork in stage A (the pre-election interviews) started on the 18th of August and finished on the 12th of September. Respondents in stage A received a shorter post-election questionnaire by mail on the 15th of September. Fieldwork in stage B (the post-election interviews) started the day after the election, on the 15th of September, and finished on the 10th of October. The follow-up work was completed by the beginning of November.