https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.2/customlicense?persistentId=doi:10.7910/DVN/FHD6M2
Audio-visual data is ubiquitous in politics. Campaign advertisements, political debates, and the news cycle all constantly generate sound bites and imagery, which in turn inform and affect voters. Though these sources of information have been a topic of research in political science for decades, their study has been limited by the cost of human coding. To name but one example, to answer questions about the effects of negative campaign advertisements, humans must watch tens of thousands of advertisements and manually label them. And even if the necessary resources can be mustered for such a study, future researchers may be interested in a different set of labels, and so must either recode every advertisement or discard the exercise entirely. Through three separate models, this dissertation resolves this limitation by developing automated methods to study the most common types of audio-video data in political science. The first two models are neural networks, the third a hierarchical hidden Markov model. In Chapter 1, I introduce neural networks and their complications to political science, building up from familiar statistical methods. I then develop a novel neural network for classifying newspaper articles, using both the text of the article and the imagery as data. The model is applied to an original data set of articles about fake news, which I collected by developing and deploying bots to concurrently crawl the online pages of newspapers and download news text and images. This is a novel engineering effort that future researchers can leverage to collect effectively limitless amounts of data about the news. Building on the methodological foundations established in Chapter 1, in Chapter 2 I develop a second neural network for classifying political video and demonstrate that the model can automate classification of campaign advertisements, using both the visual and the audio information. 
In Chapter 3 (joint with Dean Knox), I develop a hierarchical hidden Markov model for speech classification and demonstrate it with an application to speech on the Supreme Court. Finally, in Chapter 4 (joint with Volha Charnysh and Prerna Singh), I demonstrate the behavioral effects of imagery through a dictator game in which a visual image reduces out-group bias. In sum, this dissertation introduces a new type of data to political science, validates its substantive importance, and develops models for its study in the substantive context of politics.
Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
The Cross-National Time-Series Data Archive provides more than 200 years of annual data for nations and empires of the world, including those that no longer exist. It covers demographic, social, political, and economic topics. Select data go back to 1815. Not all indicators are available for all countries or in all years. For data definitions and the list of variables and countries covered, consult the accompanying codebook and user manuals. More information on topics, variables, and countries covered is also available on the CNTS website. DATA AVAILABLE FOR YEARS: 1815-2023
In Italy, the number of new books and editions published in the category ‘political science, economics and finance’ generally increased from 2007 to 2019. In 2018, more than 1.8 thousand books about political science, economics and finance were released. This number dropped to around 1.6 thousand books by 2019.
Political science researchers have flexibility in how to analyze data, how to report data, and whether to report on data. Review of examples of reporting flexibility from the race and sex discrimination literature illustrates how research design choices can influence estimates and inferences. This reporting flexibility—coupled with the political imbalance among political scientists—creates the potential for political bias in reported political science estimates, but this potential for political bias can be reduced or eliminated through preregistration and preacceptance, in which researchers commit to a research design before completing data collection. Removing the potential for reporting flexibility can raise the credibility of political science research.
Abstract copyright UK Data Service and data collection copyright owner.
The Data for Undergraduate Political Science Courses datasets have been derived from three major public opinion studies: Eurobarometer 64.2: the European Constitution, Globalization, Energy Resources, and Agricultural Policy, October - November, 2005 (held at the UKDA under SN 5505); British Election Study, 2005 (BES) (held under SNs 5494-5496); and the British Social Attitudes Survey, 2005 (BSA) (held under SN 5618), for the purpose of teaching data analysis to undergraduates in political science. The datasets have been 'cleaned' in order to aid students using data for the first time. Some variables have been removed, many variable names have been changed to enable more substantive meaning to be taken from them, and new codebooks have been created for each of the three derived datasets.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Political scientists rely on complex software to conduct research, and much of the software they use is written and distributed for free by other researchers. We argue that creating and maintaining these public goods is very costly for individual software developers, but that it is not adequately incentivized by the academic community. We demonstrate that statistical software is widely used but rarely cited in political science, and we highlight a partial solution to this problem: software bibliographies. To facilitate their creation, we introduce an R package which scans analysis scripts, detects the software used in those scripts, and creates bibliographies automatically. We hope that recognizing the contribution of software developers to science will encourage more academics to create public goods, which could yield important downstream benefits.
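The scanning step described above can be illustrated with a minimal sketch. It is not the implementation of the R package introduced in the abstract: the function name `detect_packages`, the regular expressions, and the sample script are all assumptions made for illustration, and the sketch only recognizes `library()`, `require()`, and `pkg::function` references.

```python
import re

# Patterns for three common ways an R script refers to a package (illustrative only).
_ATTACH = re.compile(
    r"\b(?:library|require)\(\s*['\"]?([A-Za-z][A-Za-z0-9.]*)['\"]?\s*[,)]"
)
_NAMESPACE = re.compile(r"\b([A-Za-z][A-Za-z0-9.]*):::?[A-Za-z_.]")


def detect_packages(script_text):
    """Return the sorted, de-duplicated package names referenced in an R script."""
    found = set()
    for line in script_text.splitlines():
        # Crude comment stripping; a '#' inside a string literal is not handled.
        code = line.split("#", 1)[0]
        found.update(_ATTACH.findall(code))
        found.update(_NAMESPACE.findall(code))
    return sorted(found)


# Hypothetical analysis script used only to exercise the function.
example = """
library(ggplot2)
require("lme4")
# library(notused)
fit <- MASS::glm.nb(y ~ x, data = d)
"""

print(detect_packages(example))
```

A real tool would then look up citation metadata for each detected name; that step is omitted here because the abstract does not describe how it is done.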
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Benford's test statistics based on polling centers.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Predicting votes for Mr. Obama (1) versus Mr. McCain (0) from explicit and implicit prejudice toward Blacks and their interactions with confidence. Controlling for date of implicit attitude measure administration. Model 1 examines explicit prejudice separately (N = 2,056). Model 2 examines implicit prejudice separately (N = 2,024). Model 3 examines both prejudice measures simultaneously (N = 2,024). CCC: correctly classified cases; B: regression weight B (log odds); SE: standard error of the regression weight B; Wald: Wald test statistic; OR: Odds ratio. Relative amount by which the odds increase (OR >1.0) or decrease (OR
http://www.apache.org/licenses/LICENSE-2.0
# Replication Package for 'Political Expression of Academics on Social Media' by Prashant Garg and Thiemo Fetzer.
## Overview
This replication package contains all necessary scripts and data to replicate the main figures and tables presented in the paper.
## Folder Structure
### 1. `1_scripts`
This folder contains all scripts required to replicate the main figures and tables of the paper. The scripts are numbered with a prefix (e.g. "1_") in the order they should be run. Output will also be produced in this folder.
- `0_init.Rmd`: An R Markdown file that installs and loads all packages necessary for the subsequent scripts.
- `1_fig_1.Rmd`: Primarily produces Figure 1 (Zipf's plots) and conducts statistical tests to support underlying statistical claims made through the figure.
- `2_fig_2_to_4.Rmd`: Primarily produces Figures 2 to 4 (average levels of expression) and conducts statistical tests to support underlying statistical claims made through the figures. This includes conducting t-tests to establish subgroup differences.
The file `table_controlling_how.csv` contains the full set of regression results for the analysis of subgroup differences in political stances, controlling for emotionality, egocentrism, and toxicity. This file includes effect sizes, standard errors, confidence intervals, and p-values for each stance, group variable, and confounder.
- `3_fig_5_to_6.Rmd`: Primarily produces Figures 5 to 6 (trends in expression) and conducts statistical tests to support underlying statistical claims made through the figures. This includes conducting t-tests to establish subgroup differences.
- `4_tab_1_to_2.Rmd`: Produces Tables 1 to 2, and shows code for Table A5 (descriptive tables).
Each script is expected to run in under 3 minutes and to need around 4GB RAM. Script `3_fig_5_to_6.Rmd` can take up to 3-4 minutes and requires up to 6GB RAM. For a first-time user, installing each package may take around 2 minutes, except 'tidyverse', which may take around 4 minutes.
We have not provided a demo since the actual dataset used for analysis is small enough and computations are efficient enough to be run on most systems.
Each script starts with a layperson explanation that gives an overview of the functionality of the code, then pseudocode for the detailed procedure, followed by the actual code.
### 2. `2_data`
This folder contains all data used to replicate the main results. The data is called by the respective scripts automatically using relative paths.
- `data_dictionary.txt`: Provides a description of all variables as they are coded in the various datasets, especially the main author by time level dataset called `repl_df.csv`.
- Processed data are provided as measures aggregated at the individual author by time (year by month) level, because the raw data containing raw tweets cannot be shared.
## Installation Instructions
### Prerequisites
This project uses R and RStudio. Make sure you have the following installed:
- [R](https://cran.r-project.org/) (version 4.0.0 or later)
- [RStudio](https://www.rstudio.com/products/rstudio/download/)
Once installed, run the R Markdown script `0_init.Rmd` to ensure that the correct versions of the required packages are installed. This script will install the `remotes` package (if not already installed) and then install the specified versions of the required packages.
## Running the Scripts
Open `0_init.Rmd` in RStudio and run all chunks to install and load the required packages.
Run the remaining scripts (`1_fig_1.Rmd`, `2_fig_2_to_4.Rmd`, `3_fig_5_to_6.Rmd`, and `4_tab_1_to_2.Rmd`) in the order they are listed to reproduce the figures and tables from the paper.
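For readers who prefer to script the run order rather than open each file in RStudio, the following is a minimal sketch of how the numeric prefixes could be used to order the scripts. It is an assumption, not part of the replication package: the helper names `run_order` and `render_commands` are hypothetical, and rendering through `Rscript -e "rmarkdown::render(...)"` is an alternative to running all chunks interactively that presumes the `rmarkdown` package is installed. The sketch only builds the commands; it does not execute them.

```python
import re


def run_order(filenames):
    """Sort .Rmd scripts by their numeric prefix (e.g. the '2' in '2_fig_2_to_4.Rmd')."""
    def prefix(name):
        match = re.match(r"(\d+)_", name)
        if match is None:
            raise ValueError(f"no numeric prefix: {name}")
        return int(match.group(1))

    # Files that are not R Markdown scripts are ignored before sorting.
    return sorted((f for f in filenames if f.endswith(".Rmd")), key=prefix)


def render_commands(filenames):
    """Build one Rscript command per script, in run order (commands are not executed)."""
    return [
        ["Rscript", "-e", f"rmarkdown::render('{name}')"]
        for name in run_order(filenames)
    ]


scripts = [
    "3_fig_5_to_6.Rmd",
    "0_init.Rmd",
    "4_tab_1_to_2.Rmd",
    "1_fig_1.Rmd",
    "2_fig_2_to_4.Rmd",
]
for command in render_commands(scripts):
    print(" ".join(command))
```

Each command could be passed to `subprocess.run(command, check=True, cwd="1_scripts")` so that a failing script stops the sequence and the relative data paths resolve from the scripts folder.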
## Contact
For any questions, feel free to contact Prashant Garg at prashant.garg@imperial.ac.uk.
## License
This project is licensed under the Apache License 2.0 - see the license.txt file for details.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Benford's test statistics based on electoral units with 100 or more votes for Chávez.
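As a rough illustration of what a Benford test statistic on filtered electoral units can look like, here is a minimal sketch. The source does not say which digit or which test statistic was used, so the first-digit law and the Pearson chi-square statistic are assumptions, as are the function names and the `min_count` parameter (which mirrors the "100 or more votes" filter).

```python
import math
from collections import Counter


def benford_expected(digit):
    """First-digit Benford probability: P(d) = log10(1 + 1/d) for d = 1..9."""
    return math.log10(1 + 1 / digit)


def benford_chi_square(counts, min_count=1):
    """Pearson chi-square statistic comparing leading digits with Benford's law.

    counts    -- iterable of non-negative integer vote counts, one per unit
    min_count -- units with fewer votes are dropped (hypothetical parameter)
    """
    leading = [int(str(c)[0]) for c in counts if c >= min_count and c > 0]
    n = len(leading)
    if n == 0:
        raise ValueError("no units left after filtering")
    observed = Counter(leading)
    stat = 0.0
    for d in range(1, 10):
        expected = n * benford_expected(d)
        stat += (observed.get(d, 0) - expected) ** 2 / expected
    return stat


# Made-up counts used only to exercise the function; these are not real data.
votes = [102, 187, 256, 341, 118, 95, 770, 1340, 210, 12]
print(benford_chi_square(votes, min_count=100))
```

Under these assumptions the statistic would be compared with a chi-square distribution with 8 degrees of freedom.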
Data on political science journals. See https://github.com/resulumit/psjournals for more details.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Digitalisation in Parties (DIGIPART) dataset (v.1) comprises information on party digitalisation features from 72 parties across five major European countries: Germany, Italy, France, Spain, and the United Kingdom. Compared to the initial version (v.0), which included data from 62 parties, version 1.1 of the DIGIPART dataset has been expanded to include new data on additional regional parties within these countries (n=76).
The dataset, stored in Excel format (xlsx) along with a codebook, captures information and evidence from various parties, collected and coded between July 2021 and September 2022.
Despite numerous studies examining the influence of digital technologies on political parties, a comprehensive comparative analysis of parties' responses to digitalisation remains scarce. The DIGIPART dataset aims to address this gap by mapping and analysing parties' digitalisation efforts.
DIGIPART includes fundamental data for identifying units of analysis, such as COUNTRY_ID and COUNTRY codes following Eurostat conventions, PARTY_ID codes, party acronyms, party names in English, year of foundation, ideology based on the Chapel Hill Experts Survey, election year, percentage of votes, and share of MPs in the national parliament's Lower Chamber. Vote and MP data are sourced from the Parlgov database or press sources for parties not covered in Parlgov.
Structured according to Fitzpatrick’s Five Pillar model, with adaptations for alternative digital democracy conceptions, the dataset provides insights into six main dimensions of party functions and activities: elections (EL), deliberation (DEL), participation (PART), resources (SOURCE), and communication (COM). Each dimension features several dichotomously coded indicators: 0 for no evidence of digital activity, 1 for evidence, and a dot (.) for controversial evidence or when none is found. Overall, the dataset offers specific information on 23 indicators, making it the most comprehensive account of party digitalisation to date.
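The coding scheme described above (0, 1, and a dot) can be handled as follows when the spreadsheet is read into another tool. This is a minimal sketch under stated assumptions: the column names in the example row are hypothetical, and treating the dot as a missing value is one reasonable reading of the description rather than something the codebook is known to prescribe.

```python
def parse_indicator(raw):
    """Map a raw cell to 1 (evidence), 0 (no evidence) or None (the dot code)."""
    if isinstance(raw, float) and raw.is_integer():
        raw = int(raw)  # spreadsheet readers often return 1.0 rather than 1
    value = str(raw).strip()
    if value == ".":
        return None
    if value in ("0", "1"):
        return int(value)
    raise ValueError(f"unexpected indicator value: {raw!r}")


def share_with_evidence(cells):
    """Share of coded indicators equal to 1, ignoring dot-coded cells."""
    coded = [v for v in (parse_indicator(c) for c in cells) if v is not None]
    if not coded:
        return None
    return sum(coded) / len(coded)


# Hypothetical indicator columns for a single party; not taken from the dataset.
row = {"EL_1": 1, "EL_2": ".", "COM_1": "0", "COM_2": 1.0}
print(share_with_evidence(row.values()))
```

Keeping the dot as `None` instead of recoding it to 0 avoids counting contested or missing evidence as absence of digital activity.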
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
National Center for Education Statistics (NCES) dataset of 127 political science MA programs from the College Navigator tool with program website information added.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Why Politics Matters: An Introduction to Political Science is a book written by Kevin L. Dooley and published by Cengage in 2021.
The module was administered as a post-election interview. The resulting data are provided along with voting, demographic, district and macro variables in a single dataset.
CSES Variable List
The list of variables is provided on the CSES website to help users understand what content is available from CSES and to compare the content available in each module.
Themes: MICRO-LEVEL DATA:
Identification and study administration variables: mode of interview; gender of interviewer; date questionnaire administered; election type; weighting factors; if multiple rounds: percent of vote selected parties received in first round; selection of head of state; direct election of head of state and process of direct election; threshold for first-round victory; selection of candidates for the final round; simple majority or absolute majority for 2nd round victory; primary electoral district of respondent; number of days the interview was conducted after the election
Demography: age; gender; education; marital status; union membership; union membership of others in household; business association membership; farmers' association membership; professional association membership; current employment status; main occupation; socioeconomic status; employment type (public or private); industrial sector; current employment status, occupation, socioeconomic status, employment type (public or private), and industrial sector of spouse; household income; number of persons in household; number of children in household under the age of 18; attendance at religious services; race; ethnicity; religiosity; religious denomination; language usually spoken at home; region of residence; rural or urban residence
Survey variables: political participation during the recent election campaign (persuade others, campaign activities) and frequency of political participation; contacted by candidate or party during the campaign; respondent cast a ballot at the current and the previous election; vote choice (presidential, lower house and upper house elections) at the current and the previous election; respondent cast candidate preference vote at the current election; most important issue; evaluation of government's performance concerning the most important issue and in general; satisfaction with the democratic process in the country; attitude towards selected statements: it makes a difference who is in power and who people vote for; democracy is better than any other form of government; respondent cast candidate preference vote at the previous election; judgement of the performance of the party the respondent voted for in the previous election; judgement of how well voters' views are represented in elections; party and leader that represent respondent's view best; form of questionnaire (long or short); party identification; intensity of party identification; sympathy scale for selected parties; assessment of parties and political leaders on a left-right scale; political participation during the last 5 years: contacted a politician or government, protest or demonstration, work with others who share the same concern; respect for individual freedom and human rights; assessment of how widespread corruption is in the country; self-placement on a left-right scale; political information items
DISTRICT-LEVEL DATA:
number of seats contested in electoral district, number of candidates, number of party lists, percent vote of different parties, official voter turnout in electoral district
MACRO-LEVEL DATA:
percent of popular vote received by parties in current (lower house/upper house) legislative election; percent of seats in lower house received by parties in current lower house/upper house election; percentage of official voter turnout; number of portfolios held by each party in cabinet, prior to and after the most recent election; year of party foundation; ideological family the parties are closest to; European parliament political group and international organization the parties belong to; significant parties not represented before and after the election; left-right position of parties; general consensus on these left-right placements among informed observers in the country; alternative dimension placements; consensus on the alternative dimension placements; most salient factors in the election; consensus on the salience ranking; electoral alliances permitted during the election campaign; name of alliance and participant parties; number of elected legislative chambers; for lower house and upper house: number of electoral segments; number of primary districts; number of seats; district magnitude (number of members elected from each district); number of secondary and tertiary electoral districts; compulsory voting; votes cast; voting procedure; transferable votes; cumulated votes if more than one can be cast; party threshold; electoral formula used; party lists closed, open, or flexible; parties can run joint lists; possibility of...
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Replication files for "Big Data meets Open Political Science: An Empirical Assessment of Transparency Standards 2008-2019". The analysis_replication.do file reproduces figures and tables using the replication.dta data file. For ease of access, the .pdf contains the code from analysis.do, while the .xls and .csv files contain the replication data in different formats.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Comparative Political Economy Database (CPEDB) began at the Centre for Learning, Social Economy and Work (CLSEW) at the Ontario Institute for Studies in Education at the University of Toronto (OISE/UT) as part of the Changing Workplaces in a Knowledge Economy (CWKE) project. This data base was initially conceived and developed by Dr. Wally Seccombe (independent scholar) and Dr. D.W. Livingstone (Professor Emeritus at the University of Toronto). Seccombe has conducted internationally recognized historical research on evolving family structures of the labouring classes (A Millennium of Family Change: Feudalism to Capitalism in Northwestern Europe and Weathering the Storm: Working Class Families from the Industrial Revolution to the Fertility Decline). Livingstone has conducted decades of empirical research on class and labour relations. A major part of this research has used the Canadian Class Structure survey done at the Institute of Political Economy (IPE) at Carleton University in 1982 as a template for Canadian national surveys in 1998, 2004, 2010 and 2016, culminating in Tipping Point for Advanced Capitalism: Class, Class Consciousness and Activism in the ‘Knowledge Economy’ (https://fernwoodpublishing.ca/book/tipping-point-for-advanced-capitalism) and a publicly accessible data base including all five of these Canadian surveys (https://borealisdata.ca/dataverse/CanadaWorkLearningSurveys1998-2016). Seccombe and Livingstone have collaborated on a number of research studies that recognize the need to take account of expanded modes of production and reproduction. Both Seccombe and Livingstone are Research Associates of CLSEW at OISE/UT. The CPEDB Main File (an SPSS data file) covers the following areas (in order): demography, family/household, class/labour, government, electoral democracy, inequality (economic, political & gender), health, environment, internet, macro-economic and financial variables. 
In its present form, it contains annual data on 725 variables from 12 countries (alphabetically listed): Canada, Denmark, France, Germany, Greece, Italy, Japan, Norway, Spain, Sweden, United Kingdom and United States. A few of the variables date back to 1928, and the majority date from 1960 to 1990. Where these years are not covered in the source, a minority of variables begin with more recent years. All the variables end at the most recent available year (1999 to 2022). In the next version developed in 2025, the most recent years (2023 and 2024) will be added whenever they are present in the sources’ datasets. For researchers who are not using SPSS, refer to the Chart files for overviews, summaries and information on the dataset. For a current list of the variable names and their labels in the CPEDB data base, see the excel file: Outline of SPSS file Main CPEDB, Nov 6, 2023. At the end of each variable label in this file and the SPSS datafile, you will find the source of that variable in a bracket. If I have combined two variables from a given source, the bracket will begin with WS and then register the variables combined. In the 14 variables David created at the beginning of the Class Labour section, you will find DWL in these brackets with his description as to how it was derived. The CPEDB’s variables have been derived from many databases; the main ones are OECD (their Statistics and Family Databases), World Bank, ILO, IMF, WHO, WIID (World Income Inequality Database), OWID (Our World in Data), Parlgov (Parliaments and Governments Database), and V-Dem (Varieties of Democracy). The Institute for Political Economy at Carleton University is currently the main site for continuing refinement of the CPEDB. IPE Director Justin Paulson and other members are involved along with Seccombe and Livingstone in further development and safe storage of this updated database both at the IPE at Carleton and the UT dataverse. 
All those who explore the CPEDB are invited to share their perceptions of the entire database, or any of its sections, with Seccombe generally (wseccombe@sympatico.ca) and Livingstone for class/labour issues (davidlivingstone@utoronto.ca). They welcome any suggestions for additional variables together with their data sources. A new version of the CPEDB will be created in the spring of 2025 and installed as soon as the revision is completed. This revised version is intended to be a valuable resource for researchers in all of the included countries as well as Canada.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an archive of all versions of V-Dem data and associated documentation: aggregated and disaggregated data, codebook, citation instructions, variable labels, etc.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contributions in Political Science is a book series. It includes 230 books, written by 211 different authors.