Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview

This archive contains the files to reproduce the results in "Measuring Geopolitical Risk," as well as additional documentation referred to in the paper. Each directory is self-contained: download all the files in a directory in order to run its scripts. Instructions are given in the README files.

Updated data can be found on the geopolitical risk index webpage: https://www.matteoiacoviello.com/gpr.htm. For questions or comments, please contact iacoviel@gmail.com.

Data Availability Statement

All the data used in this paper are provided in this repository, with the exception of the Compustat quarterly firm-level data, which can be downloaded from https://wrds-www.wharton.upenn.edu/pages/ with a registered account.

Software used

The codes have been tested on Stata/MP 16.0 (*.do files), Matlab R2019a (*.m files), R 4.0.4 (*.R files), and Anaconda 3 (*.py and *.ipynb files). Most codes run in seconds or minutes on a personal laptop with 16 GB of RAM, with the exception of the R code to estimate disaster episodes, which takes about two days using the standard settings from the Nakamura et al. (2013) paper (nIter = 50,000, nRuns = 40).

Directory list and main input files (if any) in each directory

1. Monthly Geopolitical Risk Data Used in the Paper (data_paper)
   See README.txt in the directory for details.
   data_gpr_export.dta (Stata format)
   data_gpr_export.xls (Excel format)
2. Replication of Section I: Tables 1-2, Figures 1-8, Appendix Tables A.3-A.6, and Appendix Figures A.1-A.4 and A.10-A.14 (figures_paper) (requires Stata)
   See README.txt in the directory for details.
   Input file: run_figures_tables.do
3. Replication of Section III: VAR Evidence - Figures 9-10 and Appendix Figures A.5-A.7 (var_results) (requires Matlab)
   See README.txt in the directory for details.
   Input file: run_all.m
4. Replication of Section IV: Country-Specific GPR and Disaster Probability and Quantile Regressions - Tables 3-4 (disaster_regressions) (requires Stata)
   See README.txt in the directory for details.
   Input file: run_replication_country_gpr.do
5. Replication of Section V: Firm-Specific Geopolitical Risk - Table 5, Figure 11, Appendix Table A.7, and Appendix Figure A.9 (firm_regressions) (requires Stata)
   See README.txt in the directory for details.
   Input file: run_replication_firm_shuffled.do
   (Note that replication of these results requires downloading firm-level balance sheet data through Compustat/WRDS. See firm_documentation below for instructions on how to build the firm_level.dta file.)
6. Auxiliary Material (Section V): Construction of Industry-Specific Exposure to Geopolitical Risk - Appendix Figure A.8 (industry_regressions) (requires Stata)
   See README.txt in the directory for details.
   Input file: run_replication_industry.do
7. Auxiliary Material: Documentation on How to Build the firm_level.dta File (firm_documentation)
   See README_BUILD.txt in the directory for details.
8. Auxiliary Material (Section II): Tabulations of Daily Narrative GPR Data from The New York Times (narrative_index)
   See README.txt in the directory for details.
9. Appendix: Details on the Construction of the Human GPR Index (human_index)
   See README.txt in the directory for details.
10. Appendix: Audit of Articles Belonging to the GPR Index Described in Appendix Table A.3 (audit_coded)
    See README.txt in the directory for details.
11. Appendix: Granger Causality Tests - Appendix Table A.8 (granger_causality) (requires Stata)
    See README.txt in the directory for details.
    Input file: run_granger_test.do
12. Appendix: Replication of Textual Analysis in Appendix Tables A.1 and A.2 (text_analysis) (requires Matlab with the Text Analytics Toolbox, and Stata for generating the formatted tables in the appendix)
    See README.txt in the directory for details.
    Input files: run_find_grams_textanalytics.m and run_app_tables_1_2.do
13. Auxiliary Material: Estimation of the Country Disaster Events from 1900 through 2019 (disaster_estimation) (requires R)
    See README.txt in the directory for details.
14. Auxiliary Material: Stata File with Firm-Level Geopolitical Risk Data (firm_level_gpr)
    See README.txt in the directory for details.
15. Auxiliary Material: Search Queries for News-Based GPR Index (news_searches)
    See README.txt in the directory for details.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This document provides code to replicate the results of “Cross-Sectional Identification of Private Information” (Bongaerts, Rösch, and van Dijk, 2025) using data available through Wharton Research Data Services (WRDS). Specifically, it reproduces parts of Tables 1, 2, 3, 5, 6, and 8. The document is written as a knitr (R Markdown) file. To execute the code and generate a PDF, the recommended approach is to use a local installation of RStudio. Upon execution, the code automatically downloads and stores all necessary data from WRDS locally. To enable this, you must first configure WRDS access as described below. Note: While the published article uses data from Refinitiv/TRTH, this replication uses WRDS and TAQ data. The results are qualitatively robust to these differences in data sources and partially different methodologies, as discussed in the relevant sections below.
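The replication document itself configures WRDS access from R, as described in its own setup section. Purely as an analogous illustration, WRDS access can also be configured with the official `wrds` Python package; the username below is a placeholder, and the query is an arbitrary example, not one issued by the replication code.

```python
import wrds

# Connect to WRDS; prompts for your registered username/password on first use.
db = wrds.Connection(wrds_username="your_wrds_username")  # placeholder username

# Optionally cache credentials in ~/.pgpass so later connections skip the prompt.
db.create_pgpass_file()

# Illustrative query only: a few rows of the CRSP daily stock file.
sample = db.raw_sql("select permno, date, ret from crsp.dsf limit 5")
print(sample)
db.close()
```

This is a configuration sketch: it requires a registered WRDS account and network access, so it cannot run without credentials.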
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The LSC (Leicester Scientific Corpus)
April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk). Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes.

The data are extracted from the Web of Science [1]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.

[Version 2] Further cleaning was applied during data processing to the LSC abstracts of Version 1*. Details of the cleaning procedure are explained in Step 6.

* Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v1

Getting Started

This text provides information on the LSC (Leicester Scientific Corpus) and the pre-processing steps applied to abstracts, and describes the structure of the files that organise the corpus. The corpus was created for future work on quantifying the meaning of research texts, and is made available for use in Natural Language Processing projects.

LSC is a collection of abstracts of articles and proceedings papers published in 2014 and indexed by the Web of Science (WoS) database [1]. The corpus contains only documents in English. Each document in the corpus contains the following parts:

1. Authors: the list of authors of the paper
2. Title: the title of the paper
3. Abstract: the abstract of the paper
4. Categories: one or more categories from the list of categories [2]. The full list of categories is presented in the file 'List_of_Categories.txt'.
5. Research Areas: one or more research areas from the list of research areas [3]. The full list of research areas is presented in the file 'List_of_Research_Areas.txt'.
6. Total Times Cited: the number of times the paper was cited by other items from all databases within the Web of Science platform [4]
7. Times Cited in Core Collection: the total number of times the paper was cited by other papers within the WoS Core Collection [4]

The corpus was collected online in July 2018 and contains citation counts from publication date to July 2018. We describe a document as the collection of information (about a paper) listed above. The total number of documents in LSC is 1,673,350.

Data Processing

Step 1: Downloading the Data Online
The dataset was collected manually by exporting documents online as tab-delimited files. All documents are available online.

Step 2: Importing the Dataset into R
The LSC was collected as TXT files. All documents were imported into R.

Step 3: Removing Documents with an Empty Abstract or without a Category
As our research is based on the analysis of abstracts and categories, all documents with empty abstracts and all documents without categories were removed.

Step 4: Identification and Correction of Concatenated Words in Abstracts
Medicine-related publications in particular use 'structured abstracts'. Such abstracts are divided into sections with distinct headings such as introduction, aim, objective, method, result, conclusion, etc. The tool used for extracting abstracts concatenates section headings with the first word of the following section, so we observe words such as 'ConclusionHigher' and 'ConclusionsRT'. Such concatenated words were detected by manually sampling medicine-related publications, and each was split into two words; for instance, 'ConclusionHigher' was split into 'Conclusion' and 'Higher'. The section headings found in such abstracts are listed below:

Background Method(s) Design Theoretical Measurement(s) Location Aim(s) Methodology Process Abstract Population Approach Objective(s) Purpose(s) Subject(s) Introduction Implication(s) Patient(s) Procedure(s) Hypothesis Measure(s) Setting(s) Limitation(s) Discussion Conclusion(s) Result(s) Finding(s) Material(s) Rationale(s) Implications for health and nursing policy

Step 5: Extracting (Sub-setting) the Data Based on Abstract Length
After correction, the length of each abstract was calculated. 'Length' is the total number of words in the text, computed by the same rule as the Microsoft Word 'word count' [5]. According to the APA style manual [6], an abstract should contain between 150 and 250 words. In LSC, we limited abstracts to between 30 and 500 words, in order to study documents with abstracts in typical length ranges and to avoid length effects on the analysis.
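The heading-splitting of Step 4 and the length filter of Step 5 can be sketched as follows. This is an illustrative Python reconstruction, not the corpus's original R code: the heading list is abbreviated, and plain whitespace tokenisation stands in for the Microsoft Word word-count rule.

```python
import re

# A few of the section headings listed above; the full procedure uses the
# complete list, including plural variants.
HEADINGS = ["Conclusions", "Conclusion", "Results", "Result",
            "Methods", "Method", "Background", "Objective", "Findings"]

def split_concatenated_headings(text):
    """Insert a space after a section heading glued to the first word of
    its section, e.g. 'ConclusionHigher' -> 'Conclusion Higher'."""
    # Longest headings first, so 'Conclusions' is tried before 'Conclusion'.
    alternation = "|".join(sorted(HEADINGS, key=len, reverse=True))
    # The lookahead requires an uppercase letter immediately after the
    # heading, which also catches cases like 'ConclusionsRT'.
    return re.sub(r"\b(" + alternation + r")(?=[A-Z])", r"\1 ", text)

def keep_by_length(abstract, lo=30, hi=500):
    """Step 5: keep abstracts whose word count lies in [lo, hi]."""
    return lo <= len(abstract.split()) <= hi

print(split_concatenated_headings("ConclusionHigher doses were effective."))
# -> Conclusion Higher doses were effective.
```

A known limitation of this sketch is that a heading word legitimately followed by an uppercase letter (e.g. 'Method A') is protected only by the intervening space; the corpus relied on human inspection of sampled abstracts to confirm splits.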
Step 6: [Version 2] Cleaning Copyright Notices, Permission Policies, Journal Names and Conference Names from the Version 1 Abstracts
Journals and conferences can append a footer below the abstract text containing a copyright notice, permission policy, journal name, licence, authors' rights statement, or conference name. The tool used for extracting and processing abstracts from the WoS database attaches such footers to the text; for example, casual inspection shows that copyright notices such as 'Published by Elsevier Ltd.' appear in many texts. To avoid abnormal word behaviour in further analysis, such as biased frequency calculations, we cleaned such sentences and phrases from the abstracts of LSC Version 1, removing copyright notices, conference names, journal names, authors' rights statements, licences, and permission policies identified by sampling abstracts.

Step 7: [Version 2] Re-extracting (Sub-setting) the Data Based on Abstract Length
The cleaning procedure described in the previous step left some abstracts below our minimum length criterion (30 words); 474 such texts were removed.

Step 8: Saving the Dataset in CSV Format
Documents were saved into 34 CSV files. In the CSV files, the information is organised with one record per line, and the abstract, title, list of authors, list of categories, list of research areas, and citation counts are recorded in separate fields.

To access the LSC for research purposes, please email ns433@le.ac.uk.

References
[1] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[2] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
[3] Research Areas in WoS. Available: https://images.webofknowledge.com/images/help/WOS/hp_research_areas_easca.html
[4] Times Cited in WoS Core Collection. (15 July). Available: https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-Times-Cited-accessibility-and-variation?language=en_US
[5] Word Count. Available: https://support.office.com/en-us/article/show-word-count-3c9e6a11-a04d-43b4-977c-563a0e0d5da3
[6] A. P. Association, Publication Manual. American Psychological Association, Washington, DC, 1983.
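The footer cleaning of Step 6 above can be sketched as follows. This is an illustrative Python reconstruction (the corpus itself was processed in R), and the footer patterns are hypothetical examples, not the full set of phrases identified by sampling for the corpus.

```python
import re

# Example footer patterns of the kind removed in Step 6; the real cleaning
# used phrases identified by sampling abstracts, not this short list.
FOOTER_PATTERNS = [
    r"Published by Elsevier Ltd\.?",
    r"\(C\) \d{4} .*?\. All rights reserved\.?",
    r"Copyright \(c\) \d{4}.*$",
]

def strip_footers(abstract):
    """Remove copyright/permission footers appended after the abstract text."""
    for pat in FOOTER_PATTERNS:
        abstract = re.sub(pat, "", abstract, flags=re.IGNORECASE)
    return abstract.strip()
```

After a cleaning pass like this, re-checking the 30-word minimum (Step 7) is necessary, since removing a footer can push a short abstract below the threshold.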