Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset and code package is designed for execution in Google Colab, which provides a free cloud-based Python environment.
Follow these steps to reproduce the results.
Sign in with your Google account.
This repository contains two analysis notebooks:
Polis.ipynb
cpi.ipynb
Download them from Zenodo, or open them directly in Colab using File → Upload notebook.
Mounting Google Drive allows you to store the data permanently instead of uploading it each time.
from google.colab import drive
drive.mount('/content/drive')
After mounting, place all dataset files in a folder inside your Drive (e.g., My Drive/CorruptionStudy/).
Ensure the following files are available in your Colab session (either uploaded directly or stored in Drive):
| File | Description |
| --- | --- |
| estat_sdg_16_50_en.csv | Eurostat CPI dataset |
| V-Dem-CY-Core-v15.csv | V-Dem Core dataset |
| Controls.xlsx | Control variables |
| Institutional.xlsx | Institutional variables |
| Core.xlsx | Additional core variables |
If you are not using Google Drive, upload all files at the start of your session:
from google.colab import files
uploaded = files.upload()
Select all required .csv and .xlsx files when prompted.
Run the following command in a Colab cell:
!pip install pandas numpy statsmodels linearmodels openpyxl
If files are uploaded directly in Colab:
EUROSTAT_CPI_PATH = "/content/estat_sdg_16_50_en.csv"
VDEM_PATH = "/content/V-Dem-CY-Core-v15.csv"
CONTROLS_PATH = "/content/Controls.xlsx"
INSTITUTIONAL_PATH = "/content/Institutional.xlsx"
CORE_PATH = "/content/Core.xlsx"
If files are stored in Google Drive:
EUROSTAT_CPI_PATH = "/content/drive/My Drive/CorruptionStudy/estat_sdg_16_50_en.csv"
VDEM_PATH = "/content/drive/My Drive/CorruptionStudy/V-Dem-CY-Core-v15.csv"
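The remaining paths follow the same pattern (assuming the other files sit in the same CorruptionStudy folder):
CONTROLS_PATH = "/content/drive/My Drive/CorruptionStudy/Controls.xlsx"
INSTITUTIONAL_PATH = "/content/drive/My Drive/CorruptionStudy/Institutional.xlsx"
CORE_PATH = "/content/drive/My Drive/CorruptionStudy/Core.xlsx"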
Execute all cells in order (Runtime → Run all).
The notebook will:
Load CPI and V-Dem data
Merge with control variables
Standardize variables
Estimate two-way fixed effects (Driscoll–Kraay standard errors)
Output model summaries
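For orientation, here is a minimal sketch of what the fixed-effects estimation step can look like with linearmodels; the DataFrame name, index columns, and regressor names below are hypothetical placeholders, not the notebook's actual variables:
import pandas as pd
from linearmodels.panel import PanelOLS

# df: merged country-year DataFrame produced by the notebook's earlier steps (placeholder)
panel = df.set_index(['country', 'year'])
model = PanelOLS(panel['cpi'],
                 panel[['v2x_polyarchy', 'gdp_per_capita']],   # placeholder regressors
                 entity_effects=True, time_effects=True)       # two-way fixed effects
results = model.fit(cov_type='kernel')                         # 'kernel' gives Driscoll-Kraay SEs
print(results.summary)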
To save results to Google Drive:
df.to_excel("/content/drive/My Drive/CorruptionStudy/results.xlsx")
To download directly:
from google.colab import files
files.download("results.xlsx")
If using this dataset or code, please cite the Zenodo record as indicated in the Cite As section.
Zenodo Dataset Description:
Title: Epistemic Legitimacy Traps in High-Trust Democracies: Replication Data and Code
Description:
This dataset contains replication materials for "Epistemic Legitimacy Traps: How High-Trust Institutions Silence Inconvenient Truths" - a study examining how friendship-based corruption persists in democratic institutions through systematic exclusion of internal critics.
Contents:
Key Variables:
Methodology: Two-way fixed effects panel regression (institutional analysis) and OLS with robust standard errors (individual analysis) testing the relationship between corruption measures, institutional quality, and public perceptions in high-trust democratic contexts.
Research Questions: How do high-trust institutions maintain legitimacy while systematically excluding internal criticism? What role do friendship networks play in enabling "clean corruption" that operates through relationships rather than material exchanges?
Keywords: corruption, epistemic injustice, institutional legitimacy, democracy, trust, whistleblowing, friendship networks, panel data
Citation: [Author], [Year]. "Epistemic Legitimacy Traps: How High-Trust Institutions Silence Inconvenient Truths." Business Ethics Quarterly [forthcoming].
Data Sources: V-Dem Institute, Eurostat, [Original Survey Data Source]
License: Creative Commons Attribution 4.0 International
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Material published at "https://opencodecom.net/post/2021-07-22-como-baixar-e-zipar-csv-utilizando-python/"
The purpose of this code is to produce a line graph visualization of COVID-19 data. This Jupyter notebook was built and run on Google Colab. This code will serve mostly as a guide and will need to be adapted where necessary to be run locally. The separate COVID-19 datasets uploaded to this Dataverse can be used with this code. This upload is made up of the IPYNB and PDF files of the code.
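As a rough sketch of that workflow (the file and column names here are hypothetical placeholders, not the actual Dataverse files):
import pandas as pd
import matplotlib.pyplot as plt

# Load one of the COVID-19 CSVs and plot cases over time
df = pd.read_csv('covid_data.csv', parse_dates=['date'])   # placeholder file/column names
df = df.sort_values('date')
plt.plot(df['date'], df['cases'])
plt.xlabel('Date')
plt.ylabel('Reported cases')
plt.title('COVID-19 cases over time')
plt.show()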
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We developed a pix2pix deep learning model for segmentation of subretinal fluid area in fundus photographs to detect central serous chorioretinopathy (CSC).
The dataset includes fundus photographs and segmentation images from 105 eyes with CSC and 40 healthy eyes. We retrospectively reviewed the medical records and multimodal images of a total of 115 images of patients who had CSC at Aerospace Medical Center and from publicly accessible databases. Finally, the total dataset includes fundus photographs and segmentation images from 115 eyes with CSC and 40 healthy eyes from the medical center and publicly accessible datasets. The reference segmentation for the subretinal fluid area was performed manually by an expert ophthalmologist.
First, upload the "pix2pix_csc_segmentation.ipynb" file to Google Drive and open it from the Google Drive page. Second, link the datasets to this Colab notebook using Google Drive. For example, we save the training dataset at "csc/segmentation/seg_pix/" (in the example.zip file) and the test dataset at "csc/segmentation/seg_pix_test/" (also in the example.zip file). Third, run the code cells in Google Colab by clicking the run buttons.
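For example, a minimal sketch of linking those folders after mounting Google Drive (this assumes example.zip was extracted to the top level of My Drive; adjust the paths to your own layout):
from google.colab import drive
drive.mount('/content/drive')

# Paths follow the folder layout described above (assumed to sit under My Drive)
train_dir = '/content/drive/My Drive/csc/segmentation/seg_pix/'
test_dir = '/content/drive/My Drive/csc/segmentation/seg_pix_test/'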
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This upload contains data and documentation for the Python analysis undertaken in Google Colab as part of Episode 1 of the webinar series, conducted by Sambodhi's Center for Health Systems Research and Implementation (CHSRI). You can find the link to the Google Colab notebook here.
All the data uploaded here is open data published by the Toronto Police Public Safety Data Portal and the Ontario Ministry of Health.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Instructions (with screenshots) to replicate results from Section 3 of the manuscript are available in "Step-by-step Instructions to Replicate Results.pdf".

Step 1: Download the replication materials
Download the whole replication folder on figshare containing the code, data and replication files.

Step 2: Replicate Tables in Section 3
All of the data is available inside the sub-folder replication/Data. To replicate Tables 1 and 2 from Section 3 of the manuscript, run the Python file replicate_section3_tables.py locally on your computer. This will produce two .csv files containing Tables 1 and 2 (already provided). Note that it is not necessary to run the code in order to replicate the tables; the output data needed for replication is provided.

Step 3: Replicate Figures in QGIS
The Figures must be replicated using QGIS, freely available at https://www.qgis.org/. Open the QGIS project replicate_figures.qgz inside the replication/Replicate Figures sub-folder. It should auto-find the layer data. The Figures are replicated as layers in the project.

Step 4: Running the code from scratch
The accompanying code for the manuscript IJGIS-2024-1305, entitled "Route-based Geocoding of Traffic Congestion-Related Social Media Texts on a Complex Network", runs on Google Colab as Python notebooks. Please follow the instructions below to run the entire geocoder and network mapper from scratch. The expected running time is of the order of 10 hours on free-tier Google Colab.

4a) Upload to Google Drive
Upload the entire replication folder to your Google Drive and note the path (location) to which you have uploaded it. There are two Google Colab notebooks that need to be executed in their entirety: Code/Geocoder/The_Geocoder.ipynb and Code/Complex_Network/Complex_network_code.ipynb. They need to be run in order (Geocoder first and Complex Network second).

4b) Set the path
In each Google Colab notebook, set the variable called "REPL_PATH" to the location on your Google Drive where you uploaded the replication folder, including the replication folder itself in the path, for example "/content/drive/MyDrive/replication" (see the snippet below).

4c) Run the code
The code is available in two sub-folders, replication/Code/Geocoder and replication/Code/Complex_Network. You may simply open the Google Colab notebooks inside each folder, mount your Google Drive, set the path and run all cells.
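A minimal sketch of the Drive mount and path setup described in step 4b (the example path is the one given above; adjust it to your own upload location):
from google.colab import drive
drive.mount('/content/drive')

# Point REPL_PATH at the uploaded replication folder, including the folder itself
REPL_PATH = "/content/drive/MyDrive/replication"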
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon nor mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analyses will be an enabler for transforming scientific inquiries. Building on the HydroShare repository’s established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS), loading of data into performant structures keyed to specific scientific data types that integrate with existing visualization, analysis, and data science capabilities available in Python, and then writing analysis results back to HydroShare for sharing and eventual publication. These capabilities reduce the technical burden for scientists associated with creating a computational environment for executing analyses by installing and maintaining the packages within CUAHSI’s HydroShare-linked JupyterHub server. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows.

The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed within a Python environment on any computer running Microsoft Windows, Apple macOS, or Linux from the Python Package Index using the pip utility. They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments like Google Colaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available on GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.
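For example, a minimal sketch of installing the packages and pulling NWIS daily values with the dataretrieval package (the site number and date range are placeholders):
!pip install hsclient dataretrieval

# Retrieve daily values for a placeholder USGS gage and period
from dataretrieval import nwis
df, metadata = nwis.get_dv(sites="03339000", start="2020-01-01", end="2020-12-31")
print(df.head())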
This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This archive reproduces a figure titled "Figure 3.2 Boone County population distribution" from Wang and vom Hofe (2007, p. 60). The archive provides a Jupyter Notebook that uses Python and can be run in Google Colaboratory. The workflow uses the Census API to retrieve data, reproduce the figure, and ensure reproducibility for anyone accessing this archive.

The Python code was developed in Google Colaboratory, or Google Colab for short, which is an Integrated Development Environment (IDE) of JupyterLab and streamlines package installation, code collaboration, and management. The Census API is used to obtain population counts from the 2000 Decennial Census (Summary File 1, 100% data). Shapefiles are downloaded from the TIGER/Line FTP Server. All downloaded data are maintained in the notebook's temporary working directory while in use. The data and shapefiles are stored separately with this archive. The final map is also stored as an HTML file.

The notebook features extensive explanations, comments, code snippets, and code output. The notebook can be viewed in PDF format or downloaded and opened in Google Colab. References to external resources are also provided for the various functional components. The notebook features code that performs the following functions:
- install/import necessary Python packages
- download the Census Tract shapefile from the TIGER/Line FTP Server
- download Census data via the Census API
- manipulate Census tabular data
- merge Census data with the TIGER/Line shapefile
- apply a coordinate reference system
- calculate land area and population density
- map and export the map to HTML
- export the map to ESRI shapefile
- export the table to CSV

The notebook can be modified to perform the same operations for any county in the United States by changing the State and County FIPS code parameters for the TIGER/Line shapefile and Census API downloads. The notebook can be adapted for use in other environments (i.e., Jupyter Notebook) as well as reading and writing files to a local or shared drive, or cloud drive (i.e., Google Drive).
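As an illustration of the Census API step (a sketch only; the FIPS codes 21/015 assume Boone County, Kentucky, and can be swapped for any other state/county):
import requests

# 2000 Decennial Census, Summary File 1: total population (P001001) by census tract
url = "https://api.census.gov/data/2000/dec/sf1"
params = {"get": "P001001", "for": "tract:*", "in": "state:21 county:015"}
rows = requests.get(url, params=params).json()
print(rows[:3])   # first row is the header, followed by data rows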
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
2,121,458 records
I used Google Colab to check out this dataset and pull the column names using Pandas.
Sample code example: Python Pandas read csv file compressed with gzip and load into Pandas dataframe https://pastexy.com/106/python-pandas-read-csv-file-compressed-with-gzip-and-load-into-pandas-dataframe
Columns: ['Date received', 'Product', 'Sub-product', 'Issue', 'Sub-issue', 'Consumer complaint narrative', 'Company public response', 'Company', 'State', 'ZIP code', 'Tags', 'Consumer consent provided?', 'Submitted via', 'Date sent to company', 'Company response to consumer', 'Timely response?', 'Consumer disputed?', 'Complaint ID']
I did not modify the dataset.
Use it to practice with dataframes (Pandas or PySpark) on Google Colab:
!unzip complaints.csv.zip
import pandas as pd
df = pd.read_csv('complaints.csv')
df.columns
df.head()  # etc.
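If you work with a gzip-compressed copy instead (the filename below is hypothetical), pandas can read it without unzipping first:
import pandas as pd
df = pd.read_csv('complaints.csv.gz', compression='gzip')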
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Google Colab is a free product from Google Research that allows programming in Python from a browser and is primarily suitable for (1) machine learning, (2) data analysis, and (3) education. Google Colab is an online programming environment that requires no installation. It provides tools for basic mathematics, deep learning (TensorFlow), machine learning (Scikit-learn), and graphing (Matplotlib). Like other Google online tools, it has collaborative properties that allow parallel programming by different users, whose work environments are stored in Google Drive. In addition, it is possible to document the code with multimedia material, and to publish to and import from GitHub.
Therefore, this project aims to use Google Colab as an assistance teaching tool that takes into account the interests and competencies of male and female biomedical engineering students to improve their experience and academic performance. The project proposes to use Google Colab in three ways: (1) implementing study cases in the health area with illustrative materials (e.g., images, sounds, web pages), (2) continuous monitoring by the teacher, and (3) asynchronous collaborative programming. For this purpose, a teacher's guide and a repository of example activities will be implemented in accordance with student feedback.
The project seeks to develop mathematical analytical thinking to quantitatively interpret measurements of biological systems. On the one hand, male students are expected to increase their interest in mathematical analysis through computational development (characteristic more preferred by men). On the other hand, female students are expected to have online mathematical counseling and study cases in the health area (characteristics more preferred by women).
The overall goal is to change the dynamics of teaching applied mathematics, which is an important factor in withdrawal, mainly of women, from engineering. Men and women have different interests and competencies in engineering, and frequently ignoring this fact in the teaching process could be a factor in the current gender gap in STEM (Science, Technology, Engineering, and Math).
This proposal is scalable because Google Colab is a free, user-friendly programming environment that runs in the cloud. It does not involve economic, administrative, or infrastructure costs. It is also transferable not only to other modules and subjects of biomedical engineering, but also to any other engineering discipline where programming tools are indispensable. Google Colab is a simple and easy-to-learn environment.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Description
This dataset supports the research paper "Nou Pa Bèt: Civic Substitution and Expressive Freedoms in Post-State Governance" which examines how civic participation functions as institutional substitution in fragile states, with Haiti as the primary case study. The dataset combines governance indicators from the World Bank's Worldwide Governance Indicators (WGI) with civic engagement measures from the Varieties of Democracy (V-Dem) project.
Files Included:
How to Use in Google Colab:
Step 1: Upload Files
from google.colab import files
import pandas as pd
import numpy as np
# Upload the files to your Colab environment
uploaded = files.upload()
# Select and upload: CivicEngagement_SelectedCountries_Last10Years.xlsx and wgidataset.xlsx
Step 2: Load the Datasets
# Load the civic engagement data (main analysis dataset)
civic_data = pd.read_excel('CivicEngagement_SelectedCountries_Last10Years.xlsx')
# Load the WGI data (if needed for extended analysis)
wgi_data = pd.read_excel('wgidataset.xlsx')
# Display basic information
print("Civic Engagement Dataset Shape:", civic_data.shape)
print("
Columns:", civic_data.columns.tolist())
print("
First few rows:")
civic_data.head()
Step 3: Run the Analysis Notebook
# Download and run the complete analysis notebook
!wget https://zenodo.org/record/[RECORD_ID]/files/civic.ipynb
# Then open civic.ipynb in Colab or copy/paste the code cells
Key Variables:
Dependent Variables (WGI):
- Control_of_Corruption - Extent to which public power is exercised for private gain
- Government_Effectiveness - Quality of public services and policy implementation

Independent Variables (V-Dem):
- v2x_partip - Participatory Component Index
- v2x_cspart - Civil Society Participation Index
- v2cademmob - Freedom of Peaceful Assembly
- v2cafres - Freedom of Expression
- v2csantimv - Anti-System Movements
- v2xdd_dd - Direct Popular Vote Index

Sample Countries: 21 fragile states including Haiti, Sierra Leone, Liberia, DRC, CAR, Guinea-Bissau, Chad, Niger, Burundi, Yemen, South Sudan, Mozambique, Sudan, Eritrea, Somalia, Mali, Afghanistan, Papua New Guinea, Togo, Cambodia, and Timor-Leste.
Quick Start Analysis:
# Install required packages
!pip install statsmodels scipy
# Basic regression replication
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
# Prepare variables for regression (drop missing values jointly so X and y stay aligned)
model_cols = ['v2x_partip', 'v2x_cspart', 'v2cademmob', 'v2cafres', 'v2csantimv', 'v2xdd_dd',
              'Control_of_Corruption', 'Government_Effectiveness']
model_data = civic_data[model_cols].dropna()
X = model_data[['v2x_partip', 'v2x_cspart', 'v2cademmob', 'v2cafres', 'v2csantimv', 'v2xdd_dd']]
y_corruption = model_data['Control_of_Corruption']
y_effectiveness = model_data['Government_Effectiveness']
# Run regression (example for Control of Corruption)
X_const = sm.add_constant(X)
model = sm.OLS(y_corruption, X_const).fit(cov_type='HC3')
print(model.summary())
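Since variance_inflation_factor is imported above, a quick multicollinearity check can follow the same sketch:
# VIF for each regressor (including the constant) in the design matrix above
vif = pd.DataFrame({
    "variable": X_const.columns,
    "VIF": [variance_inflation_factor(X_const.values, i) for i in range(X_const.shape[1])],
})
print(vif)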
Citation: Brown, Scott M., Fils-Aime, Jempsy, & LaTortue, Paul. (2025). Nou Pa Bèt: Civic Substitution and Expressive Freedoms in Post-State Governance [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.15058161
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Contact: For questions about data usage or methodology, please contact the corresponding author through the institutional affiliations provided in the paper.
Prediction of Phakic Intraocular Lens Vault Using Machine Learning of Anterior Segment Optical Coherence Tomography Metrics. Authors: Kazutaka Kamiya, MD, PhD, Ik Hee Ryu, MD, MS, Tae Keun Yoo, MD, Jung Sub Kim, MD, In Sik Lee, MD, PhD, Jin Kook Kim, MD, Wakako Ando, CO, Nobuyuki Shoji, MD, PhD, Tomofusa Yamauchi, MD, PhD, Hitoshi Tabuchi, MD, PhD.
We hypothesize that machine learning of preoperative biometric data obtained by AS-OCT may be clinically beneficial for predicting the actual ICL vault. Therefore, we built a machine learning model using Random Forest to predict the ICL vault after surgery.
This multicenter study comprised one thousand seven hundred forty-five eyes of 1745 consecutive patients (656 men and 1089 women), who underwent EVO ICL implantation (V4c and V5 Visian ICL with KS-AquaPORT) for the correction of moderate to high myopia and myopic astigmatism, and who completed at least a 1-month follow-up, at Kitasato University Hospital (Kanagawa, Japan), or at B&VIIT Eye Center (Seoul, Korea).
This data file (RFR_model(feature=12).mat) is the final trained random forest model for MATLAB 2020a.
Python version:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
import pandas as pd
import numpy as np

# Authenticate and mount Google Drive to access the dataset
from google.colab import auth
auth.authenticate_user()
from google.colab import drive
drive.mount('/content/gdrive')

dataset = pd.read_csv('gdrive/My Drive/ICL/data_icl.csv')
dataset.head()

# Target: 1-month postoperative vault; features: the remaining preoperative metrics
y = dataset['Vault_1M']
X = dataset.drop(['Vault_1M'], axis=1)

train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=0)

# 'mae' is the criterion name used in older scikit-learn releases ('absolute_error' in newer ones)
parameters = {'bootstrap': True, 'min_samples_leaf': 3, 'n_estimators': 500,
              'criterion': 'mae', 'min_samples_split': 10, 'max_features': 'sqrt',
              'max_depth': 6, 'max_leaf_nodes': None}

RF_model = RandomForestRegressor(**parameters)
RF_model.fit(train_X, train_y)
RF_predictions = RF_model.predict(test_X)
importance = RF_model.feature_importances_
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Super-resolution image data and bicubic image data generated with Teco-GAN using Google Colaboratory Pro. Teco-GAN was taken from GitHub. How to run it on Google Colab is described in the memo. The original images and the evaluation Excel data are also saved.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data files and code (python notebooks, originally executed in the Google Colab environment) to reproduce the analyses and figures for the original submission of the manuscript "Geologic controls on apparent root-zone storage capacity". Note that for the initial data processing a Google Earth Engine account is required; however, you can skip this step and start by ingesting the resulting geotiff directly from a mounted google drive folder (just change the folder paths in the appropriate locations).
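For instance, a minimal sketch of that shortcut, assuming the geotiff sits in a mounted Drive folder and reading it with rasterio (the folder and file names are placeholders, and rasterio is just one common way to read a geotiff; the notebooks may use a different reader):
from google.colab import drive
drive.mount('/content/drive')

import rasterio

# Placeholder path: point this at the geotiff produced by the Earth Engine step
tif_path = '/content/drive/MyDrive/rootzone_storage/storage_capacity.tif'
with rasterio.open(tif_path) as src:
    arr = src.read(1)   # first band as a NumPy array
print(arr.shape)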
Contains replication material for the paper "Psychological Characteristics of Leaders (PsyCL): A New Data Set" by Schafer and Lambert (forthcoming). Files include the necessary CSVs and two Jupyter notebooks with Python code. Download all files into a folder, then run the notebooks with Google Colab or a personal distribution of Anaconda. Detailed code is contained within.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Overview
The Jupyter notebook (Bears_Ears_Economic_Impact.pynb) includes all Python code to process data and create all figures reported in the manuscript. The code can also be accessed via Google Colab here (https://colab.research.google.com/drive/19QptKut-FHMs0OIG6N9O7_C9qr_pSZC8?usp=sharing). All code is heavily commented and should be interpretable.

Bureau of Labor Statistics Data and Analyses
All Bureau of Labor Statistics data were acquired from the agency’s Quarterly Census of Employment and Wages online data portal (https://www.bls.gov/cew/downloadable-data-files.htm). These data are provided in the ‘BLS.zip’ file. You can extract these data and place them in a local drive, access the files via the Python code provided, and proceed through the creation of the figures.

Economic Impact Data
Economic impact data, provided after the analyses for each scenario were run, are provided in both the ‘Economic_Impact_and_Tax_Revenues_Results.xlsx’ and ‘economic_indicators_data.csv’ files. The former is more interpretable for humans; the latter is called by the Python code provided to create the figures shown in the paper. The latter file will need to be placed in a local drive before executing the Python code which calls it.

Comments or Questions
Please direct any questions to Dr. Jordan W. Smith (jordan.smith@usu.edu).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
2025-07-20
This dataset and accompanying Python notebook support the empirical analysis for the study "The Denial of Governance Failure in High-Trust Democracies". The project investigates how interpersonal trust and informal social norms interact with perceptions of institutional legitimacy and income distribution, using large-scale survey data.
The core research question: Do trust-based informal norms mask or reinforce corruption-like mechanisms in high-trust democracies through preferential behavior?
Included Files:
Controls.xlsx – Demographic and background controls
Core.xlsx – Variables on trust, fairness, helpfulness, and informal norms
Institutional.xlsx – Institutional trust and confidence indicators
Polis.ipynb – Google Colab–ready Python notebook containing full data processing and regression analysis pipeline (OLS with robust and clustered SEs, VIF checks, and visualization).
Open Google Colab: https://colab.research.google.com
Upload Files:
Click the folder icon (📁) in the left sidebar
Click the upload icon and add the 4 files: Controls.xlsx, Core.xlsx, Institutional.xlsx, Polis.ipynb
Open the Notebook:
Double-click Polis.ipynb in the file browser to open it
Run the Notebook:
Follow the step-by-step cells, which load data, clean it, run regressions, compute VIFs, and produce tables and plots
You may modify file paths if needed:
Replace '/content/Controls.xlsx' with the corresponding uploaded path if you mount Google Drive instead
Requirements: The notebook auto-installs required packages:
!pip install pandas statsmodels openpyxl
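As a rough illustration of the kind of OLS-with-clustered-standard-errors step the notebook runs (a sketch only; the column names 'trust', 'fairness', and 'country' are hypothetical placeholders):
import pandas as pd
import statsmodels.api as sm

core = pd.read_excel('/content/Core.xlsx')
d = core[['trust', 'fairness', 'country']].dropna()        # placeholder columns
X = sm.add_constant(d[['fairness']])
ols = sm.OLS(d['trust'], X).fit(cov_type='cluster',
                                cov_kwds={'groups': d['country']})
print(ols.summary())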
Creative Commons Attribution 4.0 International (CC BY 4.0)
Copyright (C) 2025 The Authors.
Keywords: Social capital, corruption, trust, governance, informal norms, OLS regression, high-trust democracies, GSS, inequality, institutional confidence
Language: English
Programming language: Python 3.11
Notebook-compatible with Google Colab and Jupyter
Version: v1.0.0
Repository: Zenodo
Funding: This research received no specific grant but draws on publicly available survey data.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The accompanying files provide the data and processing code for the analyses and figure/table generation for the manuscript "Emergence from drought: Direct and remote observations of moisture storage deficits, plant water stress, and groundwater reveal ecohydrologic recovery" by Whiting et al.
The processing code is in the form of Python notebooks, which were originally executed via Google's Colab environment.
To run the 'primary_script' as is, simply upload it to Google Drive, run it, and follow the instructions to permit the code to connect with your Google Drive. If there are issues loading the data, the code can be executed by adjusting the file paths so that the static .CSV data files saved in the CSVs folder are loaded as pandas data frames at the appropriate locations.
The 'PML_LSTM_RANCHO' script is included, but please contact David Dralle (david.dralle@usda.gov) if you are interested in running the code as is.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
… ancestry-anonymized.txt file.

#### File: collator-ipynb.txt
**Description:** The Jupyter Notebook containing the Python script used to process raw conversation data into the collated-anonymized.txt transcript.

#### File: ancestry-anonymized.txt
**Description:** A "genealogy" for every sentence in the final manuscript. This file, generated by the sentence-ancestry-ipynb.txt script, traces the origin of each sentence backward through the conversational record. It uses semantic similarity (cosine similarity) to identify the most likely predecessor for each sentence at every stage of the dialogue, providing a granular view of how ideas and phrasings evolved over time.

#### File: collated-anonymized.txt
**Description:** A chronologically-sorted transcript of the human-AI conversations. This file was generated by the collator-ipynb.txt script, which parses raw JSON exports from ChatGPT, preserves the non-linear, branching structure of the dialogues using indentation, and assigns a unique line number and type-code ([P] for Prompt, [C] for Completion, [N] for Neither/metadata) to every line.

## Code/software

### Viewing Data Files
The data files (collated-anonymized.txt and ancestry-anonymized.txt) are plain text (.txt) files and can be viewed using any standard text editor available on all major operating systems (e.g., Notepad on Windows, TextEdit on macOS, or cross-platform editors like VS Code, Sublime Text, or Notepad++). For optimal viewing of collated-anonymized.txt, a text editor that can handle long lines without word wrapping is recommended to preserve the indentation that represents the conversational branching structure.

### Running Code/Software
The provided scripts (collator-ipynb.txt and sentence-ancestry-ipynb.txt) are Jupyter Notebooks and require a Python 3 environment to run. The code can be executed using free and open-source software such as Jupyter Notebook, JupyterLab, or a cloud-based service like Google Colab. Note that the ipynb files (if extracted from the PDF rather than downloaded from the figshare item) need to be renamed to use ipynb as the extension rather than txt. I used txt because Adobe Acrobat would not allow me to use ipynb.

The following Python packages must be installed:
* torch
* sentence-transformers
* odfpy

These can be installed via pip:

pip install torch sentence-transformers odfpy
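For readers curious how the cosine-similarity matching works in practice, here is a minimal sketch with sentence-transformers; the model name is an assumption, not necessarily the one used in sentence-ancestry-ipynb.txt:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed model; any sentence encoder works
final_sentence = ["The final sentence as it appears in the manuscript."]
candidates = ["An earlier draft of the same sentence.", "An unrelated remark from the dialogue."]

emb_final = model.encode(final_sentence, convert_to_tensor=True)
emb_cand = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(emb_final, emb_cand)        # 1 x len(candidates) similarity matrix
best = int(scores.argmax())                       # index of the most likely predecessor
print(candidates[best], float(scores[0, best]))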
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The accompanying files provide the data and processing code for the analyses and figure/table generation for the manuscript "Bedrock vadose zone storage dynamics under extreme drought: consequences for plant water availability, recharge, and runoff" by Hahm et al.
The processing code is in the form of Python notebooks, which were originally executed via Google's Colab environment.
To run the code as-is, place the entire folder into the appropriate path structure on your Google Drive, namely 'My Drive/Colab Notebooks/Rancho - Rock Moisture/', make sure a Google Earth Engine account exists, and run the code from Colab.
If this is not possible, the code can be executed by adjusting the file paths so that the static .CSV data files saved in the CSVs folder are loaded as pandas data frames at the appropriate locations.
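For example, a minimal sketch of that fallback route, loading one of the static CSVs from the folder path above (the CSV filename is a placeholder):
from google.colab import drive
drive.mount('/content/drive')

import pandas as pd

base = '/content/drive/My Drive/Colab Notebooks/Rancho - Rock Moisture/CSVs/'
df = pd.read_csv(base + 'example_timeseries.csv')   # placeholder filename
df.head()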