Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset and code package is designed for execution in Google Colab, which provides a free cloud-based Python environment.
Follow these steps to reproduce the results.
Sign in with your Google account.
This repository contains two analysis notebooks:
Polis.ipynb
cpi.ipynb
Download them from Zenodo, or open them directly in Colab using File → Upload notebook.
Mounting Google Drive allows you to store the data permanently instead of uploading it each time.
from google.colab import drive
drive.mount('/content/drive')
After mounting, place all dataset files in a folder inside your Drive (e.g., My Drive/CorruptionStudy/).
Ensure the following files are available in your Colab session (either uploaded directly or stored in Drive):
| File | Description |
|---|---|
| estat_sdg_16_50_en.csv | Eurostat CPI dataset |
| V-Dem-CY-Core-v15.csv | V-Dem Core dataset |
| Controls.xlsx | Control variables |
| Institutional.xlsx | Institutional variables |
| Core.xlsx | Additional core variables |
If you are not using Google Drive, upload all files at the start of your session:
from google.colab import files
uploaded = files.upload()
Select all required .csv and .xlsx files when prompted.
Run the following command in a Colab cell:
!pip install pandas numpy statsmodels linearmodels openpyxl
If files are uploaded directly in Colab:
EUROSTAT_CPI_PATH = "/content/estat_sdg_16_50_en.csv"
VDEM_PATH = "/content/V-Dem-CY-Core-v15.csv"
CONTROLS_PATH = "/content/Controls.xlsx"
INSTITUTIONAL_PATH = "/content/Institutional.xlsx"
CORE_PATH = "/content/Core.xlsx"
If files are stored in Google Drive:
EUROSTAT_CPI_PATH = "/content/drive/My Drive/CorruptionStudy/estat_sdg_16_50_en.csv"
VDEM_PATH = "/content/drive/My Drive/CorruptionStudy/V-Dem-CY-Core-v15.csv"
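The remaining Drive paths follow the same pattern; assuming the same CorruptionStudy folder name used above, they would be:
CONTROLS_PATH = "/content/drive/My Drive/CorruptionStudy/Controls.xlsx"
INSTITUTIONAL_PATH = "/content/drive/My Drive/CorruptionStudy/Institutional.xlsx"
CORE_PATH = "/content/drive/My Drive/CorruptionStudy/Core.xlsx"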
Execute all cells in order (Runtime → Run all).
The notebook will:
Load CPI and V-Dem data
Merge with control variables
Standardize variables
Estimate two-way fixed effects (Driscoll–Kraay standard errors; see the sketch after this list)
Output model summaries
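As a rough illustration of the fixed-effects step (not the notebook's actual code), the sketch below shows how a two-way fixed effects model with Driscoll–Kraay standard errors can be estimated with linearmodels; the frame and column names (country, year, cpi, v2x_corr) are illustrative placeholders, not the notebook's variables.

import pandas as pd
from linearmodels.panel import PanelOLS

# Toy panel with placeholder columns; in the notebook the frame comes from
# merging the CPI, V-Dem, and control files loaded above.
data = pd.DataFrame({
    "country":  ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "year":     [2017, 2018, 2019] * 3,
    "cpi":      [50, 52, 51, 70, 71, 73, 60, 59, 61],
    "v2x_corr": [0.40, 0.38, 0.39, 0.10, 0.11, 0.10, 0.25, 0.26, 0.24],
}).set_index(["country", "year"])

# Two-way (entity + time) fixed effects; cov_type="kernel" requests
# Driscoll-Kraay (kernel-based) standard errors in linearmodels.
mod = PanelOLS.from_formula("cpi ~ v2x_corr + EntityEffects + TimeEffects", data=data)
res = mod.fit(cov_type="kernel")
print(res.summary)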
To save results to Google Drive:
df.to_excel("/content/drive/My Drive/CorruptionStudy/results.xlsx")
To download directly:
from google.colab import files
files.download("results.xlsx")
If using this dataset or code, please cite the Zenodo record as indicated in the Cite As section.
Zenodo Dataset Description:
Title: Epistemic Legitimacy Traps in High-Trust Democracies: Replication Data and Code
Description:
This dataset contains replication materials for "Epistemic Legitimacy Traps: How High-Trust Institutions Silence Inconvenient Truths" - a study examining how friendship-based corruption persists in democratic institutions through systematic exclusion of internal critics.
Contents:
Key Variables:
Methodology: Two-way fixed effects panel regression (institutional analysis) and OLS with robust standard errors (individual analysis) testing the relationship between corruption measures, institutional quality, and public perceptions in high-trust democratic contexts.
Research Questions: How do high-trust institutions maintain legitimacy while systematically excluding internal criticism? What role do friendship networks play in enabling "clean corruption" that operates through relationships rather than material exchanges?
Keywords: corruption, epistemic injustice, institutional legitimacy, democracy, trust, whistleblowing, friendship networks, panel data
Citation: [Author], [Year]. "Epistemic Legitimacy Traps: How High-Trust Institutions Silence Inconvenient Truths." Business Ethics Quarterly [forthcoming].
Data Sources: V-Dem Institute, Eurostat, [Original Survey Data Source]
License: Creative Commons Attribution 4.0 International
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classify video clips with natural scenes of actions performed by people visible in the videos.
See the UCF101 Dataset web page: https://www.crcv.ucf.edu/data/UCF101.php#Results_on_UCF101
This example dataset consists of the 5 classes with the most videos in the UCF101 dataset. For the top-10 version see: https://doi.org/10.5281/zenodo.7882861 .
Based on this code: https://keras.io/examples/vision/video_classification/ (which needs to be updated, if it has not been already; see the issue: https://github.com/keras-team/keras-io/issues/1342).
Testing if data can be downloaded from figshare with `wget`, see: https://github.com/mojaveazure/angsd-wrapper/issues/10
For generating the subset, see this notebook: https://colab.research.google.com/github/sayakpaul/Action-Recognition-in-TensorFlow/blob/main/Data_Preparation_UCF101.ipynb -- however, it also needs to be adjusted (if it has not been already; in that case I will post a link to the updated notebook here or elsewhere, e.g., in the corrected notebook with the Keras example).
I would like to thank Sayak Paul for contacting me about his example at Keras documentation being out of date.
Cite this dataset as:
Soomro, K., Zamir, A. R., & Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402. https://doi.org/10.48550/arXiv.1212.0402
To download the dataset via the command line, please use:
wget -q https://zenodo.org/record/7924745/files/ucf101_top5.tar.gz -O ucf101_top5.tar.gz
tar xf ucf101_top5.tar.gz
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Ro90 is a dataset for object detection tasks - it contains Accepts Or Rejects annotations for 3,900 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
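As one example, a download via the Roboflow Python package might look like the sketch below; the API key, workspace slug, project slug, version number, and export format are placeholders to be replaced with the values shown on the dataset's Roboflow page.

```python
# pip install roboflow
from roboflow import Roboflow

# Placeholder credentials and identifiers (replace with the values from the dataset page)
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("ro90")
dataset = project.version(1).download("coco")

print(dataset.location)  # local folder containing the images and annotations
```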
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The DeepNet-1M dataset was published by Facebook researchers with the paper Parameter Prediction for Unseen Deep Architectures (NeurIPS 2021). The relevant code is maintained in this repository.
Source: ppuda repository
This dataset was downloaded using scripts shared at this repository by the researchers.
The data is licensed according to the License in this GitHub repository. At the time of creating this dataset, the repository was licensed under MIT license.
We developed the first publicly available spatially explicit estimates of the human alterations along the global floodplains during the recent 27 years (1992-2019) at 250-m resolution. To maximize the reuse of our datasets and advance the open science of human floodplain alteration, we developed three web-based programming tools: (1) Floodplain Mapping Tool, (2) Land Use Change Tool, and (3) Human Alteration Tool, supported with tutorials and step-by-step audiovisual instructions. Our data reveal a significant loss of natural floodplains worldwide, with 460,000 km2 of new agricultural and 140,000 km2 of new developed areas between 1992 and 2019. This dataset offers critical new insights into how floodplains are being destroyed, which will help decision-makers to reinforce strategies to conserve and restore floodplain functions and habitat.

This dataset is not publicly accessible because: EPA scientists provided context and commentary but did not do any of the analyses or handle any of the data. It can be accessed through the following means: the entire data record can be downloaded as a single zip file from this web link: http://www.hydroshare.org/resource/cdb5fd97e0644a14b22e58d05299f69b.

The global floodplain alteration dataset is derived entirely through ArcGIS 10.5 and ENVI 5.1 geospatial analysis platforms. To assist in reuse and application of the dataset, we developed additional Python codes aggregated as three web-based tools:
- Floodplain Mapping Tool: https://colab.research.google.com/drive/1xQlARZXKPexmDInYV-EMoJ-HZxmFL-eW?usp=sharing
- Land Use Change Tool: https://colab.research.google.com/drive/1vmIaUCkL66CoTv4rNRIWpJXYXp4TlAKd?usp=sharing
- Human Alteration Tool: https://colab.research.google.com/drive/1r2zNJNpd3aWSuDV2Kc792qSEjvDbFtBy?usp=share_link

See the Usage Notes section in the journal article for details. Format: The global floodplain alteration dataset is available through the HydroShare open geospatial data platform. Our data record also includes all corresponding input data, intermediate calculations, and supporting information.

This dataset is associated with the following publication: Rajib, A., Q. Zheng, C. Lane, H. Golden, J. Christensen, I. Isibor, and K. Johnson. Human alterations of the global floodplains 1992–2019. Scientific Data. Springer Nature, New York, NY, USA, 10: 499, (2023).
https://creativecommons.org/publicdomain/zero/1.0/
Common Voice is a corpus of speech data read by users on the Common Voice website, and based upon text from a number of public domain sources like user submitted blog posts, old books, movies, and other public speech corpora. Its primary purpose is to enable the training and testing of automatic speech recognition (ASR) systems.
In Google Colab, I downloaded the .tar.gz archive from Common Voice (Mozilla), placed the compressed file in a folder, marked the folder as a dataset, and uploaded it directly.
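A minimal Colab sketch of that workflow, assuming a placeholder download URL (the real link must be obtained from the Common Voice site), could look like:

# In a Colab cell: download and extract the Common Voice archive.
# The URL below is a placeholder, not the actual Common Voice download link.
!wget -O cv-corpus.tar.gz "https://example.org/cv-corpus-xx.tar.gz"
!mkdir -p dataset
!tar -xzf cv-corpus.tar.gz -C dataset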
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
🚨 New Dataset Version Released!
We are excited to announce the release of Version 1.1 of our dataset!
This update includes:
- L2A & L1C support
- Temporal support
- Check the data without downloading (cloud-optimized properties)
📥 Go to: https://huggingface.co/datasets/tacofoundation/cloudsen12 and follow the instructions in colab
CloudSEN12 HIGH-QUALITY
A Benchmark Dataset for Cloud Semantic Understanding
CloudSEN12… See the full description on the dataset page: https://huggingface.co/datasets/csaybar/CloudSEN12-high.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accident Detection Model is made using YOLOv8, Google Colab, Python, Roboflow, Deep Learning, OpenCV, Machine Learning, Artificial Intelligence. It can detect an accident from a live camera feed, image, or video. This model is trained on a dataset of 3200+ images; these images were annotated on Roboflow.
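As a rough illustration (not the project's actual code), running a trained YOLOv8 model on a video or image with the ultralytics package could look like this; the weights file name and input file are placeholders.

from ultralytics import YOLO

# Placeholder weights file: use the accident-detection weights trained on this dataset
model = YOLO("accident_best.pt")

# Run detection on a video file (an image path or webcam index 0 also works)
results = model.predict(source="dashcam_clip.mp4", conf=0.5, save=True)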
About this data: An ultrasound dataset to use in the discovery of ultrasound features associated with pain and radiographic change in KOA is highly innovative and will be a major step forward for the field. These ultrasound images originate from the diverse and inclusive population-based Johnston County Health Study (JoCoHS). This dataset is designed to adhere to FAIR principles and was funded in part by an Administrative Supplement to Improve the AI/ML-Readiness of NIH-Supported Data (3R01AR077060-03S1).
Working with this dataset: WorkingWithTheDataset.ipynb Jupyter notebook. If you are familiar with working with Jupyter notebooks, we recommend using the WorkingWithTheDataset.ipynb Jupyter notebook to retrieve, validate, and learn more about the dataset. You should download the latest WorkingWithTheDataset.ipynb file and upload it to an online Jupyter environment such as https://colab.research.google.com, or use the notebook in your Jupyter environment of choice. You will also need to download the CONFIGURATION_SETTINGS.template.md
file from this dataset since the contents are used to configure the Jupyter notebook. Note: at the time of this writing, we do not recommend using Binder (mybinder.org) if you are interested in only reviewing the WorkingWithTheDataset.ipynb notebook. When Binder loads the dataset, it will download all files from this dataset, resulting in a long build time. However, if you plan to work with all files in the dataset then Binder might work for you. We do not offer support for this service or other Jupyter Lab environments.
Metadata: The DatasetMetadata.json
file contains general information about the files and variables within this dataset. We use it as our validation metadata to verify the data we are importing into this Dataverse dataset. This file is also the most comprehensive with regards to the dataset metadata.
Data collection in progress: This dataset is not complete and will be updated regularly as additional data is collected.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains all classifications that the Gravity Spy Machine Learning model for LIGO glitches from the first three observing runs (O1, O2 and O3, where O3 is split into O3a and O3b). Gravity Spy classified all noise events identified by the Omicron trigger pipeline in which Omicron identified that the signal-to-noise ratio was above 7.5 and the peak frequency of the noise event was between 10 Hz and 2048 Hz. To classify noise events, Gravity Spy made Omega scans of every glitch consisting of 4 different durations, which helps capture the morphology of noise events that are both short and long in duration.
There are 22 classes used for O1 and O2 data (including No_Glitch and None_of_the_Above), while there are two additional classes used to classify O3 data (while None_of_the_Above was removed).
For O1 and O2, the glitch classes were: 1080Lines, 1400Ripples, Air_Compressor, Blip, Chirp, Extremely_Loud, Helix, Koi_Fish, Light_Modulation, Low_Frequency_Burst, Low_Frequency_Lines, No_Glitch, None_of_the_Above, Paired_Doves, Power_Line, Repeating_Blips, Scattered_Light, Scratchy, Tomte, Violin_Mode, Wandering_Line, Whistle
For O3, the glitch classes were: 1080Lines, 1400Ripples, Air_Compressor, Blip, Blip_Low_Frequency, Chirp, Extremely_Loud, Fast_Scattering, Helix, Koi_Fish, Light_Modulation, Low_Frequency_Burst, Low_Frequency_Lines, No_Glitch, None_of_the_Above, Paired_Doves, Power_Line, Repeating_Blips, Scattered_Light, Scratchy, Tomte, Violin_Mode, Wandering_Line, Whistle
The data set is described in Glanzer et al. (2023), which we ask to be cited in any publications using this data release. Example code using the data can be found in this Colab notebook.
If you would like to download the Omega scans associated with each glitch, then you can use the gravitational-wave data-analysis tool GWpy. If you would like to use this tool, please install anaconda if you have not already and create a virtual environment using the following command
conda create --name gravityspy-py38 -c conda-forge python=3.8 gwpy pandas psycopg2 sqlalchemy
After downloading one of the CSV files for a specific era and interferometer, please run the following Python script if you would like to download the data associated with the metadata in the CSV file. We recommend not trying to download too many images at one time. For example, the script below will read data on Hanford glitches from O2 that were classified by Gravity Spy and filter for only glitches that were labelled as Blips with 90% confidence or higher, and then download the first 4 rows of the filtered table.
from gwpy.table import GravitySpyTable
H1_O2 = GravitySpyTable.read('H1_O2.csv')
H1_O2_blips = H1_O2[(H1_O2["ml_label"] == "Blip") & (H1_O2["ml_confidence"] > 0.9)]
H1_O2_blips[0:4].download(nproc=1)
Each of the columns in the CSV files are taken from various different inputs:
[‘event_time’, ‘ifo’, ‘peak_time’, ‘peak_time_ns’, ‘start_time’, ‘start_time_ns’, ‘duration’, ‘peak_frequency’, ‘central_freq’, ‘bandwidth’, ‘channel’, ‘amplitude’, ‘snr’, ‘q_value’] contain metadata about the signal from the Omicron pipeline.
[‘gravityspy_id’] is the unique identifier for each glitch in the dataset.
[‘1400Ripples’, ‘1080Lines’, ‘Air_Compressor’, ‘Blip’, ‘Chirp’, ‘Extremely_Loud’, ‘Helix’, ‘Koi_Fish’, ‘Light_Modulation’, ‘Low_Frequency_Burst’, ‘Low_Frequency_Lines’, ‘No_Glitch’, ‘None_of_the_Above’, ‘Paired_Doves’, ‘Power_Line’, ‘Repeating_Blips’, ‘Scattered_Light’, ‘Scratchy’, ‘Tomte’, ‘Violin_Mode’, ‘Wandering_Line’, ‘Whistle’] contain the machine learning confidence for a glitch being in a particular Gravity Spy class (the confidence in all these columns should sum to unity). These use the original 22 classes in all cases.
[‘ml_label’, ‘ml_confidence’] provide the machine-learning predicted label for each glitch, and the machine learning confidence in its classification.
[‘url1’, ‘url2’, ‘url3’, ‘url4’] are the links to the publicly-available Omega scans for each glitch. ‘url1’ shows the glitch for a duration of 0.5 seconds, ‘url2’ for 1 second, ‘url3’ for 2 seconds, and ‘url4’ for 4 seconds.
For the most recently uploaded training set used in Gravity Spy machine learning algorithms, please see Gravity Spy Training Set on Zenodo.
For detailed information on the training set used for the original Gravity Spy machine learning paper, please see Machine learning for Gravity Spy: Glitch classification and dataset on Zenodo.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This is a dataset for detecting banana quality using ML. This dataset contains four categories: Unripe, Ripe, Overripe and Rotten. In this dataset, there are enormous amount of images which will help users to train the ML model conveniently and easily.
Note: this dataset has been taken from https://universe.roboflow.com/roboflow-universe-projects/banana-ripeness-classification. I was facing difficulties while downloading the dataset directly into Google Colab to train my CNN model as part of a university project. All credits for this dataset, as far as my knowledge goes, go to Roboflow. I do not intend to take any credit myself or unethically claim ownership; I just uploaded the dataset here for my convenience. Thank you.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the quantified single cell dataset for the chimeric overexpression spheroid experiment described in:
A quantitative analysis of the interplay of environment, neighborhood, and cell state in 3D spheroids
Vito RT Zanotelli
Matthias Leutenegger
Xiao‐Kang Lun
Fanny Georgi
Natalie de Souza
Bernd Bodenmiller
Mol Syst Biol. (2020) 16: e9798
https://doi.org/10.15252/msb.20209798
Please cite this article if you re-use any of the data or code.
This is an export of the processed dataset after quality control. Please consult the README below for a description of the data.
An example script to browse this data using Python can be found here: https://github.com/BodenmillerGroup/SpheroidPublication/blob/oexp_analysis/workflow/notebooks/99_browse_export_data.py.ipynb
or be interactively tried on Google Colab:
Export Oexp Analysis
by Vito Zanotelli et al, Bodenmiller Lab UZH, 2020
This is the export of the overexpression dataset from the paper: "A quantitative analysis of the interplay of environment, neighborhood and cell state in 3D spheroids" Raw data: 10.5281/zenodo.4055780 Please cite the paper if you use this data!
### Experimental design (more details in the paper):
Overexpressing 51 signaling constructs, 4 control constructs (2x GFP, 1x HcRed, 1x Luciferase) and 1 'empty' mock transfection control grown in 5 replicates on 5 different plates (the 'empty' control has 35 replicates).
Most signaling constructs have a GFP tag. Typically only a subset of cells per sphere were overexpressing.
4 plates were pooled into one block with 240 well barcoding, 2 plates in one block with 120 well barcoding.
A pellet of each pool was generated and cut into several 6um thick sections.
A subset of these sections (='sites') were stained with an IMC panel and acquired as 1 or more 'acquisitions' containing multiple spheres each.
Spheres in these acquisitions were identified via computer vision and cropped into individual 'images'.
In each image the following 'objects' were identified via computer vision:
The data was exported using the 'anndata' csv format: https://anndata.readthedocs.io/en/stable/anndata.AnnData.html
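As a rough sketch (not part of the original export scripts), the per-object tables described in the notes below can be reassembled into an AnnData object; 'cell' is used here as an example object name, and the code assumes the X and var csv files share a leading index column and matching feature order.

import pandas as pd
import anndata as ad

obj = "cell"  # example object name; replace with the object of interest
X = pd.read_csv(f"{obj}_X.csv", index_col=0)      # assumed: observations x features matrix
var = pd.read_csv(f"{obj}_var.csv", index_col=0)  # feature (measurement) metadata
# Assumes the rows of {obj}_var.csv are in the same order as the columns of {obj}_X.csv
adata = ad.AnnData(X=X.values,
                   obs=pd.DataFrame(index=X.index.astype(str)),
                   var=var)
print(adata)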
Some notes on the files and their columns:
{object}_X.csv:
{object}_var.csv:
- Shape: #features x #columns
- Columns:
- measurement_id: unique measurement id
- measurement_name: Name of measurement (this export: all compensated mean intensity)
- measurement_type: Type of measurement (this export: only Intensity features)
- channel_name, metal: Isotope name
- stack_name: multicolor image stack containing this channel
- ref_plane_number: position of the measured channel in its image stack
- goodname: The name of the marker
  - no prefix: total protein
  - p-: phospho protein
  - []: phospho residue
  - BC: barcoding metal
- Antibody Clone: antibody clone name
- is_cc: bool, indication if this marker is considered a classical cell cycle marker
- working: bool, indicates if the markers are working and of biological value. I would only look at the markers with working=1
Not important:
- scale: scale of raw data (data is already scaled)
- plane_id: database id for image plane.
image_meta.csv:
Columns:
Image metadata:
All cuts in the same section were stained together.
sampleblock_id: unique id of the sample block this sphere was pooled and processed in.
Not important:
- Experimental metadata:
  - condition_id: id of the physical spheroid the slice belongs to. Unique to each sphere replicate.
  - condition_name: name of the growth condition this sphere came from
  - plate_id: id of the plate the spheroid was grown in
  - well_name: position of the well the spheroid was grown in
  - sampleblock_id/sampleblock_name: id/name of the pooled block the spheroid was processed in
  - site_id: corresponds to the site the spheroid slice was located on. All spheroid slices in the same site were stained together.
  - file_name: filename of the segmentation mask found in masks_cell
- Filenames:
  - mask_filename_{object}: filename of the object mask corresponding to this image
  - image_stack_filename_{imagestack}: filename of the image stack with this name. Note: all mean intensity measurements are usually done in the 'FullStackFiltered' (raw image, only filtered for strong outliers) and then compensated for metal impurities (as recommended in Chevrier, Zanotelli and Crowell 2018). For
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 40,000 hours of transcribed audio is first collected from audiobooks, podcasts and YouTube, covering both read and spontaneous speaking styles, and a variety of topics, such as arts, science, sports, etc. A new forced alignment and segmentation pipeline is proposed to create sentence segments suitable for speech recognition training, and to filter out segments with low-quality transcription. For system training, GigaSpeech provides five subsets of different sizes, 10h, 250h, 1000h, 2500h, and 10000h. For our 10,000-hour XL training subset, we cap the word error rate at 4% during the filtering/validation stage, and for all our other smaller training subsets, we cap it at 0%. The DEV and TEST evaluation sets, on the other hand, are re-processed by professional human transcribers to ensure high transcription quality.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is PBMC 1, Drop-Seq, from the publication
Ding J, Adiconis X, Simmons SK, Kowalczyk MS, Hession CC, Marjanovic ND, et al. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nature Biotechnology. 2020 Apr 6;1–10.
This is distributed according to the Creative Commons 4.0 license.
The files are also available in the Human Cell Atlas. The reason we put the data here is that we want to be able to download the data from Google Colab, and we found no way to do that in the HCA DCP.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Corpus Nummorum - Coin Image Dataset
This dataset is a collection of ancient coin images from three different sources: the Corpus Nummorum (CN) project, the Münzkabinett Berlin and the Bibliothèque nationale de France, Département des Monnaies, médailles et antiques. It covers Greek and Roman coins from ancient Thrace, Moesia Inferior, Troad and Mysia. This is a selection of the coins published on the CN portal (due to copyrights).
The dataset contains 115,160 images with about 29,000 unique coins. The images are split into three main folders with different assignments of the coins. Each main folder is sorted with the help of subfolders which hold the coin images. The "dataset_coins" folder contains the coin photos divided into obverse and reverse and arranged by coin types. In the "dataset_types" folder the obverse and reverse images of the coins are concatenated and transformed to a quadratic format with black bars on the top and bottom. The images here are sorted by their coin type. The last folder "dataset_mints" contains the also concatenated images sorted by their mint. A "sources" csv file holds the sources for every image. Due to copyrights the image size is limited to 299*299 pixels. However, this should be sufficient for most ML approaches.
The main purpose for this dataset in the CN project is the training of Machine Learning based Image Recognition models. We use three different Convolutional Neural Network based architectures: VGG16, VGG19 and ResNet50. Our best model (VGG16) achieves on this dataset a 79% Top-1 and a 97% Top-5 accuracy for coin type recognition. The mint recognition achieves a 79% Top-1 and 94% Top-5 accuracy. We have a Colab notebook with two models (trained on the whole CN dataset) online.
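As a minimal illustration of that setup (not our actual training code), a VGG16-based classifier on the type-sorted folder could be built roughly as follows; the folder name, image size, and hyperparameters are assumptions based on the description above.

import tensorflow as tf

# Assumed layout: dataset_types/<type_id>/<image>.jpg, 299x299 pixel images
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset_types", image_size=(299, 299), batch_size=32)
num_classes = len(train_ds.class_names)

# Apply the VGG16 ImageNet preprocessing to each batch
train_ds = train_ds.map(
    lambda x, y: (tf.keras.applications.vgg16.preprocess_input(x), y))

# VGG16 backbone pretrained on ImageNet, with a new classification head
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(299, 299, 3), pooling="avg")
base.trainable = False  # train only the new head first
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)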
During the summer semester 2023, we held the "Data Challenge" event at our Department of Computer Science at the Goethe-University. We gave our students this dataset with the task to achieve better results than us. Here are their experiments:
Team 1: Voting and stacking of models
Team 4: Dockerized TIMM Computer Vision Backend & FastAPI
Now we would like to invite you to try out your own ideas and models on our coin data.
If you have any questions or suggestions, please, feel free to contact us.