Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset and code package is designed for execution in Google Colab, which provides a free cloud-based Python environment.
Follow these steps to reproduce the results.
Sign in with your Google account.
This repository contains two analysis notebooks:
Polis.ipynb
cpi.ipynb
Download them from Zenodo, or open them directly in Colab using File → Upload notebook.
Mounting Google Drive allows you to store the data permanently instead of uploading it each time.
from google.colab import drive
drive.mount('/content/drive')
After mounting, place all dataset files in a folder inside your Drive (e.g., My Drive/CorruptionStudy/).
Ensure the following files are available in your Colab session (either uploaded directly or stored in Drive):
| File | Description |
|---|---|
| estat_sdg_16_50_en.csv | Eurostat CPI dataset |
| V-Dem-CY-Core-v15.csv | V-Dem Core dataset |
| Controls.xlsx | Control variables |
| Institutional.xlsx | Institutional variables |
| Core.xlsx | Additional core variables |
If you are not using Google Drive, upload all files at the start of your session:
from google.colab import files
uploaded = files.upload()
Select all required .csv and .xlsx files when prompted.
Run the following command in a Colab cell:
!pip install pandas numpy statsmodels linearmodels openpyxl
If files are uploaded directly in Colab:
EUROSTAT_CPI_PATH = "/content/estat_sdg_16_50_en.csv"
VDEM_PATH = "/content/V-Dem-CY-Core-v15.csv"
CONTROLS_PATH = "/content/Controls.xlsx"
INSTITUTIONAL_PATH = "/content/Institutional.xlsx"
CORE_PATH = "/content/Core.xlsx"
If files are stored in Google Drive:
EUROSTAT_CPI_PATH = "/content/drive/My Drive/CorruptionStudy/estat_sdg_16_50_en.csv"
VDEM_PATH = "/content/drive/My Drive/CorruptionStudy/V-Dem-CY-Core-v15.csv"
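The remaining Drive paths follow the same pattern; assuming the same CorruptionStudy folder name used above, they would be:
CONTROLS_PATH = "/content/drive/My Drive/CorruptionStudy/Controls.xlsx"
INSTITUTIONAL_PATH = "/content/drive/My Drive/CorruptionStudy/Institutional.xlsx"
CORE_PATH = "/content/drive/My Drive/CorruptionStudy/Core.xlsx"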
Execute all cells in order (Runtime → Run all).
The notebook will:
Load CPI and V-Dem data
Merge with control variables
Standardize variables
Estimate two-way fixed effects (Driscoll–Kraay standard errors; see the sketch after this list)
Output model summaries
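As a rough illustration of the fixed-effects step (not the notebook's actual code), the sketch below shows how a two-way fixed effects model with Driscoll–Kraay standard errors can be estimated with linearmodels; the frame and column names (country, year, cpi, v2x_corr) are illustrative placeholders, not the notebook's variables.

import pandas as pd
from linearmodels.panel import PanelOLS

# Toy panel with placeholder columns; in the notebook the frame comes from
# merging the CPI, V-Dem, and control files loaded above.
data = pd.DataFrame({
    "country":  ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "year":     [2017, 2018, 2019] * 3,
    "cpi":      [50, 52, 51, 70, 71, 73, 60, 59, 61],
    "v2x_corr": [0.40, 0.38, 0.39, 0.10, 0.11, 0.10, 0.25, 0.26, 0.24],
}).set_index(["country", "year"])

# Two-way (entity + time) fixed effects; cov_type="kernel" requests
# Driscoll-Kraay (kernel-based) standard errors in linearmodels.
mod = PanelOLS.from_formula("cpi ~ v2x_corr + EntityEffects + TimeEffects", data=data)
res = mod.fit(cov_type="kernel")
print(res.summary)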
To save results to Google Drive:
df.to_excel("/content/drive/My Drive/CorruptionStudy/results.xlsx")
To download directly:
from google.colab import files
files.download("results.xlsx")
If using this dataset or code, please cite the Zenodo record as indicated in the Cite As section.
Zenodo Dataset Description:
Title: Epistemic Legitimacy Traps in High-Trust Democracies: Replication Data and Code
Description:
This dataset contains replication materials for "Epistemic Legitimacy Traps: How High-Trust Institutions Silence Inconvenient Truths" - a study examining how friendship-based corruption persists in democratic institutions through systematic exclusion of internal critics.
Contents:
Key Variables:
Methodology: Two-way fixed effects panel regression (institutional analysis) and OLS with robust standard errors (individual analysis) testing the relationship between corruption measures, institutional quality, and public perceptions in high-trust democratic contexts.
Research Questions: How do high-trust institutions maintain legitimacy while systematically excluding internal criticism? What role do friendship networks play in enabling "clean corruption" that operates through relationships rather than material exchanges?
Keywords: corruption, epistemic injustice, institutional legitimacy, democracy, trust, whistleblowing, friendship networks, panel data
Citation: [Author], [Year]. "Epistemic Legitimacy Traps: How High-Trust Institutions Silence Inconvenient Truths." Business Ethics Quarterly [forthcoming].
Data Sources: V-Dem Institute, Eurostat, [Original Survey Data Source]
License: Creative Commons Attribution 4.0 International
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classify video clips with natural scenes of actions performed by people visible in the videos.
See the UCF101 Dataset web page: https://www.crcv.ucf.edu/data/UCF101.php#Results_on_UCF101
This example dataset consists of the 5 classes with the most videos in the UCF101 dataset. For the top-10 version see: https://doi.org/10.5281/zenodo.7882861 .
Based on this code: https://keras.io/examples/vision/video_classification/ (which needs to be updated, if it has not been already; see the issue: https://github.com/keras-team/keras-io/issues/1342).
Testing if data can be downloaded from figshare with `wget`, see: https://github.com/mojaveazure/angsd-wrapper/issues/10
For generating the subset, see this notebook: https://colab.research.google.com/github/sayakpaul/Action-Recognition-in-TensorFlow/blob/main/Data_Preparation_UCF101.ipynb -- however, it also needs to be adjusted (if it has not been already; in that case I will post a link to the updated notebook here or elsewhere, e.g., in the corrected notebook with the Keras example).
I would like to thank Sayak Paul for contacting me about his example at Keras documentation being out of date.
Cite this dataset as:
Soomro, K., Zamir, A. R., & Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402. https://doi.org/10.48550/arXiv.1212.0402
To download the dataset via the command line, please use:
wget -q https://zenodo.org/record/7924745/files/ucf101_top5.tar.gz -O ucf101_top5.tar.gz
tar xf ucf101_top5.tar.gz
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Ro90 is a dataset for object detection tasks - it contains Accepts Or Rejects annotations for 3,900 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
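As one example, a download via the Roboflow Python package might look like the sketch below; the API key, workspace slug, project slug, version number, and export format are placeholders to be replaced with the values shown on the dataset's Roboflow page.

```python
# pip install roboflow
from roboflow import Roboflow

# Placeholder credentials and identifiers (replace with the values from the dataset page)
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("ro90")
dataset = project.version(1).download("coco")

print(dataset.location)  # local folder containing the images and annotations
```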
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The DeepNet-1M dataset was published by Facebook researchers with the paper Parameter Prediction for Unseen Deep Architectures (NeurIPS 2021). The relevant code is maintained in this repository.
Source: ppuda repository
This dataset was downloaded using scripts shared at this repository by the researchers.
The data is licensed according to the License in this GitHub repository. At the time of creating this dataset, the repository was licensed under MIT license.
We developed the first publicly available spatially explicit estimates of the human alterations along the global floodplains during the recent 27 years (1992-2019) at 250-m resolution. To maximize the reuse of our datasets and advance the open science of human floodplain alteration, we developed three web-based programming tools: (1) Floodplain Mapping Tool, (2) Land Use Change Tool, and (3) Human Alteration Tool, supported with tutorials and step-by-step audiovisual instructions. Our data reveal a significant loss of natural floodplains worldwide, with 460,000 km2 of new agricultural and 140,000 km2 of new developed areas between 1992 and 2019. This dataset offers critical new insights into how floodplains are being destroyed, which will help decision-makers to reinforce strategies to conserve and restore floodplain functions and habitat.

This dataset is not publicly accessible because: EPA scientists provided context and commentary but did not do any of the analyses or handle any of the data. It can be accessed through the following means: the entire data record can be downloaded as a single zip file from this web link: http://www.hydroshare.org/resource/cdb5fd97e0644a14b22e58d05299f69b.

The global floodplain alteration dataset is derived entirely through ArcGIS 10.5 and ENVI 5.1 geospatial analysis platforms. To assist in reuse and application of the dataset, we developed additional Python codes aggregated as three web-based tools:
- Floodplain Mapping Tool: https://colab.research.google.com/drive/1xQlARZXKPexmDInYV-EMoJ-HZxmFL-eW?usp=sharing
- Land Use Change Tool: https://colab.research.google.com/drive/1vmIaUCkL66CoTv4rNRIWpJXYXp4TlAKd?usp=sharing
- Human Alteration Tool: https://colab.research.google.com/drive/1r2zNJNpd3aWSuDV2Kc792qSEjvDbFtBy?usp=share_link

See the Usage Notes section in the journal article for details. Format: The global floodplain alteration dataset is available through the HydroShare open geospatial data platform. Our data record also includes all corresponding input data, intermediate calculations, and supporting information.

This dataset is associated with the following publication: Rajib, A., Q. Zheng, C. Lane, H. Golden, J. Christensen, I. Isibor, and K. Johnson. Human alterations of the global floodplains 1992–2019. Scientific Data. Springer Nature, New York, NY, USA, 10: 499, (2023).
https://creativecommons.org/publicdomain/zero/1.0/
Common Voice is a corpus of speech data read by users on the Common Voice website, and based upon text from a number of public domain sources like user submitted blog posts, old books, movies, and other public speech corpora. Its primary purpose is to enable the training and testing of automatic speech recognition (ASR) systems.
In Google Colab, I downloaded the .tar.gz archive from Common Voice (Mozilla), placed the compressed file in a folder, marked the folder as a dataset, and uploaded it directly.
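A minimal Colab sketch of that workflow, assuming a placeholder download URL (the real link must be obtained from the Common Voice site), could look like:

# In a Colab cell: download and extract the Common Voice archive.
# The URL below is a placeholder, not the actual Common Voice download link.
!wget -O cv-corpus.tar.gz "https://example.org/cv-corpus-xx.tar.gz"
!mkdir -p dataset
!tar -xzf cv-corpus.tar.gz -C dataset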
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
🚨 New Dataset Version Released!
We are excited to announce the release of Version 1.1 of our dataset!
This update includes:
- L2A & L1C support
- Temporal support
- Check the data without downloading (cloud-optimized properties)
📥 Go to: https://huggingface.co/datasets/tacofoundation/cloudsen12 and follow the instructions in colab
CloudSEN12 HIGH-QUALITY
A Benchmark Dataset for Cloud Semantic Understanding
CloudSEN12… See the full description on the dataset page: https://huggingface.co/datasets/csaybar/CloudSEN12-high.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accident Detection Model is made using YOLOv8, Google Colab, Python, Roboflow, Deep Learning, OpenCV, Machine Learning, Artificial Intelligence. It can detect an accident from a live camera feed, image, or video. This model is trained on a dataset of 3200+ images; these images were annotated on Roboflow.
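As a rough illustration (not the project's actual code), running a trained YOLOv8 model on a video or image with the ultralytics package could look like this; the weights file name and input file are placeholders.

from ultralytics import YOLO

# Placeholder weights file: use the accident-detection weights trained on this dataset
model = YOLO("accident_best.pt")

# Run detection on a video file (an image path or webcam index 0 also works)
results = model.predict(source="dashcam_clip.mp4", conf=0.5, save=True)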
About this data: An ultrasound dataset to use in the discovery of ultrasound features associated with pain and radiographic change in KOA is highly innovative and will be a major step forward for the field. These ultrasound images originate from the diverse and inclusive population-based Johnston County Health Study (JoCoHS). This dataset is designed to adhere to FAIR principles and was funded in part by an Administrative Supplement to Improve the AI/ML-Readiness of NIH-Supported Data (3R01AR077060-03S1).
Working with this dataset: WorkingWithTheDataset.ipynb Jupyter notebook. If you are familiar with working with Jupyter notebooks, we recommend using the WorkingWithTheDataset.ipynb Jupyter notebook to retrieve, validate, and learn more about the dataset. You should download the latest WorkingWithTheDataset.ipynb file and upload it to an online Jupyter environment such as https://colab.research.google.com, or use the notebook in your Jupyter environment of choice. You will also need to download the CONFIGURATION_SETTINGS.template.md
file from this dataset since the contents are used to configure the Jupyter notebook. Note: at the time of this writing, we do not recommend using Binder (mybinder.org) if you are interested in only reviewing the WorkingWithTheDataset.ipynb notebook. When Binder loads the dataset, it will download all files from this dataset, resulting in a long build time. However, if you plan to work with all files in the dataset then Binder might work for you. We do not offer support for this service or other Jupyter Lab environments.
Metadata: The DatasetMetadata.json
file contains general information about the files and variables within this dataset. We use it as our validation metadata to verify the data we are importing into this Dataverse dataset. This file is also the most comprehensive with regards to the dataset metadata.
Data collection in progress: This dataset is not complete and will be updated regularly as additional data is collected.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains all classifications that the Gravity Spy Machine Learning model for LIGO glitches from the first three observing runs (O1, O2 and O3, where O3 is split into O3a and O3b). Gravity Spy classified all noise events identified by the Omicron trigger pipeline in which Omicron identified that the signal-to-noise ratio was above 7.5 and the peak frequency of the noise event was between 10 Hz and 2048 Hz. To classify noise events, Gravity Spy made Omega scans of every glitch consisting of 4 different durations, which helps capture the morphology of noise events that are both short and long in duration.
There are 22 classes used for O1 and O2 data (including No_Glitch and None_of_the_Above), while there are two additional classes used to classify O3 data (while None_of_the_Above was removed).
For O1 and O2, the glitch classes were: 1080Lines, 1400Ripples, Air_Compressor, Blip, Chirp, Extremely_Loud, Helix, Koi_Fish, Light_Modulation, Low_Frequency_Burst, Low_Frequency_Lines, No_Glitch, None_of_the_Above, Paired_Doves, Power_Line, Repeating_Blips, Scattered_Light, Scratchy, Tomte, Violin_Mode, Wandering_Line, Whistle
For O3, the glitch classes were: 1080Lines, 1400Ripples, Air_Compressor, Blip, Blip_Low_Frequency, Chirp, Extremely_Loud, Fast_Scattering, Helix, Koi_Fish, Light_Modulation, Low_Frequency_Burst, Low_Frequency_Lines, No_Glitch, None_of_the_Above, Paired_Doves, Power_Line, Repeating_Blips, Scattered_Light, Scratchy, Tomte, Violin_Mode, Wandering_Line, Whistle
The data set is described in Glanzer et al. (2023), which we ask to be cited in any publications using this data release. Example code using the data can be found in this Colab notebook.
If you would like to download the Omega scans associated with each glitch, then you can use the gravitational-wave data-analysis tool GWpy. If you would like to use this tool, please install anaconda if you have not already and create a virtual environment using the following command
conda create --name gravityspy-py38 -c conda-forge python=3.8 gwpy pandas psycopg2 sqlalchemy
After downloading one of the CSV files for a specific era and interferometer, please run the following Python script if you would like to download the data associated with the metadata in the CSV file. We recommend not trying to download too many images at one time. For example, the script below will read data on Hanford glitches from O2 that were classified by Gravity Spy and filter for only glitches that were labelled as Blips with 90% confidence or higher, and then download the first 4 rows of the filtered table.
from gwpy.table import GravitySpyTable
H1_O2 = GravitySpyTable.read('H1_O2.csv')
H1_O2_blips = H1_O2[(H1_O2["ml_label"] == "Blip") & (H1_O2["ml_confidence"] > 0.9)]
H1_O2_blips[0:4].download(nproc=1)
Each of the columns in the CSV files are taken from various different inputs:
[‘event_time’, ‘ifo’, ‘peak_time’, ‘peak_time_ns’, ‘start_time’, ‘start_time_ns’, ‘duration’, ‘peak_frequency’, ‘central_freq’, ‘bandwidth’, ‘channel’, ‘amplitude’, ‘snr’, ‘q_value’] contain metadata about the signal from the Omicron pipeline.
[‘gravityspy_id’] is the unique identifier for each glitch in the dataset.
[‘1400Ripples’, ‘1080Lines’, ‘Air_Compressor’, ‘Blip’, ‘Chirp’, ‘Extremely_Loud’, ‘Helix’, ‘Koi_Fish’, ‘Light_Modulation’, ‘Low_Frequency_Burst’, ‘Low_Frequency_Lines’, ‘No_Glitch’, ‘None_of_the_Above’, ‘Paired_Doves’, ‘Power_Line’, ‘Repeating_Blips’, ‘Scattered_Light’, ‘Scratchy’, ‘Tomte’, ‘Violin_Mode’, ‘Wandering_Line’, ‘Whistle’] contain the machine learning confidence for a glitch being in a particular Gravity Spy class (the confidence in all these columns should sum to unity). These use the original 22 classes in all cases.
[‘ml_label’, ‘ml_confidence’] provide the machine-learning predicted label for each glitch, and the machine learning confidence in its classification.
[‘url1’, ‘url2’, ‘url3’, ‘url4’] are the links to the publicly-available Omega scans for each glitch. ‘url1’ shows the glitch for a duration of 0.5 seconds, ‘url2’ for 1 second, ‘url3’ for 2 seconds, and ‘url4’ for 4 seconds.
For the most recently uploaded training set used in Gravity Spy machine learning algorithms, please see Gravity Spy Training Set on Zenodo.
For detailed information on the training set used for the original Gravity Spy machine learning paper, please see Machine learning for Gravity Spy: Glitch classification and dataset on Zenodo.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This is a dataset for detecting banana quality using ML. This dataset contains four categories: Unripe, Ripe, Overripe and Rotten. In this dataset, there are enormous amount of images which will help users to train the ML model conveniently and easily.
Note: this dataset has been taken from https://universe.roboflow.com/roboflow-universe-projects/banana-ripeness-classification. I was facing difficulties while downloading the dataset directly into Google Colab to train my CNN model as part of a university project. All credits for this dataset, as far as my knowledge goes, go to Roboflow. I do not intend to take any credit myself or unethically claim ownership; I just uploaded the dataset here for my convenience. Thank you.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the quantified single cell dataset for the chimeric overexpression spheroid experiment described in:
A quantitative analysis of the interplay of environment, neighborhood, and cell state in 3D spheroids
Vito RT Zanotelli
Matthias Leutenegger
Xiao‐Kang Lun
Fanny Georgi
Natalie de Souza
Bernd Bodenmiller
Mol Syst Biol. (2020) 16: e9798
https://doi.org/10.15252/msb.20209798
Please cite this article if you re-use any of the data or code.
This is an export of the processed dataset after quality control. Please consult the README below for a description of the data.
An example script to browse this data using Python can be found here: https://github.com/BodenmillerGroup/SpheroidPublication/blob/oexp_analysis/workflow/notebooks/99_browse_export_data.py.ipynb
or be interactively tried on Google Colab:
Export Oexp Analysis
by Vito Zanotelli et al, Bodenmiller Lab UZH, 2020
This is the export of the overexpression dataset from the paper: "A quantitative analysis of the interplay of environment, neighborhood and cell state in 3D spheroids" Raw data: 10.5281/zenodo.4055780 Please cite the paper if you use this data!
### Experimental design (more details in the paper):
Overexpressing 51 signaling constructs, 4 control constructs (2x GFP, 1x HcRed, 1x Luciferase) and 1 'empty' mock transfection control grown in 5 replicates on 5 different plates (the 'empty' control has 35 replicates).
Most signaling constructs have a GFP tag. Typically only a subset of cells per sphere were overexpressing.
4 plates were pooled into one block with 240 well barcoding, 2 plates in one block with 120 well barcoding.
A pellet of each pool was generated and cut into several 6um thick sections.
A subset of these sections (='sites') were stained with an IMC panel and acquired as 1 or more 'acquisitions' containing multiple spheres each.
Spheres in these acquisitions were identified via computer vision and cropped into individual 'images'.
In each image the following 'objects' were identified via computer vision:
The data was exported using the 'anndata' csv format: https://anndata.readthedocs.io/en/stable/anndata.AnnData.html
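As a rough sketch (not part of the original export scripts), the per-object tables described in the notes below can be reassembled into an AnnData object; 'cell' is used here as an example object name, and the code assumes the X and var csv files share a leading index column and matching feature order.

import pandas as pd
import anndata as ad

obj = "cell"  # example object name; replace with the object of interest
X = pd.read_csv(f"{obj}_X.csv", index_col=0)      # assumed: observations x features matrix
var = pd.read_csv(f"{obj}_var.csv", index_col=0)  # feature (measurement) metadata
# Assumes the rows of {obj}_var.csv are in the same order as the columns of {obj}_X.csv
adata = ad.AnnData(X=X.values,
                   obs=pd.DataFrame(index=X.index.astype(str)),
                   var=var)
print(adata)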
Some notes on the files and their columns:
{object}_X.csv:
{object}_var.csv:
- Shape: #features x #columns
- Columns:
- measurement_id: unique measurement id
- measurement_name: Name of measurement (this export: all compensated mean intensity)
- measurement_type: Type of measurement (this export: only Intensity features)
- channel_name, metal: Isotope name
- stack_name: multicolor image stack containing this channel
- ref_plane_number: position of the measured channel in its image stack
- goodname: The name of the marker
  - no prefix: total protein
  - p-: phospho protein
  - []: phospho residue
  - BC: barcoding metal
- Antibody Clone: antibody clone name
- is_cc: bool, indication if this marker is considered a classical cell cycle marker
- working: bool, indicates if the markers are working and of biological value. I would only look at the markers with working=1
Not important:
- scale: scale of raw data (data is already scaled)
- plane_id: database id for image plane.
image_meta.csv:
Columns:
Image metadata:
All cuts in the same section were stained together.
sampleblock_id: unique id of the sample block this sphere was pooled and processed in.
Not important:
- Experimental metadata:
  - condition_id: id of the physical spheroid the slice belongs to. Unique to each sphere replicate.
  - condition_name: name of the growth condition this sphere came from
  - plate_id: id of the plate the spheroid was grown in
  - well_name: position of the well the spheroid was grown in
  - sampleblock_id/sampleblock_name: id/name of the pooled block the spheroid was processed in
  - site_id: corresponds to the site the spheroid slice was located on. All spheroid slices in the same site were stained together.
  - file_name: filename of the segmentation mask found in masks_cell
- Filenames:
  - mask_filename_{object}: filename of the object mask corresponding to this image
  - image_stack_filename_{imagestack}: filename of the image stack with this name. Note: all mean intensity measurements are usually done in the 'FullStackFiltered' (raw image, only filtered for strong outliers) and then compensated for metal impurities (as recommended in Chevrier, Zanotelli and Crowell 2018). For
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
GigaSpeech is an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training. Around 40,000 hours of transcribed audio is first collected from audiobooks, podcasts and YouTube, covering both read and spontaneous speaking styles, and a variety of topics, such as arts, science, sports, etc. A new forced alignment and segmentation pipeline is proposed to create sentence segments suitable for speech recognition training, and to filter out segments with low-quality transcription. For system training, GigaSpeech provides five subsets of different sizes, 10h, 250h, 1000h, 2500h, and 10000h. For our 10,000-hour XL training subset, we cap the word error rate at 4% during the filtering/validation stage, and for all our other smaller training subsets, we cap it at 0%. The DEV and TEST evaluation sets, on the other hand, are re-processed by professional human transcribers to ensure high transcription quality.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is PBMC 1, Drop-Seq, from the publication
Ding J, Adiconis X, Simmons SK, Kowalczyk MS, Hession CC, Marjanovic ND, et al. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nature Biotechnology. 2020 Apr 6;1–10.
This is distributed according to the Creative Commons 4.0 license.
The files are also available in the Human Cell Atlas. The reason we put the data here is that we want to be able to download the data from Google Colab, and we found no way to do that in the HCA DCP.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Corpus Nummorum - Coin Image Dataset
This dataset is a collection of ancient coin images from three different sources: the Corpus Nummorum (CN) project, the Münzkabinett Berlin and the Bibliothèque nationale de France, Département des Monnaies, médailles et antiques. It covers Greek and Roman coins from ancient Thrace, Moesia Inferior, Troad and Mysia. This is a selection of the coins published on the CN portal (due to copyrights).
The dataset contains 115,160 images with about 29,000 unique coins. The images are split into three main folders with different assignments of the coins. Each main folder is sorted with the help of subfolders which hold the coin images. The "dataset_coins" folder contains the coin photos divided into obverse and reverse and arranged by coin types. In the "dataset_types" folder the obverse and reverse images of the coins are concatenated and transformed to a quadratic format with black bars on the top and bottom. The images here are sorted by their coin type. The last folder "dataset_mints" contains the also concatenated images sorted by their mint. A "sources" csv file holds the sources for every image. Due to copyrights the image size is limited to 299*299 pixels. However, this should be sufficient for most ML approaches.
The main purpose for this dataset in the CN project is the training of Machine Learning based Image Recognition models. We use three different Convolutional Neural Network based architectures: VGG16, VGG19 and ResNet50. Our best model (VGG16) achieves on this dataset a 79% Top-1 and a 97% Top-5 accuracy for coin type recognition. The mint recognition achieves a 79% Top-1 and 94% Top-5 accuracy. We have a Colab notebook with two models (trained on the whole CN dataset) online.
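As a minimal illustration of that setup (not our actual training code), a VGG16-based classifier on the type-sorted folder could be built roughly as follows; the folder name, image size, and hyperparameters are assumptions based on the description above.

import tensorflow as tf

# Assumed layout: dataset_types/<type_id>/<image>.jpg, 299x299 pixel images
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset_types", image_size=(299, 299), batch_size=32)
num_classes = len(train_ds.class_names)

# Apply the VGG16 ImageNet preprocessing to each batch
train_ds = train_ds.map(
    lambda x, y: (tf.keras.applications.vgg16.preprocess_input(x), y))

# VGG16 backbone pretrained on ImageNet, with a new classification head
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(299, 299, 3), pooling="avg")
base.trainable = False  # train only the new head first
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)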
During the summer semester 2023, we held the "Data Challenge" event at our Department of Computer Science at the Goethe-University. We gave our students this dataset with the task to achieve better results than us. Here are their experiments:
Team 1: Voting and stacking of models
Team 4: Dockerized TIMM Computer Vision Backend & FastAPI
Now we would like to invite you to try out your own ideas and models on our coin data.
If you have any questions or suggestions, please, feel free to contact us.