Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This upload contains data and documentation for the Python analysis undertaken in Google Colab as part of Episode 1 of the webinar series, conducted by Sambodhi's Center for Health Systems Research and Implementation (CHSRI). You can find the link to the Google Colab notebook here.
All the data uploaded here is open data published by the Toronto Police Public Safety Data Portal and the Ontario Ministry of Health.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hope_Park_original.csv file.

## Contents
- sample park analysis.ipynb — The main analysis notebook (Colab/Jupyter format)
- Hope_Park_original.csv — Source dataset containing park information
- README.md — Documentation for the contents and usage

## Usage
1. Open the notebook in Google Colab or Jupyter.
2. Upload the Hope_Park_original.csv file to the working directory (or adjust the file path in the notebook).
3. Run each cell sequentially to reproduce the analysis.

## Requirements
The notebook uses standard Python data science libraries:
```python
pandas
numpy
matplotlib
seaborn
```
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The purpose of this code is to produce a line graph visualization of COVID-19 data. This Jupyter notebook was built and run on Google Colab. This code will serve mostly as a guide and will need to be adapted where necessary to be run locally. The separate COVID-19 datasets uploaded to this Dataverse can be used with this code. This upload is made up of the IPYNB and PDF files of the code.
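As a hedged illustration of the kind of plot the notebook produces, a minimal sketch follows; the file name and the date and new_cases column names are assumptions and should be adjusted to match the COVID-19 dataset you download from this Dataverse.

import pandas as pd
import matplotlib.pyplot as plt

# Assumed file and column names; adjust to the dataset you are using
df = pd.read_csv("covid19_data.csv", parse_dates=["date"])

# Plot daily case counts as a line graph
plt.figure(figsize=(10, 5))
plt.plot(df["date"], df["new_cases"])
plt.xlabel("Date")
plt.ylabel("New cases")
plt.title("COVID-19 daily new cases")
plt.tight_layout()
plt.show()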
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains only the COCO 2017 train images (118K images) and a caption annotation JSON file, designed to fit within Google Colab's available disk space of approximately 50GB when connected to a GPU runtime.
If you're using PyTorch on Google Colab, you can easily utilize this dataset as follows:
Manually downloading and uploading the file to Colab can be time-consuming. Therefore, it's more efficient to download this data directly into Google Colab. Please ensure you have first added your Kaggle key to Google Colab. You can find more details on this process here.
from google.colab import drive, userdata  # userdata provides access to Colab's stored secrets
import os
import torch
import torchvision.datasets as dset
import torchvision.transforms as transforms

# Read the Kaggle credentials saved as Colab secrets
os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')
os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
# Download the Dataset and unzip it
!kaggle datasets download -d seungjunleeofficial/coco2017-image-caption-train
!mkdir "/content/Dataset"
!unzip "coco2017-image-caption-train" -d "/content/Dataset"
# load the dataset
cap = dset.CocoCaptions(root = '/content/Dataset/COCO2017 Image Captioning Train/train2017',
annFile = '/content/Dataset/COCO2017 Image Captioning Train/captions_train2017.json',
transform=transforms.PILToTensor())
You can then use the dataset in the following way:
print(f"Number of samples: {len(cap)}")
img, target = cap[3]
print(img.shape)
print(target)
# Output example: torch.Size([3, 425, 640])
# ['A zebra grazing on lush green grass in a field.', 'Zebra reaching its head down to ground where grass is.',
# 'The zebra is eating grass in the sun.', 'A lone zebra grazing in some green grass.',
# 'A Zebra grazing on grass in a green open field.']
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Antibiotic resistance is a global public health concern. Bacteria have evolved resistance to most antibiotics, which means that for any given bacterial infection, the bacteria may be resistant to one or several antibiotics. It has been suggested that genomic sequencing and machine learning (ML) could make resistance testing more accurate and cost-effective. Given that ML is likely to become an ever more important tool in medicine, we believe that it is important for pre-health students and others in the life sciences to learn to use ML tools. This paper provides a step-by-step tutorial to train 4 different ML models (logistic regression, random forests, extreme gradient-boosted trees, and neural networks) to predict drug resistance for Escherichia coli isolates and to evaluate their performance using different metrics and cross-validation techniques. We also guide the user in how to load and prepare the data used for the ML models. The tutorial is accessible to beginners and does not require any software to be installed as it is based on Google Colab notebooks and provides a basic understanding of the different ML models. The tutorial can be used in undergraduate and graduate classes for students in Biology, Public Health, Computer Science, or related fields.
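As a hedged sketch of the workflow the tutorial describes (one of the four models, evaluated with cross-validation) using scikit-learn; the file name, feature matrix, and label column below are placeholders, not the tutorial's actual data or code.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data: genomic features per E. coli isolate and a binary resistance label
data = pd.read_csv("ecoli_isolates.csv")      # hypothetical file name
X = data.drop(columns=["resistant"])          # hypothetical label column
y = data["resistant"]

# Logistic regression evaluated with 5-fold cross-validation on ROC AUC
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print("Mean ROC AUC:", scores.mean())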
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
S2 File. Colab notebook for AquaWave-BiLSTM model analysis and results.
S3 File. Colab notebook containing SHAP visualizations and interpretability analysis related to PM2.5 prediction. (ZIP)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This deposit contains the dataset and analysis code supporting the research paper "Recognition Without Implementation: Institutional Gaps and Forestry Expansion in Post-Girjas Swedish Sápmi" by Stefan Holgersson and Scott Brown.
Research Overview: This study examines forestry permit trends in Swedish Sámi territories following the landmark 2020 Girjas Supreme Court ruling, which recognized exclusive Sámi rights over hunting and fishing in traditional lands. Using 432 region-year observations (1998-2024) from the Swedish Forest Agency, we document a 242% increase in clearcutting approvals during 2020-2024 compared to pre-2020 averages, with state/corporate actors showing 313% increases and private landowners 197%.
Key Findings:
Important Limitation: We cannot isolate causal effects of the Girjas ruling from concurrent shocks including COVID-19 economic disruption, EU Taxonomy implementation, and commodity price volatility. The analysis documents institutional conditions and correlational patterns rather than establishing causation.
Dataset Contents:
Clearcut.xlsx — Swedish Forest Agency clearcutting permit data (1998-2024), disaggregated by region, ownership type, and year
SAMI.ipynb — Jupyter notebook containing Python code for descriptive statistics, time series analysis, and figure generation

How to Use These Files in Google Colab:
1. Open Google Colab and upload SAMI.ipynb from your downloads.
2. Upload Clearcut.xlsx from your downloads to the /content/ directory.
3. Run the notebook; it reads Clearcut.xlsx from the current directory.

Alternative method (direct from Zenodo):
# Add this cell at the top of the notebook to download files directly
!wget https://zenodo.org/record/[RECORD_ID]/files/Clearcut.xlsx
Replace [RECORD_ID] with the actual Zenodo record number after publication.
Requirements: The notebook uses standard Python libraries: pandas, numpy, matplotlib, seaborn. These are pre-installed in Google Colab. No additional setup required.
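Once Clearcut.xlsx is in the Colab working directory (uploaded or fetched via wget as above), a minimal load-and-inspect sketch follows; the sheet layout and column names are not specified here, so the defaults below are assumptions to adjust against the actual file.

import pandas as pd

# Read the first sheet of the permit workbook; pass sheet_name if the data sits elsewhere
clearcut = pd.read_excel("/content/Clearcut.xlsx")

# Quick inspection before running the analysis cells in SAMI.ipynb
print(clearcut.shape)
print(clearcut.head())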
Methodology: Descriptive statistical analysis combined with institutional document review. Data covers eight administrative regions in northern Sweden with mountain-adjacent forests relevant to Sámi reindeer herding territories.
Policy Relevance: Findings inform debates on Indigenous land rights implementation, forestry governance reform, ESG disclosure requirements, and the gap between legal recognition and operational constraints in resource extraction contexts.
Keywords: Indigenous rights, Sámi, forestry governance, legal pluralism, Sweden, Girjas ruling, land tenure, corporate accountability, ESG disclosure
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Use this dataset with Misra's Pandas tutorial: How to use the Pandas GroupBy function | Pandas tutorial
The original dataset came from this site: https://data.cityofnewyork.us/City-Government/NYC-Jobs/kpav-sd4t/data
I used Google Colab to filter the columns with the following Pandas commands. Here's a Colab Notebook you can use with the commands listed below: https://colab.research.google.com/drive/17Jpgeytc075CpqDnbQvVMfh9j-f4jM5l?usp=sharing
Once the csv file is uploaded to Google Colab, use these commands to process the file.
import pandas as pd

# load the file and create a pandas dataframe
df = pd.read_csv('/content/NYC_Jobs.csv')

# keep only these columns
df = df[['Job ID', 'Civil Service Title', 'Agency', 'Posting Type', 'Job Category', 'Salary Range From', 'Salary Range To']]

# save the csv file without the index column
df.to_csv('/content/NYC_Jobs_filtered_cols.csv', index=False)
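Since the filtered file is meant for practicing GroupBy, a small follow-on example using the columns kept above; the aggregation choice is only illustrative.

# Average posted salary range by agency, using the filtered file created above
filtered = pd.read_csv('/content/NYC_Jobs_filtered_cols.csv')
by_agency = filtered.groupby('Agency')[['Salary Range From', 'Salary Range To']].mean()
print(by_agency.sort_values('Salary Range From', ascending=False).head())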
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This archive reproduces a table titled "Table 3.1 Boone county population size, 1990 and 2000" from Wang and vom Hofe (2007, p. 58). The archive provides a Jupyter Notebook that uses Python and can be run in Google Colaboratory. The workflow uses the Census API to retrieve data, reproduce the table, and ensure reproducibility for anyone accessing this archive.

The Python code was developed in Google Colaboratory (Google Colab for short), a JupyterLab-based Integrated Development Environment (IDE) that streamlines package installation, code collaboration, and management. The Census API is used to obtain population counts from the 1990 and 2000 Decennial Census (Summary File 1, 100% data). All downloaded data are maintained in the notebook's temporary working directory while in use; the data are also stored separately with this archive.

The notebook features extensive explanations, comments, code snippets, and code output. It can be viewed in PDF format or downloaded and opened in Google Colab. References to external resources are provided for the various functional components. The notebook features code to perform the following functions:
- install/import necessary Python packages
- introduce a Census API query
- download Census data via the Census API
- manipulate Census tabular data
- calculate absolute change and percent change
- format numbers
- export the table to CSV

The notebook can be modified to perform the same operations for any county in the United States by changing the state and county FIPS code parameters for the Census API downloads. It could also be adapted for use in other environments (e.g., Jupyter Notebook) and for reading and writing files to a local, shared, or cloud drive (e.g., Google Drive).
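For orientation, a hedged sketch of the kind of Census API request the notebook wraps; the endpoint path, variable code (P001001 as total population in 2000 SF1), and the Boone County, KY FIPS codes (state 21, county 015) are assumptions to verify against the notebook and the Census API documentation.

import requests

# 2000 Decennial Census, Summary File 1: total population for Boone County, KY
url = "https://api.census.gov/data/2000/dec/sf1"
params = {"get": "P001001", "for": "county:015", "in": "state:21"}
response = requests.get(url, params=params)
print(response.json())  # header row followed by one data row with the population count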
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon nor mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analyses will be an enabler for transforming scientific inquiries.

Building on the HydroShare repository’s established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS), loading of data into performant structures keyed to specific scientific data types that integrate with existing visualization, analysis, and data science capabilities available in Python, and writing of analysis results back to HydroShare for sharing and eventual publication. These capabilities reduce the technical burden for scientists associated with creating a computational environment for executing analyses by installing and maintaining the packages within CUAHSI’s HydroShare-linked JupyterHub server. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows.

The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed within a Python environment on any computer running Microsoft Windows, Apple macOS, or Linux from the Python Package Index using the pip utility. They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments like Google Colaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available in GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.
This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/
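For orientation, a minimal sketch of installing the two packages and pulling NWIS daily values with the dataretrieval package; the call follows the package's basic documented pattern, but treat the site number and parameters as placeholders and verify against the GitHub documentation linked above.

!pip install hsclient dataretrieval

import dataretrieval.nwis as nwis

# Retrieve daily-values (dv) records for one USGS gage over a single year; site number is illustrative
df = nwis.get_record(sites="03339000", service="dv", start="2020-01-01", end="2020-12-31")
print(df.head())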
This dataset contains the predicted prices of the asset Collab.Land over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Description
This dataset supports the research paper "Nou Pa Bèt: Civic Substitution and Expressive Freedoms in Post-State Governance" which examines how civic participation functions as institutional substitution in fragile states, with Haiti as the primary case study. The dataset combines governance indicators from the World Bank's Worldwide Governance Indicators (WGI) with civic engagement measures from the Varieties of Democracy (V-Dem) project.
Files Included:
How to Use in Google Colab:
Step 1: Upload Files
from google.colab import files
import pandas as pd
import numpy as np
# Upload the files to your Colab environment
uploaded = files.upload()
# Select and upload: CivicEngagement_SelectedCountries_Last10Years.xlsx and wgidataset.xlsx
Step 2: Load the Datasets
# Load the civic engagement data (main analysis dataset)
civic_data = pd.read_excel('CivicEngagement_SelectedCountries_Last10Years.xlsx')
# Load the WGI data (if needed for extended analysis)
wgi_data = pd.read_excel('wgidataset.xlsx')
# Display basic information
print("Civic Engagement Dataset Shape:", civic_data.shape)
print("
Columns:", civic_data.columns.tolist())
print("
First few rows:")
civic_data.head()
Step 3: Run the Analysis Notebook
# Download and run the complete analysis notebook
!wget https://zenodo.org/record/[RECORD_ID]/files/civic.ipynb
# Then open civic.ipynb in Colab or copy/paste the code cells
Key Variables:
Dependent Variables (WGI):
Control_of_Corruption - Extent to which public power is exercised for private gain
Government_Effectiveness - Quality of public services and policy implementation

Independent Variables (V-Dem):
v2x_partip - Participatory Component Index
v2x_cspart - Civil Society Participation Index
v2cademmob - Freedom of Peaceful Assembly
v2cafres - Freedom of Expression
v2csantimv - Anti-System Movements
v2xdd_dd - Direct Popular Vote Index

Sample Countries: 21 fragile states including Haiti, Sierra Leone, Liberia, DRC, CAR, Guinea-Bissau, Chad, Niger, Burundi, Yemen, South Sudan, Mozambique, Sudan, Eritrea, Somalia, Mali, Afghanistan, Papua New Guinea, Togo, Cambodia, and Timor-Leste.
Quick Start Analysis:
# Install required packages
!pip install statsmodels scipy
# Basic regression replication
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
# Prepare variables for regression: drop rows with missing values across all variables at once
# so that X and y stay row-aligned
predictors = ['v2x_partip', 'v2x_cspart', 'v2cademmob', 'v2cafres', 'v2csantimv', 'v2xdd_dd']
analysis = civic_data[predictors + ['Control_of_Corruption', 'Government_Effectiveness']].dropna()
X = analysis[predictors]
y_corruption = analysis['Control_of_Corruption']
y_effectiveness = analysis['Government_Effectiveness']
# Run regression (example for Control of Corruption)
X_const = sm.add_constant(X)
model = sm.OLS(y_corruption, X_const).fit(cov_type='HC3')
print(model.summary())
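The snippet imports variance_inflation_factor but does not use it; a brief sketch of the multicollinearity check it presumably supports, reusing X_const from above.

# Variance inflation factor for each column of the design matrix (including the constant)
vifs = {col: variance_inflation_factor(X_const.values, i) for i, col in enumerate(X_const.columns)}
print(vifs)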
Citation: Brown, Scott M., Fils-Aime, Jempsy, & LaTortue, Paul. (2025). Nou Pa Bèt: Civic Substitution and Expressive Freedoms in Post-State Governance [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.15058161
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Contact: For questions about data usage or methodology, please contact the corresponding author through the institutional affiliations provided in the paper.
This dataset contains the predicted prices of the asset Eyaa X ….. Unexpected Collab is coming over the next 16 years. This data is calculated initially using a default 5 percent annual growth rate, and after page load, it features a sliding scale component where the user can then further adjust the growth rate to their own positive or negative projections. The maximum positive adjustable growth rate is 100 percent, and the minimum adjustable growth rate is -100 percent.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Instructions (with screenshots) to replicate results from Section 3 of the manuscript are available in "Step-by-step Instructions to Replicate Results.pdf".

Step 1: Download the replication materials
Download the whole replication folder on figshare containing the code, data and replication files.

Step 2: Replicate Tables in Section 3
All of the data is available inside the sub-folder replication/Data. To replicate Tables 1 and 2 from Section 3 of the manuscript, run the Python file replicate_section3_tables.py locally on your computer. This will produce two .csv files containing Tables 1 and 2 (already provided). Note that it is not necessary to run the code in order to replicate the tables; the output data needed for replication is provided.

Step 3: Replicate Figures in QGIS
The Figures must be replicated using QGIS, freely available at https://www.qgis.org/. Open the QGIS project replicate_figures.qgz inside the replication/Replicate Figures sub-folder. It should auto-find the layer data. The Figures are replicated as layers in the project.

Step 4: Running the code from scratch
The accompanying code for the manuscript IJGIS-2024-1305, entitled "Route-based Geocoding of Traffic Congestion-Related Social Media Texts on a Complex Network", runs on Google Colab as Python notebooks. Please follow the instructions below to run the entire geocoder and network mapper from scratch. The expected running time is of the order of 10 hours on free-tier Google Colab.

4a) Upload to Google Drive
Upload the entire replication folder to your Google Drive and note the path (location) to which you have uploaded it. There are two Google Colab notebooks that need to be executed in their entirety: Code/Geocoder/The_Geocoder.ipynb and Code/Complex_Network/Complex_network_code.ipynb. They need to be run in order (Geocoder first and Complex Network second).

4b) Set the path
In each Google Colab notebook, set the variable called "REPL_PATH" to the location on your Google Drive where you uploaded the replication folder. Include the replication folder in the path, for example "/content/drive/MyDrive/replication".

4c) Run the code
The code is available in two sub-folders, replication/Code/Geocoder and replication/Code/Complex_Network. You may simply open the Google Colab notebooks inside each folder, mount your Google Drive, set the path and run all cells.
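As a sketch of steps 4b and 4c, the mount-and-path cell in each notebook would look roughly like this; the exact cell may differ, and the path shown is the example from the instructions above.

from google.colab import drive
drive.mount('/content/drive')

# Location of the uploaded replication folder on Google Drive (set to your own upload location)
REPL_PATH = "/content/drive/MyDrive/replication"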
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description:
This dataset accompanies the empirical analysis in Legality Without Justice, a study examining the relationship between public trust in institutions and perceived governance legitimacy using data from the World Values Survey Wave 7 (2017–2022). It includes:
WVS_Cross-National_Wave_7_csv_v6_0.csv — World Values Survey Wave 7 core data.
GDP.csv — World Bank GDP per capita (current US$) for 2022 by country.
denial.ipynb — Fully documented Jupyter notebook with code for data merging, exploratory statistics, and ordinal logistic regression using OrderedModel. Includes GDP as a control for institutional trust and perceived governance.
All data processing and analysis were conducted in Python using FAIR reproducibility principles and can be replicated or extended on Google Colab.
DOI: 10.5281/zenodo.16361108
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Authors: Anon Annotator
Publication date: 2025-07-23
Language: English
Version: 1.0.0
Publisher: Zenodo
Programming language: Python
Go to https://colab.research.google.com
Click File > Upload notebook, and upload the denial.ipynb file.
Also upload the CSVs (WVS_Cross-National_Wave_7_csv_v6_0.csv and GDP.csv) using the file browser on the left sidebar.
In denial.ipynb, ensure file paths match:
wvs = pd.read_csv('/content/WVS_Cross-National_Wave_7_csv_v6_0.csv')
gdp = pd.read_csv('/content/GDP.csv')
Execute the notebook cells from top to bottom. You may need to install required libraries:
!pip install statsmodels pandas numpy
The notebook performs:
Data cleaning
Merging WVS and GDP datasets
Summary statistics
Ordered logistic regression to test if confidence in courts/police (Q57, Q58) predicts belief that the country is governed in the interest of the people (Q183), controlling for GDP.
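A minimal sketch of that ordered logistic regression step using statsmodels' OrderedModel; the WVS item codes Q57, Q58, and Q183 come from the description above, but the merge key, GDP column name, and any recoding of missing-value codes are assumptions that the actual denial.ipynb may handle differently.

from statsmodels.miscmodels.ordinal_model import OrderedModel

# Merge WVS responses with GDP per capita (merge key and GDP column name are assumptions)
merged = wvs.merge(gdp, on='country', how='inner')
model_df = merged[['Q183', 'Q57', 'Q58', 'gdp_per_capita']].dropna()

# Ordered logit: perceived governance (Q183) on trust in courts/police, controlling for GDP
mod = OrderedModel(model_df['Q183'], model_df[['Q57', 'Q58', 'gdp_per_capita']], distr='logit')
res = mod.fit(method='bfgs', disp=False)
print(res.summary())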
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The full semantic dataset is hosted on Kaggle:
👉 https://www.kaggle.com/datasets/cjc0013/epstein-bge-large-hdbscan-bm25/data
Epstein Semantic Explorer v5 is a lightweight, open-source investigation toolkit for analyzing the text fragments released by the House Oversight Committee (November 2025).
This tool does not add new allegations. It simply makes the chaotic, fragmented congressional release usable by providing:
Everything runs locally in Colab, with no external APIs, servers, or private models.
Explore semantically grouped themes: legal strategy, PR coordination, iMessage logs, internal disputes, travel notes, media monitoring, and more.
view_cluster(96)
Instant relevance-ranked search across all 9,666 documents.
search("Prince Andrew")
search("Clinton")
search("Ghislaine")
Get a fast narrative overview of what a cluster contains.
summarize_cluster(96)
Shows the most meaningful terms defining each cluster.
show_topics()
Identify the most-referenced people, places, and organizations in any cluster.
cluster_entities(12)
Searches all documents for dates and assembles a chronological list.
show_timeline()
See which clusters relate to which — using cosine similarity on text centroids.
cluster_similarity()
Find out where a name appears most often across the entire corpus.
entity_to_clusters("Epstein")
entity_to_clusters("Maxwell")
entity_to_clusters("Barak")
You only need one file:
epstein_semantic.jsonl

Each line is:
{"id": "HOUSE_OVERSIGHT_023051", "cluster": 96, "text": "...document text..."}
{"id": "HOUSE_OVERSIGHT_028614", "cluster": 122, "text": "...document text..."}
id — original document identifier
cluster — HDBSCAN semantic cluster
text — raw text fragment

No PDFs, images, or external metadata required.
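If you want to explore the file outside the notebook's helper functions, it loads with a single pandas call; this is only a sketch, and the helpers above remain the intended interface.

import pandas as pd

# Each line of the JSONL file is one document with id, cluster, and text fields
docs = pd.read_json("epstein_semantic.jsonl", lines=True)
print(docs["cluster"].value_counts().head())  # largest semantic clusters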
Open Google Colab → upload:
Epstein_Semantic_Explorer_v5.ipynb
Colab → Runtime → Run all
When prompted:
Upload epstein_semantic.jsonl
If the file is already in /content/, the notebook will auto-detect it.
Now try:
view_cluster(96)
search("Prince Andrew")
show_topics()
cluster_entities(96)
Everything runs on CPU. No GPU required.
No. This only reorganizes public text fragments released by Congress.
No. All analysis stays inside your Colab runtime.
Yes. It’s intentionally simple and transparent — point, click, search.
Yes, as long as you clarify:
Epstein Semantic Explorer v5 turns the unstructured House Oversight text archive into a searchable, analyzable, cluster-organized dataset, enabling:
This tool makes the archive usable — but does not alter or invent any content.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Web Scraped Dataset for Open Access, Indexed and Scopus Indexed Journals from openacessjournal.com!
The Main Dataset is : MainOpenAccessJournalsData.csv, (alternate identical file : MainOpenAccessJournalsData.xlsx)
Columns : JournalName, ImpactFactor2020, Source, Type, Title, Publisher, ISSN, DoIsbyYear, BackFileDoIs, CurrentDoIs, TotalDoIs, Subjects, ImpactFactor, Journals_Metadata_Paths
Impact Factor List Journals - https://www.openacessjournal.com/impact-factor-list-journals Indexed Journal Lists - https://www.openacessjournal.com/indexed-journals-list Scopus Indexed (SCI) Journals - https://www.openacessjournal.com/blog/scopus-indexed-journals/
Research data is often missing. From this dataset, we can learn how a journal can be classified as high-scoring or low-scoring on the Impact Factor scale.
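A small pandas sketch for exploring the main file using the columns listed above; the impact-factor threshold is arbitrary and only illustrative.

import pandas as pd

journals = pd.read_csv("MainOpenAccessJournalsData.csv")

# Coerce the impact factor column to numeric and flag journals above an arbitrary cutoff
journals["ImpactFactor2020"] = pd.to_numeric(journals["ImpactFactor2020"], errors="coerce")
high_scoring = journals[journals["ImpactFactor2020"] >= 3]
print(high_scoring[["JournalName", "ImpactFactor2020", "Publisher"]].head())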
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Food/Not Food Image Caption Dataset
Small dataset of synthetic food and not food image captions. Text generated using Mistral Chat/Mixtral. Can be used to train a text classifier on food/not_food image captions as a demo before scaling up to a larger dataset. See Colab notebook on how dataset was created.
Example usage
import random
from datasets import load_dataset
loaded_dataset = load_dataset("mrdbourke/learn_hf_food_not_food_image_captions")
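A short follow-on for inspecting the loaded data; the split name "train" is an assumption, so check loaded_dataset first.

# Print a random caption example from the (assumed) train split
example = random.choice(loaded_dataset["train"])
print(example)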
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset accompanies the study The Cultural Resource Curse: How Trade Dependence Undermines Creative Industries. It contains country-year panel data for 2000–2023 covering both OECD economies and the ten largest Latin American countries by land area. Variables include GDP per capita (constant PPP, USD), trade openness, internet penetration, education indicators, cultural exports per capita, and executive constraints from the Polity V dataset.
The dataset supports a comparative analysis of how economic structure, institutional quality, and infrastructure shape cultural export performance across development contexts. Within-country fixed effects models show that trade openness constrains cultural exports in OECD economies but has no measurable effect in resource-dependent Latin America. In contrast, strong executive constraints benefit cultural industries in advanced economies while constraining them in extraction-oriented systems. The results provide empirical evidence for a two-stage development framework in which colonial extraction legacies create distinct constraints on creative industry growth.
All variables are harmonized to ISO3 country codes and aligned on a common panel structure. The dataset is fully reproducible using the included Jupyter notebooks (OECD.ipynb, LATAM+OECD.ipynb, cervantes.ipynb).
Contents:
GDPPC.csv — GDP per capita series from the World Bank.
explanatory.csv — Trade openness, internet penetration, and education indicators.
culture_exports.csv — UNESCO cultural export data.
p5v2018.csv — Polity V institutional indicators.
Jupyter notebooks for data processing and replication.
Potential uses: Comparative political economy, cultural economics, institutional development, and resource curse research.
These steps reproduce the OECD vs. Latin America analyses from the paper using the provided CSVs and notebooks.
Click File → New notebook.
(Optional) If your files are in Google Drive, mount it:
from google.colab import drive
drive.mount('/content/drive')
You have two easy options:
A. Upload the 4 CSVs + notebooks directly
In the left sidebar, click the folder icon → Upload.
Upload: GDPPC.csv, explanatory.csv, culture_exports.csv, p5v2018.csv, and any .ipynb you want to run.
B. Use Google Drive
Put those files in a Drive folder.
After mounting Drive, refer to them with paths like /content/drive/MyDrive/your_folder/GDPPC.csv.
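After either option, a minimal load of the four CSVs looks like this; adjust base to your Drive folder if you used option B, and note that the file names are simply those listed under Contents above.

import pandas as pd

base = '/content/'  # or '/content/drive/MyDrive/your_folder/'

gdppc = pd.read_csv(base + 'GDPPC.csv')
explanatory = pd.read_csv(base + 'explanatory.csv')
culture_exports = pd.read_csv(base + 'culture_exports.csv')
polity = pd.read_csv(base + 'p5v2018.csv')

print(gdppc.head())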
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
2,121,458 records
I used Google Colab to check out this dataset and pull the column names using Pandas.
Sample code example: Python Pandas read csv file compressed with gzip and load into Pandas dataframe https://pastexy.com/106/python-pandas-read-csv-file-compressed-with-gzip-and-load-into-pandas-dataframe
Columns: ['Date received', 'Product', 'Sub-product', 'Issue', 'Sub-issue', 'Consumer complaint narrative', 'Company public response', 'Company', 'State', 'ZIP code', 'Tags', 'Consumer consent provided?', 'Submitted via', 'Date sent to company', 'Company response to consumer', 'Timely response?', 'Consumer disputed?', 'Complaint ID']
I did not modify the dataset.
Use it to practice with dataframes - Pandas or PySpark on Google Colab:
!unzip complaints.csv.zip
import pandas as pd
df = pd.read_csv('complaints.csv')
df.columns
df.head() etc.
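Since the file is intended for dataframe practice, a couple of further illustrative steps using the columns listed above.

# Count complaints by product category
print(df['Product'].value_counts().head(10))

# Share of complaints with a timely response
print(df['Timely response?'].value_counts(normalize=True))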