25 datasets found
  1. Toronto Emergency Response - Open Data

    • zenodo.org
    zip
    Updated Aug 30, 2024
    Cite
    Piyush Kumar; Piyush Kumar (2024). Toronto Emergency Response - Open Data [Dataset]. http://doi.org/10.5281/zenodo.13578078
    Explore at:
    zip (available download formats)
    Dataset updated
    Aug 30, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Piyush Kumar; Piyush Kumar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Toronto
    Description

    This upload contains data and documentation for the Python analysis undertaken in Google Colab as part of Episode 1 of the webinar series, conducted by Sambodhi's Center for Health Systems Research and Implementation (CHSRI). You can find the link to the Google Colab notebook here.

    All the data uploaded here is open data published by the Toronto Police Public Safety Data Portal and the Ontario Ministry of Health.

  2. Python Code for Visualizing COVID-19 data

    • borealisdata.ca
    • datasetcatalog.nlm.nih.gov
    • +1more
    Updated Dec 16, 2023
    Cite
    Ryan Chartier; Geoffrey Rockwell (2023). Python Code for Visualizing COVID-19 data [Dataset]. http://doi.org/10.5683/SP3/PYEQL0
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Borealis
    Authors
    Ryan Chartier; Geoffrey Rockwell
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The purpose of this code is to produce a line graph visualization of COVID-19 data. This Jupyter notebook was built and run on Google Colab. This code will serve mostly as a guide and will need to be adapted where necessary to be run locally. The separate COVID-19 datasets uploaded to this Dataverse can be used with this code. This upload is made up of the IPYNB and PDF files of the code.
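
    As a rough illustration of what the notebook produces, here is a minimal sketch of a line-graph visualization with pandas and matplotlib; the file and column names ('covid_cases.csv', 'date', 'cases') are placeholders and should be replaced with one of the COVID-19 datasets from this Dataverse:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Placeholder file/column names; substitute a COVID-19 CSV from this Dataverse
    covid = pd.read_csv('covid_cases.csv', parse_dates=['date'])
    covid = covid.sort_values('date')

    plt.plot(covid['date'], covid['cases'])
    plt.xlabel('Date')
    plt.ylabel('Reported cases')
    plt.title('COVID-19 cases over time')
    plt.show()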

  3. COCO2017 Image Caption Train

    • kaggle.com
    zip
    Updated May 30, 2024
    Cite
    Seungjun Lee (2024). COCO2017 Image Caption Train [Dataset]. https://www.kaggle.com/datasets/seungjunleeofficial/coco2017-image-caption-train
    Explore at:
    zip (19236355851 bytes; available download formats)
    Dataset updated
    May 30, 2024
    Authors
    Seungjun Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains only the COCO 2017 train images (118K images) and a caption annotation JSON file, designed to fit within Google Colab's available disk space of approximately 50GB when connected to a GPU runtime.

    If you're using PyTorch on Google Colab, you can easily utilize this dataset as follows:

    Manually downloading and uploading the file to Colab can be time-consuming. Therefore, it's more efficient to download this data directly into Google Colab. Please ensure you have first added your Kaggle key to Google Colab. You can find more details on this process here.

    from google.colab import userdata
    import os
    import torch
    import torchvision.datasets as dset
    import torchvision.transforms as transforms

    # Read the Kaggle credentials stored in Colab's user secrets
    os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')
    os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
    
    # Download the Dataset and unzip it
    !kaggle datasets download -d seungjunleeofficial/coco2017-image-caption-train
    !mkdir "/content/Dataset"
    !unzip "coco2017-image-caption-train" -d "/content/Dataset"
    
    
    # load the dataset
    cap = dset.CocoCaptions(root = '/content/Dataset/COCO2017 Image Captioning Train/train2017',
                annFile = '/content/Dataset/COCO2017 Image Captioning Train/captions_train2017.json',
                transform=transforms.PILToTensor())
    

    You can then use the dataset in the following way:

    print(f"Number of samples: {len(cap)}")
    img, target = cap[3]
    print(img.shape)
    print(target)
    # Output example: torch.Size([3, 425, 640])
    # ['A zebra grazing on lush green grass in a field.', 'Zebra reaching its head down to ground where grass is.', 
    # 'The zebra is eating grass in the sun.', 'A lone zebra grazing in some green grass.', 
    # 'A Zebra grazing on grass in a green open field.']
    
  4. Turkish_Basketball_Super_League_Dataset

    • huggingface.co
    Updated Aug 10, 2025
    Cite
    Muhammed Onur ulu (2025). Turkish_Basketball_Super_League_Dataset [Dataset]. https://huggingface.co/datasets/onurulu17/Turkish_Basketball_Super_League_Dataset
    Explore at:
    Dataset updated
    Aug 10, 2025
    Authors
    Muhammed Onur ulu
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    📥 Load Dataset in Python

    To load this dataset in Google Colab or any Python environment:
    !pip install huggingface_hub pandas openpyxl

    from huggingface_hub import hf_hub_download
    import pandas as pd

    repo_id = "onurulu17/Turkish_Basketball_Super_League_Dataset"

    files = [ "leaderboard.xlsx", "player_data.xlsx", "team_data.xlsx", "team_matches.xlsx", "player_statistics.xlsx", "technic_roster.xlsx" ]

    datasets = {}

    for f in files:
        path = …

    See the full description on the dataset page: https://huggingface.co/datasets/onurulu17/Turkish_Basketball_Super_League_Dataset.
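
    The loop above is truncated on the dataset page; a minimal sketch of one plausible completion, assuming each Excel file sits at the repository root and is read with pandas/openpyxl:

    from huggingface_hub import hf_hub_download
    import pandas as pd

    repo_id = "onurulu17/Turkish_Basketball_Super_League_Dataset"
    files = ["leaderboard.xlsx", "player_data.xlsx", "team_data.xlsx",
             "team_matches.xlsx", "player_statistics.xlsx", "technic_roster.xlsx"]

    datasets = {}
    for f in files:
        # hf_hub_download returns the local path of the cached file
        path = hf_hub_download(repo_id=repo_id, filename=f, repo_type="dataset")
        datasets[f] = pd.read_excel(path)  # reading .xlsx requires openpyxl

    print(datasets["leaderboard.xlsx"].head())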

  5. Top Rated TV Shows

    • kaggle.com
    zip
    Updated Jan 5, 2025
    Cite
    Shreya Gupta (2025). Top Rated TV Shows [Dataset]. https://www.kaggle.com/datasets/shreyajii/top-rated-tv-shows
    Explore at:
    zip (314571 bytes; available download formats)
    Dataset updated
    Jan 5, 2025
    Authors
    Shreya Gupta
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset provides information about top-rated TV shows, collected from The Movie Database (TMDb) API. It can be used for data analysis, recommendation systems, and insights on popular television content.

    Key Stats:

    Total Pages: 109
    Total Results: 2098 TV shows
    Data Source: TMDb API
    Sorting Criteria: highest-rated by vote_average (average rating) with a minimum vote count of 200

    Data Fields (Columns):

    id: Unique identifier for the TV show
    name: Title of the TV show
    vote_average: Average rating given by users
    vote_count: Total number of votes received
    first_air_date: The date when the show was first aired
    original_language: Language in which the show was originally produced
    genre_ids: Genre IDs linked to the show's genres
    overview: A brief summary of the show
    popularity: Popularity score based on audience engagement
    poster_path: URL path for the show's poster image

    Accessing the Dataset via API (Python Example):

    import requests

    api_key = 'YOUR_API_KEY_HERE'
    url = "https://api.themoviedb.org/3/discover/tv"
    params = {
        'api_key': api_key,
        'include_adult': 'false',
        'language': 'en-US',
        'page': 1,
        'sort_by': 'vote_average.desc',
        'vote_count.gte': 200
    }

    response = requests.get(url, params=params)
    data = response.json()

    # Display the first show
    print(data['results'][0])

    Dataset Use Cases:

    Data Analysis: Explore trends in highly rated TV shows.
    Recommendation Systems: Build personalized TV show suggestions.
    Visualization: Create charts to showcase ratings or genre distribution.
    Machine Learning: Predict show popularity using historical data.

    Exporting and Sharing the Dataset (Google Colab Example):

    import pandas as pd

    # Convert the API data to a DataFrame
    df = pd.DataFrame(data['results'])

    # Save to CSV and upload to Google Drive
    from google.colab import drive
    drive.mount('/content/drive')
    df.to_csv('/content/drive/MyDrive/top_rated_tv_shows.csv', index=False)

    Ways to Share the Dataset:

    Google Drive: Upload and share a public link.
    Kaggle: Create a public dataset for collaboration.
    GitHub: Host the CSV file in a repository for easy sharing.

  6. Sample Park Analysis

    • figshare.com
    zip
    Updated Nov 2, 2025
    Cite
    Eric Delmelle (2025). Sample Park Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.30509021.v1
    Explore at:
    zip (available download formats)
    Dataset updated
    Nov 2, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Eric Delmelle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    README – Sample Park Analysis

    ## Overview
    This repository contains a Google Colab / Jupyter notebook and accompanying dataset used for analyzing park features and associated metrics. The notebook demonstrates data loading, cleaning, and exploratory analysis of the Hope_Park_original.csv file.

    ## Contents
    - sample park analysis.ipynb — The main analysis notebook (Colab/Jupyter format)
    - Hope_Park_original.csv — Source dataset containing park information
    - README.md — Documentation for the contents and usage

    ## Usage
    1. Open the notebook in Google Colab or Jupyter.
    2. Upload the Hope_Park_original.csv file to the working directory (or adjust the file path in the notebook).
    3. Run each cell sequentially to reproduce the analysis.

    ## Requirements
    The notebook uses standard Python data science libraries: pandas, numpy, matplotlib, seaborn.
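
    A minimal sketch of the loading and exploratory steps the README describes, assuming Hope_Park_original.csv is in the working directory (the column schema is not documented here, so the example stays schema-agnostic):

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    park = pd.read_csv('Hope_Park_original.csv')
    park.info()              # column names, dtypes, missing values
    print(park.describe())   # summary statistics for numeric columns

    # Quick look at pairwise relationships among the first few numeric features
    sns.pairplot(park.select_dtypes('number').iloc[:, :5])
    plt.show()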

  7. NYC Jobs Dataset (Filtered Columns)

    • kaggle.com
    zip
    Updated Oct 5, 2022
    Cite
    Jeffery Mandrake (2022). NYC Jobs Dataset (Filtered Columns) [Dataset]. https://www.kaggle.com/datasets/jefferymandrake/nyc-jobs-filtered-cols
    Explore at:
    zip (93408 bytes; available download formats)
    Dataset updated
    Oct 5, 2022
    Authors
    Jeffery Mandrake
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    New York
    Description

    Use this dataset with Misra's Pandas tutorial: How to use the Pandas GroupBy function | Pandas tutorial

    The original dataset came from this site: https://data.cityofnewyork.us/City-Government/NYC-Jobs/kpav-sd4t/data

    I used Google Colab to filter the columns with the following Pandas commands. Here's a Colab Notebook you can use with the commands listed below: https://colab.research.google.com/drive/17Jpgeytc075CpqDnbQvVMfh9j-f4jM5l?usp=sharing

    Once the csv file is uploaded to Google Colab, use these commands to process the file.

    import pandas as pd

    # load the file and create a pandas dataframe
    df = pd.read_csv('/content/NYC_Jobs.csv')

    # keep only these columns
    df = df[['Job ID', 'Civil Service Title', 'Agency', 'Posting Type',
             'Job Category', 'Salary Range From', 'Salary Range To']]

    # save the csv file without the index column
    df.to_csv('/content/NYC_Jobs_filtered_cols.csv', index=False)
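
    Since the dataset is meant to accompany a GroupBy tutorial, here is a minimal sketch of the kind of aggregation it supports (column names come from the list above; the tutorial's exact steps may differ):

    # Average posted salary range by agency
    df = pd.read_csv('/content/NYC_Jobs_filtered_cols.csv')
    mean_salary_by_agency = (
        df.groupby('Agency')[['Salary Range From', 'Salary Range To']]
          .mean()
          .sort_values('Salary Range From', ascending=False)
    )
    print(mean_salary_by_agency.head())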

  8. Nou Pa Bèt: Civic Substitution and Expressive Freedoms in Post-State...

    • zenodo.org
    bin
    Updated Aug 13, 2025
    Cite
    Brown Scott; Brown Scott (2025). Nou Pa Bèt: Civic Substitution and Expressive Freedoms in Post-State Governance [Dataset]. http://doi.org/10.5281/zenodo.16858858
    Explore at:
    bin (available download formats)
    Dataset updated
    Aug 13, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Brown Scott; Brown Scott
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Description

    This dataset supports the research paper "Nou Pa Bèt: Civic Substitution and Expressive Freedoms in Post-State Governance" which examines how civic participation functions as institutional substitution in fragile states, with Haiti as the primary case study. The dataset combines governance indicators from the World Bank's Worldwide Governance Indicators (WGI) with civic engagement measures from the Varieties of Democracy (V-Dem) project.

    Files Included:

    1. wgidataset.xlsx (2.57 MB) - Complete World Bank Worldwide Governance Indicators dataset covering multiple governance dimensions across countries and years
    2. CivicEngagement_SelectedCountries_Last10Years.xlsx (25.03 KB) - Processed V-Dem civic engagement indicators for fragile states sample (2015-2024) including variables for participatory governance, civil society participation, freedom of expression, freedom of assembly, anti-system movements, and direct democracy
    3. civic.ipynb (10.35 KB) - Complete Python analysis notebook containing all data processing, regression analysis, and visualization code used in the study

    How to Use in Google Colab:

    Step 1: Upload Files

    python
    from google.colab import files
    import pandas as pd
    import numpy as np
    
    # Upload the files to your Colab environment
    uploaded = files.upload()
    # Select and upload: CivicEngagement_SelectedCountries_Last10Years.xlsx and wgidataset.xlsx

    Step 2: Load the Datasets

    python
    # Load the civic engagement data (main analysis dataset)
    civic_data = pd.read_excel('CivicEngagement_SelectedCountries_Last10Years.xlsx')
    
    # Load the WGI data (if needed for extended analysis)
    wgi_data = pd.read_excel('wgidataset.xlsx')
    
    # Display basic information
    print("Civic Engagement Dataset Shape:", civic_data.shape)
    print("\nColumns:", civic_data.columns.tolist())
    print("\nFirst few rows:")
    civic_data.head()

    Step 3: Run the Analysis Notebook

    python
    # Download and run the complete analysis notebook
    !wget https://zenodo.org/record/[RECORD_ID]/files/civic.ipynb
    # Then open civic.ipynb in Colab or copy/paste the code cells

    Key Variables:

    Dependent Variables (WGI):

    • Control_of_Corruption - Extent to which public power is exercised for private gain
    • Government_Effectiveness - Quality of public services and policy implementation

    Independent Variables (V-Dem):

    • v2x_partip - Participatory Component Index
    • v2x_cspart - Civil Society Participation Index
    • v2cademmob - Freedom of Peaceful Assembly
    • v2cafres - Freedom of Expression
    • v2csantimv - Anti-System Movements
    • v2xdd_dd - Direct Popular Vote Index

    Sample Countries: 21 fragile states including Haiti, Sierra Leone, Liberia, DRC, CAR, Guinea-Bissau, Chad, Niger, Burundi, Yemen, South Sudan, Mozambique, Sudan, Eritrea, Somalia, Mali, Afghanistan, Papua New Guinea, Togo, Cambodia, and Timor-Leste.

    Quick Start Analysis:

    python
    # Install required packages
    !pip install statsmodels scipy
    
    # Basic regression replication
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    
    # Prepare variables for regression (drop incomplete rows jointly so X and y stay aligned)
    cols = ['v2x_partip', 'v2x_cspart', 'v2cademmob', 'v2cafres', 'v2csantimv', 'v2xdd_dd']
    analysis = civic_data[cols + ['Control_of_Corruption', 'Government_Effectiveness']].dropna()
    X = analysis[cols]
    y_corruption = analysis['Control_of_Corruption']
    y_effectiveness = analysis['Government_Effectiveness']
    
    # Run regression (example for Control of Corruption)
    X_const = sm.add_constant(X)
    model = sm.OLS(y_corruption, X_const).fit(cov_type='HC3')
    print(model.summary())

    Citation: Brown, Scott M., Fils-Aime, Jempsy, & LaTortue, Paul. (2025). Nou Pa Bèt: Civic Substitution and Expressive Freedoms in Post-State Governance [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.15058161

    License: Creative Commons Attribution 4.0 International (CC BY 4.0)

    Contact: For questions about data usage or methodology, please contact the corresponding author through the institutional affiliations provided in the paper.

  9. epstein bge large hdbscan bm25

    • kaggle.com
    zip
    Updated Nov 15, 2025
    Cite
    cjc0013 (2025). epstein bge large hdbscan bm25 [Dataset]. https://www.kaggle.com/datasets/cjc0013/epstein-bge-large-hdbscan-bm25
    Explore at:
    zip (17884547 bytes; available download formats)
    Dataset updated
    Nov 15, 2025
    Authors
    cjc0013
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Epstein Semantic Explorer v5

    Semantic clusters, entity extraction, and BM25 search across recently released public Congressional Epstein documents

    The full semantic dataset is hosted on Kaggle:

    👉 https://www.kaggle.com/datasets/cjc0013/epstein-bge-large-hdbscan-bm25/data

    📘 Overview

    Epstein Semantic Explorer v5 is a lightweight, open-source investigation toolkit for analyzing the text fragments released by the House Oversight Committee (November 2025).

    This tool does not add new allegations. It simply makes the chaotic, fragmented congressional release usable, by providing:

    • semantic clustering (HDBSCAN)
    • lightweight topic modeling
    • BM25 keyword search
    • entity extraction
    • cross-cluster linking
    • timeline extraction
    • document browser

    Everything runs locally in Colab, with no external APIs, servers, or private models.

    📦 What This Notebook Enables

    1. Cluster Browser

    Explore semantically grouped themes: legal strategy, PR coordination, iMessage logs, internal disputes, travel notes, media monitoring, and more.

    view_cluster(96)
    

    2. Keyword Search (BM25-lite)

    Instant relevance-ranked search across all 9,666 documents.

    search("Prince Andrew")
    search("Clinton")
    search("Ghislaine")
    

    3. Cluster Summaries

    Get a fast narrative overview of what a cluster contains.

    summarize_cluster(96)
    

    4. Topic Modeling (stopword-filtered centroids)

    Shows the most meaningful terms defining each cluster.

    show_topics()
    

    5. Entity Extraction

    Identify the most-referenced people, places, and organizations in any cluster.

    cluster_entities(12)
    

    6. Timeline Extraction

    Searches all documents for dates and assembles a chronological list.

    show_timeline()
    

    7. Cluster Similarity Matrix

    See which clusters relate to which — using cosine similarity on text centroids.

    cluster_similarity()
    

    8. Cross-Cluster Entity Search

    Find out where a name appears most often across the entire corpus.

    entity_to_clusters("Epstein")
    entity_to_clusters("Maxwell")
    entity_to_clusters("Barak")
    

    📁 Dataset Format

    You only need one file:

    epstein_semantic.jsonl

    Each line is:

    {"id": "HOUSE_OVERSIGHT_023051", "cluster": 96, "text": "...document text..."}
    {"id": "HOUSE_OVERSIGHT_028614", "cluster": 122, "text": "...document text..."}
    
    • id — original document identifier
    • cluster — HDBSCAN semantic cluster
    • text — raw text fragment

    No PDFs, images, or external metadata required.
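
    For anyone working outside the provided notebook, a minimal sketch of loading the JSONL file with pandas (assuming it sits in the current directory):

    import pandas as pd

    # Each line of epstein_semantic.jsonl is one JSON object with id, cluster, text
    docs = pd.read_json("epstein_semantic.jsonl", lines=True)

    print(len(docs), "documents")
    print(docs["cluster"].value_counts().head())                  # largest semantic clusters
    print(docs.loc[docs["cluster"] == 96, "text"].iloc[0][:500])  # peek at cluster 96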

    🚀 How to Use (Colab)

    Step 1 — Upload the Notebook

    Open Google Colab → upload:

    Epstein_Semantic_Explorer_v5.ipynb
    

    Step 2 — Run All Cells

    Colab → Runtime → Run all

    Step 3 — Upload Your Data

    When prompted:

    Upload epstein_semantic.jsonl
    

    If the file is already in /content/, the notebook will auto-detect it.

    Step 4 — Explore

    Now try:

    view_cluster(96)
    search("Prince Andrew")
    show_topics()
    cluster_entities(96)
    

    Everything runs on CPU. No GPU required.

    🧩 FAQ

    ❓ Does this create new allegations?

    No. This only reorganizes public text fragments released by Congress.

    ❓ Does this send data anywhere?

    No. All analysis stays inside your Colab runtime.

    ❓ Is this built for reporters?

    Yes. It’s intentionally simple and transparent — point, click, search.

    ❓ Is it safe to publish or link to?

    Yes, as long as you clarify:

    • data was publicly released by the House Oversight Committee
    • no new text was generated
    • this is semantic reorganization only

    📝 Summary

    Epstein Semantic Explorer v5 turns the unstructured House Oversight text archive into a searchable, analyzable, cluster-organized dataset, enabling:

    • rapid investigative discovery
    • cluster-level narrative reconstruction
    • entity frequency analysis
    • timeline mapping
    • correlation across clusters

    This tool makes the archive usable — but does not alter or invent any content.

  10. Data for "Prediction of Phakic Intraocular Lens Vault Using Machine Learning...

    • data.mendeley.com
    Updated Nov 18, 2020
    + more versions
    Cite
    TaeKeun Yoo (2020). Data for "Prediction of Phakic Intraocular Lens Vault Using Machine Learning of Anterior Segment Optical Coherence Tomography Metrics" [Dataset]. http://doi.org/10.17632/ffn745r57z.1
    Explore at:
    Dataset updated
    Nov 18, 2020
    Authors
    TaeKeun Yoo
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Prediction of Phakic Intraocular Lens Vault Using Machine Learning of Anterior Segment Optical Coherence Tomography Metrics. Authors: Kazutaka Kamiya, MD, PhD1, Ik Hee Ryu, MD, MS2, Tae Keun Yoo, MD2, Jung Sub Kim MD2, In Sik Lee, MD, PhD2, Jin Kook Kim MD2, Wakako Ando CO3, Nobuyuki Shoji, MD, PhD3, Tomofusa, Yamauchi, MD, PhD4, Hitoshi Tabuchi, MD, PhD4. Author Affiliation: 1Visual Physiology, School of Allied Health Sciences, Kitasato University, Kanagawa, Japan, 2B&VIIT Eye Center, Seoul, Korea, 3Department of Ophthalmology, School of Medicine, Kitasato University, Kanagawa, Japan, 4Department of Ophthalmology, Tsukazaki Hospital, Hyogo, Japan.

    We hypothesize that machine learning of preoperative biometric data obtained by the As-OCT may be clinically beneficial for predicting the actual ICL vault. Therefore, we built the machine learning model using Random Forest to predict ICL vault after surgery.

    This multicenter study comprised one thousand seven hundred forty-five eyes of 1745 consecutive patients (656 men and 1089 women), who underwent EVO ICL implantation (V4c and V5 Visian ICL with KS-AquaPORT) for the correction of moderate to high myopia and myopic astigmatism, and who completed at least a 1-month follow-up, at Kitasato University Hospital (Kanagawa, Japan), or at B&VIIT Eye Center (Seoul, Korea).

    This data file (RFR_model(feature=12).mat) is the final trained random forest model for MATLAB 2020a.

    Python version:

    from sklearn.model_selection import train_test_split
    import pandas as pd
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # Connect to the data in your Google Drive
    from google.colab import auth
    auth.authenticate_user()
    from google.colab import drive
    drive.mount('/content/gdrive')

    # Change the path for the custom data.
    # In this case, we used ICL vault prediction from preoperative measurements.
    dataset = pd.read_csv('gdrive/My Drive/ICL/data_icl.csv')
    dataset.head()

    # Optimal features (sorted by importance):
    # 1. ICL size  2. ICL power  3. LV  4. CLR  5. ACD  6. ATA
    # 7. MSE  8. Age  9. Pupil size  10. WTW  11. CCT  12. ACW
    y = dataset['Vault_1M']
    X = dataset.drop(['Vault_1M'], axis=1)

    # Split the dataset into train and test sets (simple 80:20 validation split)
    train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=0)

    # Optimal parameter search could be performed in this section
    parameters = {'bootstrap': True, 'min_samples_leaf': 3, 'n_estimators': 500,
                  'criterion': 'mae', 'min_samples_split': 10, 'max_features': 'sqrt',
                  'max_depth': 6, 'max_leaf_nodes': None}

    RF_model = RandomForestRegressor(**parameters)
    RF_model.fit(train_X, train_y)
    RF_predictions = RF_model.predict(test_X)
    importance = RF_model.feature_importances_

  11. Brain Tumor Classification

    • kaggle.com
    zip
    Updated Nov 26, 2022
    Cite
    Taneem UR Rehman (2022). Brain Tumor Classification [Dataset]. https://www.kaggle.com/datasets/taneemurrehman/brain-tumor-classification
    Explore at:
    zip (91002358 bytes; available download formats)
    Dataset updated
    Nov 26, 2022
    Authors
    Taneem UR Rehman
    Description

    Please follow the steps below to download and use Kaggle data within Google Colab:

    1) from google.colab import files
       files.upload()

       Choose the kaggle.json file that you downloaded.

    2) ! mkdir ~/.kaggle

    3) ! cp kaggle.json ~/.kaggle/

       Make a directory named kaggle and copy the kaggle.json file there.

    4) ! chmod 600 ~/.kaggle/kaggle.json

       Change the permissions of the file.

    5) ! kaggle datasets list

       That's all! You can check that everything is okay by running this command.

    Use the unzip command to unzip the data; for example, to unzip the train data:

    ! unzip train.zip -d train
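
    A minimal end-to-end sketch for this particular dataset (assumes kaggle.json is configured as above; the zip name follows Kaggle's usual <dataset-slug>.zip convention):

    ! kaggle datasets download -d taneemurrehman/brain-tumor-classification
    ! mkdir -p /content/brain_tumor
    ! unzip -q brain-tumor-classification.zip -d /content/brain_tumor
    ! ls /content/brain_tumor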

  12. Recognition Without Implementation: Institutional Gaps and Forestry...

    • zenodo.org
    bin
    Updated Oct 2, 2025
    Cite
    Anon Anon; Anon Anon (2025). Recognition Without Implementation: Institutional Gaps and Forestry Expansion in Post-Girjas Swedish Sápmi - Dataset and Analysis [Dataset]. http://doi.org/10.5281/zenodo.17249309
    Explore at:
    bin (available download formats)
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anon Anon; Anon Anon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Sápmi
    Description

    Recognition Without Implementation: Institutional Gaps and Forestry Expansion in Post-Girjas Swedish Sápmi - Dataset and Analysis

    Description

    This deposit contains the dataset and analysis code supporting the research paper "Recognition Without Implementation: Institutional Gaps and Forestry Expansion in Post-Girjas Swedish Sápmi" by Stefan Holgersson and Scott Brown.

    Research Overview: This study examines forestry permit trends in Swedish Sámi territories following the landmark 2020 Girjas Supreme Court ruling, which recognized exclusive Sámi rights over hunting and fishing in traditional lands. Using 432 region-year observations (1998-2024) from the Swedish Forest Agency, we document a 242% increase in clearcutting approvals during 2020-2024 compared to pre-2020 averages, with state/corporate actors showing 313% increases and private landowners 197%.

    Key Findings:

    • Clearcutting intensified most in regions with strongest Sámi territorial claims (Västerbotten +369%, Norra Norrland +275%)
    • State actors exhibited greater intensification than private landowners despite public accountability mandates
    • Three institutional mechanisms correlate with continued extraction: legal non-integration of customary tenure, implementation deficits between judicial recognition and administrative enforcement, and ESG disclosure opacity

    Important Limitation: We cannot isolate causal effects of the Girjas ruling from concurrent shocks including COVID-19 economic disruption, EU Taxonomy implementation, and commodity price volatility. The analysis documents institutional conditions and correlational patterns rather than establishing causation.

    Dataset Contents:

    • Clearcut.xlsx: Swedish Forest Agency clearcutting permit data (1998-2024) disaggregated by region, ownership type, and year
    • SAMI.ipynb: Jupyter notebook containing Python code for descriptive statistics, time series analysis, and figure generation

    How to Use These Files in Google Colab:

    1. Download the files from this Zenodo deposit to your computer
    2. Open Google Colab at https://colab.research.google.com
    3. Upload the notebook:
      • Click "File" → "Upload notebook"
      • Select SAMI.ipynb from your downloads
    4. Upload the data file:
      • In the Colab notebook, click the folder icon in the left sidebar
      • Click the upload button (page with up arrow)
      • Select Clearcut.xlsx from your downloads
      • The file will appear in the /content/ directory
    5. Run the analysis:
      • Execute cells sequentially by pressing Shift+Enter
      • The notebook will automatically load Clearcut.xlsx from the current directory
      • All figures and statistics will generate inline

    Alternative method (direct from Zenodo):

    python
    # Add this cell at the top of the notebook to download files directly
    !wget https://zenodo.org/record/[RECORD_ID]/files/Clearcut.xlsx

    Replace [RECORD_ID] with the actual Zenodo record number after publication.

    Requirements: The notebook uses standard Python libraries: pandas, numpy, matplotlib, seaborn. These are pre-installed in Google Colab. No additional setup required.
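
    A minimal loading sketch for the permit data (the year/region column names below are placeholders; the authoritative schema is the one read by SAMI.ipynb):

    python
    import pandas as pd

    clearcut = pd.read_excel('Clearcut.xlsx')   # openpyxl is pre-installed in Colab
    print(clearcut.shape)
    print(clearcut.head())

    # Example aggregation once the real column names are known, e.g. a 'Year' column:
    # clearcut.groupby('Year').sum(numeric_only=True).plot()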

    Methodology: Descriptive statistical analysis combined with institutional document review. Data covers eight administrative regions in northern Sweden with mountain-adjacent forests relevant to Sámi reindeer herding territories.

    Policy Relevance: Findings inform debates on Indigenous land rights implementation, forestry governance reform, ESG disclosure requirements, and the gap between legal recognition and operational constraints in resource extraction contexts.

    Keywords: Indigenous rights, Sámi, forestry governance, legal pluralism, Sweden, Girjas ruling, land tenure, corporate accountability, ESG disclosure

    License: Creative Commons Attribution 4.0 International (CC BY 4.0)

  13. lsun_church_train

    • huggingface.co
    Updated Oct 2, 2025
    Cite
    The Generative Landscape (2025). lsun_church_train [Dataset]. https://huggingface.co/datasets/tglcourse/lsun_church_train
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 2, 2025
    Dataset authored and provided by
    The Generative Landscape
    Description

    Dataset Card for "lsun_church_train"

    Uploading the LSUN church train dataset for convenience. I've split this into 119,915 train and 6,312 test images, but if you want the original test set, see https://github.com/fyu/lsun. Notebook that I used to download and then upload this dataset: https://colab.research.google.com/drive/1_f-D2ENgmELNSB51L1igcnLx63PkveY2?usp=sharing
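
    A minimal sketch of loading this dataset with the Hugging Face datasets library (the image column is assumed to be named "image", the usual convention for image datasets on the Hub):

    from datasets import load_dataset

    ds = load_dataset("tglcourse/lsun_church_train")
    print(ds)                      # shows the available splits and features
    img = ds["train"][0]["image"]  # PIL image, assuming an "image" feature
    img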

  14. The Cultural Resource Curse: How Trade Dependence Undermines Creative...

    • zenodo.org
    bin, csv
    Updated Aug 9, 2025
    Cite
    Anon Anon; Anon Anon (2025). The Cultural Resource Curse: How Trade Dependence Undermines Creative Industries [Dataset]. http://doi.org/10.5281/zenodo.16784974
    Explore at:
    csv, bin (available download formats)
    Dataset updated
    Aug 9, 2025
    Dataset provided by
    Zenodo
    Authors
    Anon Anon; Anon Anon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset accompanies the study The Cultural Resource Curse: How Trade Dependence Undermines Creative Industries. It contains country-year panel data for 2000–2023 covering both OECD economies and the ten largest Latin American countries by land area. Variables include GDP per capita (constant PPP, USD), trade openness, internet penetration, education indicators, cultural exports per capita, and executive constraints from the Polity V dataset.

    The dataset supports a comparative analysis of how economic structure, institutional quality, and infrastructure shape cultural export performance across development contexts. Within-country fixed effects models show that trade openness constrains cultural exports in OECD economies but has no measurable effect in resource-dependent Latin America. In contrast, strong executive constraints benefit cultural industries in advanced economies while constraining them in extraction-oriented systems. The results provide empirical evidence for a two-stage development framework in which colonial extraction legacies create distinct constraints on creative industry growth.

    All variables are harmonized to ISO3 country codes and aligned on a common panel structure. The dataset is fully reproducible using the included Jupyter notebooks (OECD.ipynb, LATAM+OECD.ipynb, cervantes.ipynb).

    Contents:

    • GDPPC.csv — GDP per capita series from the World Bank.

    • explanatory.csv — Trade openness, internet penetration, and education indicators.

    • culture_exports.csv — UNESCO cultural export data.

    • p5v2018.csv — Polity V institutional indicators.

    • Jupyter notebooks for data processing and replication.

    Potential uses: Comparative political economy, cultural economics, institutional development, and resource curse research.

    How to Run This Dataset and Code in Google Colab

    These steps reproduce the OECD vs. Latin America analyses from the paper using the provided CSVs and notebooks.

    1) Open Colab and set up

    1. Go to https://colab.research.google.com

    2. Click File → New notebook.

    3. (Optional) If your files are in Google Drive, mount it:

    python
    from google.colab import drive
    drive.mount('/content/drive')

    2) Get the data files into Colab

    You have two easy options:

    A. Upload the 4 CSVs + notebooks directly

    • In the left sidebar, click the folder icon → Upload.

    • Upload: GDPPC.csv, explanatory.csv, culture_exports.csv, p5v2018.csv, and any .ipynb you want to run.

    B. Use Google Drive

    • Put those files in a Drive folder.

    • After mounting Drive, refer to them with paths like /content/drive/MyDrive/your_folder/GDPPC.csv.
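
    Once the files are in place, a minimal sanity-check sketch before running the notebooks (file names from the Contents list above; paths assume option A's /content/ upload):

    python
    import pandas as pd

    base = '/content/'   # or '/content/drive/MyDrive/your_folder/' when using Drive
    gdppc = pd.read_csv(base + 'GDPPC.csv')
    explanatory = pd.read_csv(base + 'explanatory.csv')
    culture = pd.read_csv(base + 'culture_exports.csv')
    polity = pd.read_csv(base + 'p5v2018.csv')

    # The description says all files are harmonized to ISO3 codes on a common panel;
    # printing shapes and columns is a quick way to confirm before merging.
    for name, df in [('GDPPC', gdppc), ('explanatory', explanatory),
                     ('culture_exports', culture), ('p5v2018', polity)]:
        print(name, df.shape, list(df.columns)[:6])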

  15. Legality Without Justice: Symbolic Governance, Institutional Denial, and the...

    • zenodo.org
    bin, csv
    Updated Nov 6, 2025
    Cite
    Scott Brown; Scott Brown (2025). Legality Without Justice: Symbolic Governance, Institutional Denial, and the Ethical Foundations of Law [Dataset]. http://doi.org/10.5281/zenodo.16361108
    Explore at:
    csv, bin (available download formats)
    Dataset updated
    Nov 6, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Scott Brown; Scott Brown
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description:
    This dataset accompanies the empirical analysis in Legality Without Justice, a study examining the relationship between public trust in institutions and perceived governance legitimacy using data from the World Values Survey Wave 7 (2017–2022). It includes:

    • WVS_Cross-National_Wave_7_csv_v6_0.csv — World Values Survey Wave 7 core data.

    • GDP.csv — World Bank GDP per capita (current US$) for 2022 by country.

    • denial.ipynb — Fully documented Jupyter notebook with code for data merging, exploratory statistics, and ordinal logistic regression using OrderedModel. Includes GDP as a control for institutional trust and perceived governance.

    All data processing and analysis were conducted in Python using FAIR reproducibility principles and can be replicated or extended on Google Colab.

    DOI: 10.5281/zenodo.16361108
    License: Creative Commons Attribution 4.0 International (CC BY 4.0)
    Authors: Anon Annotator
    Publication date: 2025-07-23
    Language: English
    Version: 1.0.0
    Publisher: Zenodo
    Programming language: Python

    🔽 How to Download and Run on Google Colab

    Step 1: Open Google Colab

    Go to https://colab.research.google.com

    Step 2: Upload Files

    Click File > Upload notebook, and upload the denial.ipynb file.
    Also upload the CSVs (WVS_Cross-National_Wave_7_csv_v6_0.csv and GDP.csv) using the file browser on the left sidebar.

    Step 3: Adjust File Paths (if needed)

    In denial.ipynb, ensure file paths match:

    python
    wvs = pd.read_csv('/content/WVS_Cross-National_Wave_7_csv_v6_0.csv')
    gdp = pd.read_csv('/content/GDP.csv')

    Step 4: Run the Code

    Execute the notebook cells from top to bottom. You may need to install required libraries:

    python
    !pip install statsmodels pandas numpy

    The notebook performs:

    • Data cleaning

    • Merging WVS and GDP datasets

    • Summary statistics

    • Ordered logistic regression to test if confidence in courts/police (Q57, Q58) predicts belief that the country is governed in the interest of the people (Q183), controlling for GDP.
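
    A minimal sketch of that final regression step with statsmodels' OrderedModel; the merge keys and the GDP column name below are placeholders (the authoritative versions are in denial.ipynb), and the WVS item codes follow the description above:

    python
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    # Placeholder merge keys and GDP column name: the real ones are set in denial.ipynb
    df = wvs.merge(gdp, left_on='B_COUNTRY_ALPHA', right_on='country_code', how='left')

    # WVS codes missing/no-answer responses as negative values, so keep positive codes only
    data = df[['Q183', 'Q57', 'Q58', 'gdp_per_capita']].dropna()
    data = data[(data[['Q183', 'Q57', 'Q58']] > 0).all(axis=1)]

    model = OrderedModel(
        data['Q183'],                             # governed in the interest of the people
        data[['Q57', 'Q58', 'gdp_per_capita']],   # confidence in courts/police + GDP control
        distr='logit'
    )
    result = model.fit(method='bfgs', disp=False)
    print(result.summary())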

  16. Social Media Customer Analysis

    • kaggle.com
    zip
    Updated Apr 16, 2021
    Cite
    Nafe Muhtasim (2021). Social Media Customer Analysis [Dataset]. https://www.kaggle.com/nafemuhtasim/social-media-customer-analysis
    Explore at:
    zip (108529 bytes; available download formats)
    Dataset updated
    Apr 16, 2021
    Authors
    Nafe Muhtasim
    Description

    This is social media data from an organization's platform. You have been hired by the organization and given their social media data to analyze, visualize, and report on.

    You are required to prepare a neat notebook using Jupyter Notebook/JupyterLab or Google Colab. Then zip everything, including the notebook file (.ipynb) and the dataset, and upload it through the Google Forms link stated below. The notebook should be tidy, containing code with explanatory details, visualizations, and a description of the purpose of each task.

    You are encouraged, but not limited, to go through general steps such as data cleaning, data preparation, exploratory data analysis (EDA), finding correlations, feature extraction, and more. (There is no limit to your skills and ideas.)

    After completing the analysis, give the organization insights and facts. For example: Are they reaching a larger audience on weekends? Does posting content on weekdays turn out to be more effective? Does posting many pieces of content on the same day make sense, or should they post regularly and keep day-to-day consistency? Did you find any trend patterns in the data? What is your advice after completing the analysis? State these clearly at the end of the notebook. (These are just a few examples; your findings may be entirely different, and that is totally acceptable.)

    Note that we will value clear documentation that states clear insights from the data analysis and visualizations more than anything else. It will not matter how complex your methods are if they ultimately do not find anything useful.

  17. Replication Data/Code for Route-based Geocoding of Traffic...

    • figshare.com
    csv
    Updated Jan 15, 2025
    Cite
    Saif Ali (2025). Replication Data/Code for Route-based Geocoding of Traffic congestion-Related Social Media Texts on a Complex Network (Manuscript ID IJGIS-2024-1073) [Dataset]. http://doi.org/10.6084/m9.figshare.28210757.v1
    Explore at:
    csv (available download formats)
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Saif Ali
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Instructions (with screenshots) to replicate results from Section 3 of the manuscript are available in "Step-by-step Instructions to Replicate Results.pdf".

    Step 1: Download the replication materials
    Download the whole replication folder on figshare containing the code, data and replication files.

    Step 2: Replicate Tables in Section 3
    All of the data is available inside the sub-folder replication/Data. To replicate Tables 1 and 2 from Section 3 of the manuscript, run the Python file replicate_section3_tables.py locally on your computer. This will produce two .csv files containing Tables 1 and 2 (already provided). Note that it is not necessary to run the code in order to replicate the tables; the output data needed for replication is provided.

    Step 3: Replicate Figures in QGIS
    The figures must be replicated using QGIS, freely available at https://www.qgis.org/. Open the QGIS project replicate_figures.qgz inside the replication/Replicate Figures sub-folder. It should auto-find the layer data. The figures are replicated as layers in the project.

    Step 4: Running the code from scratch
    The accompanying code for the manuscript IJGIS-2024-1305, entitled "Route-based Geocoding of Traffic Congestion-Related Social Media Texts on a Complex Network", runs on Google Colab as Python notebooks. Please follow the instructions below to run the entire geocoder and network mapper from scratch. The expected running time is of the order of 10 hours on free-tier Google Colab.

    4a) Upload to Google Drive
    Upload the entire replication folder to your Google Drive and note the path (location) to which you have uploaded it. There are two Google Colab notebooks that need to be executed in their entirety: Code/Geocoder/The_Geocoder.ipynb and Code/Complex_Network/Complex_network_code.ipynb. They need to be run in order (Geocoder first and Complex Network second).

    4b) Set the path
    In each Google Colab notebook, set the variable called "REPL_PATH" to the location on your Google Drive where you uploaded the replication folder. Include the replication folder in the path, for example "/content/drive/MyDrive/replication".

    4c) Run the code
    The code is available in two sub-folders, replication/Code/Geocoder and replication/Code/Complex_Network. Simply open the Google Colab notebooks inside each folder, mount your Google Drive, set the path and run all cells.
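
    A minimal sketch of the setup cells described in Step 4 (the REPL_PATH value must point at wherever you uploaded the replication folder in your Drive):

    from google.colab import drive
    drive.mount('/content/drive')

    REPL_PATH = "/content/drive/MyDrive/replication"   # include the replication folder itself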

  18. US Consumer Complaints Against Businesses

    • kaggle.com
    zip
    Updated Oct 9, 2022
    Cite
    Jeffery Mandrake (2022). US Consumer Complaints Against Businesses [Dataset]. https://www.kaggle.com/jefferymandrake/us-consumer-complaints-dataset-through-2019
    Explore at:
    zip (343188956 bytes; available download formats)
    Dataset updated
    Oct 9, 2022
    Authors
    Jeffery Mandrake
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    2,121,458 records

    I used Google Colab to check out this dataset and pull the column names using Pandas.

    Sample code example: Python Pandas read csv file compressed with gzip and load into Pandas dataframe https://pastexy.com/106/python-pandas-read-csv-file-compressed-with-gzip-and-load-into-pandas-dataframe

    Columns: ['Date received', 'Product', 'Sub-product', 'Issue', 'Sub-issue', 'Consumer complaint narrative', 'Company public response', 'Company', 'State', 'ZIP code', 'Tags', 'Consumer consent provided?', 'Submitted via', 'Date sent to company', 'Company response to consumer', 'Timely response?', 'Consumer disputed?', 'Complaint ID']

    I did not modify the dataset.

    Use it to practice with dataframes - Pandas or PySpark on Google Colab:

    !unzip complaints.csv.zip

    import pandas as pd
    df = pd.read_csv('complaints.csv')
    df.columns
    df.head()
    # etc.

  19. Futures Market Datasets

    • kaggle.com
    zip
    Updated Jul 13, 2025
    Cite
    mlippo (2025). Futures Market Datasets [Dataset]. https://www.kaggle.com/datasets/mlippo/futures-market-dataset/versions/2
    Explore at:
    zip (265495 bytes; available download formats)
    Dataset updated
    Jul 13, 2025
    Authors
    mlippo
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains the futures market from the following countries:

    Australia -> Aus200

    Brazil -> Bra50 and MinDol

    Spain -> Esp35

    France -> Fra40

    Germany -> Ger40

    Hong Kong -> HkInd

    Italy-> Ita40

    Netherlands -> Neth25

    Switzerland -> Swi20

    United Kingdom -> UK100

    United States -> Usa500, UsaTec and UsaRus

    There is a CSV file for each of those markets and one with all of them combined.

    Note: the MinDol, Swi20 and Neth25 data were taken from their monthly contracts, because MetaTrader 5 does not provide their continuous historical series (unlike the S&P 500, which has both 'Usa500' and 'Usa500Mar24').

    (Screenshot: futures_dailycontract.png, showing the daily/monthly contract listing in MetaTrader 5.)

    MT5 library (I used PyCharm because I was not able to use the MetaTrader5 library on Google Colab):

    import MetaTrader5 as mt5
    import pandas as pd
    import numpy as np
    import pytz
    from datetime import datetime
    
    if not mt5.initialize(login= , server= "server", password=""):
    # you can use your login and password if you have an account on a broker to use mt5
      print("initialize() failed, error code =", mt5.last_error())
      quit()
    
    
    symbols = mt5.symbols_get()
    
    list_symbols = []
    for num in range(0, len(symbols)):
      list_symbols.append(symbols[num].name)
    
    print(list_symbols)
    
    list_futures = ['Aus200', 'Bra50', 'Esp35', 'Fra40', 'Ger40', 'HKInd', 'Ita40Mar24', 'Jp225', 'MinDolFeb24', 'Neth25Jan24', 'UK100', 'Usa500', 'UsaRus', 'UsaTec', 'Swi20Mar24']
    time_frame = mt5.TIMEFRAME_D1
    dynamic_vars = {}
    
    time_zone = pytz.timezone('Etc/UTC')
    
    time_start = datetime(2017, 1, 1, tzinfo= time_zone)
    time_end = datetime(2023, 12, 31, tzinfo= time_zone)
    
    for sym in list_futures:
      var = f'{sym}'
      rates = mt5.copy_rates_range(sym, time_frame, time_start, time_end)
    
      rates_frame = pd.DataFrame(rates)
      rates_frame['time'] = pd.to_datetime(rates_frame['time'], unit='s')
      rates_frame = rates_frame[['time', 'close']]
      rates_frame.rename(columns = {'close': var}, inplace = True)
      dynamic_vars[var] = rates_frame
    
      dynamic_vars[sym].to_csv(f'{sym}.csv', index = False)
    
  20. TMF Business Process Framework Dataset for Neo4j

    • kaggle.com
    zip
    Updated Dec 4, 2023
    Cite
    Aleksei Golovin (2023). TMF Business Process Framework Dataset for Neo4j [Dataset]. https://www.kaggle.com/datasets/algord/tmf-business-process-framework-dataset-for-neo4j
    Explore at:
    zip (13261206 bytes; available download formats)
    Dataset updated
    Dec 4, 2023
    Authors
    Aleksei Golovin
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    TMF Business Process Framework Dataset for Neo4j

    The dataset is a Neo4j knowledge graph based on TMF Business Process Framework v22.0 data.
    CSV files contain data about the model entities, and the JSON file contains knowledge graph mapping.
    The script used to generate CSV files based on the XML model can be found here.

    To import the dataset, download the zip archive and upload it to Neo4j.

    You can also check this dataset here.
