51 datasets found

h
Turkish_Basketball_Super_League_Dataset
huggingface.co
Updated Aug 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Muhammed Onur ulu (2025). Turkish_Basketball_Super_League_Dataset [Dataset]. https://huggingface.co/datasets/onurulu17/Turkish_Basketball_Super_League_Dataset
Explore at:
Dataset updated
Aug 10, 2025
Authors
Muhammed Onur ulu
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
📥 Load Dataset in Python

To load this dataset in Google Colab or any Python environment:
!pip install huggingface_hub pandas openpyxl

from huggingface_hub import hf_hub_download import pandas as pd

repo_id = "onurulu17/Turkish_Basketball_Super_League_Dataset"

files = [ "leaderboard.xlsx", "player_data.xlsx", "team_data.xlsx", "team_matches.xlsx", "player_statistics.xlsx", "technic_roster.xlsx" ]

datasets = {}

for f in files: path =… See the full description on the dataset page: https://huggingface.co/datasets/onurulu17/Turkish_Basketball_Super_League_Dataset.
COCO2017 Image Caption Train
kaggle.com
zip
Updated May 30, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Seungjun Lee (2024). COCO2017 Image Caption Train [Dataset]. https://www.kaggle.com/datasets/seungjunleeofficial/coco2017-image-caption-train
Explore at:
zip(19236355851 bytes)Available download formats
Dataset updated
May 30, 2024
Authors
Seungjun Lee
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains only the COCO 2017 train images (118K images) and a caption annotation JSON file, designed to fit within Google Colab's available disk space of approximately 50GB when connected to a GPU runtime.

If you're using PyTorch on Google Colab, you can easily utilize this dataset as follows:

Manually downloading and uploading the file to Colab can be time-consuming. Therefore, it's more efficient to download this data directly into Google Colab. Please ensure you have first added your Kaggle key to Google Colab. You can find more details on this process here

from google.colab import drive import os import torch import torchvision.datasets as dset import torchvision.transforms as transforms os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY') os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME') # Download the Dataset and unzip it !kaggle datasets download -d seungjunleeofficial/coco2017-image-caption-train !mkdir "/content/Dataset" !unzip "coco2017-image-caption-train" -d "/content/Dataset" # load the dataset cap = dset.CocoCaptions(root = '/content/Dataset/COCO2017 Image Captioning Train/train2017', annFile = '/content/Dataset/COCO2017 Image Captioning Train/captions_train2017.json', transform=transforms.PILToTensor())

You can then use the dataset in the following way:

print(f"Number of samples: {len(cap)}") img, target = cap[3] print(img.shape) print(target) # Output example: torch.Size([3, 425, 640]) # ['A zebra grazing on lush green grass in a field.', 'Zebra reaching its head down to ground where grass is.', # 'The zebra is eating grass in the sun.', 'A lone zebra grazing in some green grass.', # 'A Zebra grazing on grass in a green open field.']
Sample Park Analysis
figshare.com
zip
Updated Nov 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Eric Delmelle (2025). Sample Park Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.30509021.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.30509021.v1
Dataset updated
Nov 2, 2025
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
Eric Delmelle
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
README – Sample Park Analysis## OverviewThis repository contains a Google Colab / Jupyter notebook and accompanying dataset used for analyzing park features and associated metrics. The notebook demonstrates data loading, cleaning, and exploratory analysis of the Hope_Park_original.csv file.## Contents- sample park analysis.ipynb — The main analysis notebook (Colab/Jupyter format)- Hope_Park_original.csv — Source dataset containing park information- README.md — Documentation for the contents and usage## Usage1. Open the notebook in Google Colab or Jupyter.2. Upload the Hope_Park_original.csv file to the working directory (or adjust the file path in the notebook).3. Run each cell sequentially to reproduce the analysis.## RequirementsThe notebook uses standard Python data science libraries:```pythonpandasnumpymatplotlibseaborn
B
Python Code for Visualizing COVID-19 data
borealisdata.ca
datasetcatalog.nlm.nih.gov
+1more
Updated Dec 16, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ryan Chartier; Geoffrey Rockwell (2023). Python Code for Visualizing COVID-19 data [Dataset]. http://doi.org/10.5683/SP3/PYEQL0
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/PYEQL0
Dataset updated
Dec 16, 2023
Dataset provided by
Borealis
Authors
Ryan Chartier; Geoffrey Rockwell
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The purpose of this code is to produce a line graph visualization of COVID-19 data. This Jupyter notebook was built and run on Google Colab. This code will serve mostly as a guide and will need to be adapted where necessary to be run locally. The separate COVID-19 datasets uploaded to this Dataverse can be used with this code. This upload is made up of the IPYNB and PDF files of the code.
NYC Jobs Dataset (Filtered Columns)
kaggle.com
zip
Updated Oct 5, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeffery Mandrake (2022). NYC Jobs Dataset (Filtered Columns) [Dataset]. https://www.kaggle.com/datasets/jefferymandrake/nyc-jobs-filtered-cols
Explore at:
zip(93408 bytes)Available download formats
Dataset updated
Oct 5, 2022
Authors
Jeffery Mandrake
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
New York
Description
Use this dataset with Misra's Pandas tutorial: How to use the Pandas GroupBy function | Pandas tutorial

The original dataset came from this site: https://data.cityofnewyork.us/City-Government/NYC-Jobs/kpav-sd4t/data

I used Google Colab to filter the columns with the following Pandas commands. Here's a Colab Notebook you can use with the commands listed below: https://colab.research.google.com/drive/17Jpgeytc075CpqDnbQvVMfh9j-f4jM5l?usp=sharing

Once the csv file is uploaded to Google Colab, use these commands to process the file.

import pandas as pd # load the file and create a pandas dataframe df = pd.read_csv('/content/NYC_Jobs.csv') # keep only these columns df = df[['Job ID', 'Civil Service Title', 'Agency', 'Posting Type', 'Job Category', 'Salary Range From', 'Salary Range To' ]] # save the csv file without the index column df.to_csv('/content/NYC_Jobs_filtered_cols.csv', index=False)
US Consumer Complaints Against Businesses
kaggle.com
zip
Updated Oct 9, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeffery Mandrake (2022). US Consumer Complaints Against Businesses [Dataset]. https://www.kaggle.com/jefferymandrake/us-consumer-complaints-dataset-through-2019
Explore at:
zip(343188956 bytes)Available download formats
Dataset updated
Oct 9, 2022
Authors
Jeffery Mandrake
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
2,121,458 records

I used Google Colab to check out this dataset and pull the column names using Pandas.

Sample code example: Python Pandas read csv file compressed with gzip and load into Pandas dataframe https://pastexy.com/106/python-pandas-read-csv-file-compressed-with-gzip-and-load-into-pandas-dataframe

Columns: ['Date received', 'Product', 'Sub-product', 'Issue', 'Sub-issue', 'Consumer complaint narrative', 'Company public response', 'Company', 'State', 'ZIP code', 'Tags', 'Consumer consent provided?', 'Submitted via', 'Date sent to company', 'Company response to consumer', 'Timely response?', 'Consumer disputed?', 'Complaint ID']

I did not modify the dataset.

Use it to practice with dataframes - Pandas or PySpark on Google Colab:

!unzip complaints.csv.zip

import pandas as pd df = pd.read_csv('complaints.csv') df.columns

df.head() etc.
Nou Pa Bèt: Civic Substitution and Expressive Freedoms in Post-State...
zenodo.org
bin
Updated Aug 13, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Brown Scott; Brown Scott (2025). Nou Pa Bèt: Civic Substitution and Expressive Freedoms in Post-State Governance [Dataset]. http://doi.org/10.5281/zenodo.16858858
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.16858858
Dataset updated
Aug 13, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Brown Scott; Brown Scott
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Here's a clear Zenodo description for your dataset:

Dataset Description

This dataset supports the research paper "Nou Pa Bèt: Civic Substitution and Expressive Freedoms in Post-State Governance" which examines how civic participation functions as institutional substitution in fragile states, with Haiti as the primary case study. The dataset combines governance indicators from the World Bank's Worldwide Governance Indicators (WGI) with civic engagement measures from the Varieties of Democracy (V-Dem) project.

Files Included:

wgidataset.xlsx (2.57 MB) - Complete World Bank Worldwide Governance Indicators dataset covering multiple governance dimensions across countries and years

CivicEngagement_SelectedCountries_Last10Years.xlsx (25.03 KB) - Processed V-Dem civic engagement indicators for fragile states sample (2015-2024) including variables for participatory governance, civil society participation, freedom of expression, freedom of assembly, anti-system movements, and direct democracy

civic.ipynb (10.35 KB) - Complete Python analysis notebook containing all data processing, regression analysis, and visualization code used in the study

How to Use in Google Colab:

Step 1: Upload Files

python

from google.colab import files import pandas as pd import numpy as np # Upload the files to your Colab environment uploaded = files.upload() # Select and upload: CivicEngagement_SelectedCountries_Last10Years.xlsx and wgidataset.xlsx

Step 2: Load the Datasets

python

# Load the civic engagement data (main analysis dataset) civic_data = pd.read_excel('CivicEngagement_SelectedCountries_Last10Years.xlsx') # Load the WGI data (if needed for extended analysis) wgi_data = pd.read_excel('wgidataset.xlsx') # Display basic information print("Civic Engagement Dataset Shape:", civic_data.shape) print(" Columns:", civic_data.columns.tolist()) print(" First few rows:") civic_data.head()

Step 3: Run the Analysis Notebook

python

# Download and run the complete analysis notebook !wget https://zenodo.org/record/[RECORD_ID]/files/civic.ipynb # Then open civic.ipynb in Colab or copy/paste the code cells

Key Variables:

Dependent Variables (WGI):

Control_of_Corruption - Extent to which public power is exercised for private gain

Government_Effectiveness - Quality of public services and policy implementation

Independent Variables (V-Dem):

v2x_partip - Participatory Component Index

v2x_cspart - Civil Society Participation Index

v2cademmob - Freedom of Peaceful Assembly

v2cafres - Freedom of Expression

v2csantimv - Anti-System Movements

v2xdd_dd - Direct Popular Vote Index

Sample Countries: 21 fragile states including Haiti, Sierra Leone, Liberia, DRC, CAR, Guinea-Bissau, Chad, Niger, Burundi, Yemen, South Sudan, Mozambique, Sudan, Eritrea, Somalia, Mali, Afghanistan, Papua New Guinea, Togo, Cambodia, and Timor-Leste.

Quick Start Analysis:

python

# Install required packages !pip install statsmodels scipy # Basic regression replication import statsmodels.api as sm from statsmodels.stats.outliers_influence import variance_inflation_factor # Prepare variables for regression X = civic_data[['v2x_partip', 'v2x_cspart', 'v2cademmob', 'v2cafres', 'v2csantimv', 'v2xdd_dd']].dropna() y_corruption = civic_data['Control_of_Corruption'].dropna() y_effectiveness = civic_data['Government_Effectiveness'].dropna() # Run regression (example for Control of Corruption) X_const = sm.add_constant(X) model = sm.OLS(y_corruption, X_const).fit(cov_type='HC3') print(model.summary())

Citation: Brown, Scott M., Fils-Aime, Jempsy, & LaTortue, Paul. (2025). Nou Pa Bèt: Civic Substitution and Expressive Freedoms in Post-State Governance [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.15058161

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Contact: For questions about data usage or methodology, please contact the corresponding author through the institutional affiliations provided in the paper.

This description provides clear, step-by-step instructions for researchers to immediately begin working with your data in Google Colab while explaining the theoretical and methodological context.
m
Data for "Prediction of Phakic Intraocular Lens Vault Using Machine Learning...
data.mendeley.com
Updated Nov 18, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TaeKeun Yoo (2020). Data for "Prediction of Phakic Intraocular Lens Vault Using Machine Learning of Anterior Segment Optical Coherence Tomography Metrics" [Dataset]. http://doi.org/10.17632/ffn745r57z.1
Explore at:
Unique identifier
https://doi.org/10.17632/ffn745r57z.1
Dataset updated
Nov 18, 2020
Authors
TaeKeun Yoo
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
Prediction of Phakic Intraocular Lens Vault Using Machine Learning of Anterior Segment Optical Coherence Tomography Metrics. Authors: Kazutaka Kamiya, MD, PhD1, Ik Hee Ryu, MD, MS2, Tae Keun Yoo, MD2, Jung Sub Kim MD2, In Sik Lee, MD, PhD2, Jin Kook Kim MD2, Wakako Ando CO3, Nobuyuki Shoji, MD, PhD3, Tomofusa, Yamauchi, MD, PhD4, Hitoshi Tabuchi, MD, PhD4. Author Affiliation: 1Visual Physiology, School of Allied Health Sciences, Kitasato University, Kanagawa, Japan, 2B&VIIT Eye Center, Seoul, Korea, 3Department of Ophthalmology, School of Medicine, Kitasato University, Kanagawa, Japan, 4Department of Ophthalmology, Tsukazaki Hospital, Hyogo, Japan.

We hypothesize that machine learning of preoperative biometric data obtained by the As-OCT may be clinically beneficial for predicting the actual ICL vault. Therefore, we built the machine learning model using Random Forest to predict ICL vault after surgery.

This multicenter study comprised one thousand seven hundred forty-five eyes of 1745 consecutive patients (656 men and 1089 women), who underwent EVO ICL implantation (V4c and V5 Visian ICL with KS-AquaPORT) for the correction of moderate to high myopia and myopic astigmatism, and who completed at least a 1-month follow-up, at Kitasato University Hospital (Kanagawa, Japan), or at B&VIIT Eye Center (Seoul, Korea).

This data file (RFR_model(feature=12).mat) is the final trained random forest model for MATLAB 2020a.

Python version:

from sklearn.model_selection import train_test_split import pandas as pd import numpy as np from sklearn.ensemble import RandomForestClassifier from sklearn.ensemble import RandomForestRegressor

connect data in your google drive

from google.colab import auth auth.authenticate_user() from google.colab import drive drive.mount('/content/gdrive')

Change the path for the custom data

In this case, we used ICL vault prediction using preop measurement

dataset = pd.read_csv('gdrive/My Drive/ICL/data_icl.csv') dataset.head()

optimal features (sorted by importance) :

1. ICL size 2. ICL power 3. LV 4. CLR 5. ACD 6. ATA

7. MSE 8.Age 9. Pupil size 10. WTW 11. CCT 12. ACW

y = dataset['Vault_1M'] X = dataset.drop(['Vault_1M'], axis = 1)

Split the dataset to train and test data

For a simple validation test, we split data to 8:2

train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=0)

Optimal parameter search could be performed in this section

parameters = {'bootstrap': True, 'min_samples_leaf': 3, 'n_estimators': 500, 'criterion': 'mae' 'min_samples_split': 10, 'max_features': 'sqrt', 'max_depth': 6, 'max_leaf_nodes': None}

RF_model = RandomForestRegressor(**parameters) RF_model.fit(train_X, train_y) RF_predictions = RF_model.predict(test_X) importance = RF_model.feature_importances_
Toronto Emergency Response - Open Data
zenodo.org
zip
Updated Aug 30, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Piyush Kumar; Piyush Kumar (2024). Toronto Emergency Response - Open Data [Dataset]. http://doi.org/10.5281/zenodo.13578078
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.13578078
Dataset updated
Aug 30, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Piyush Kumar; Piyush Kumar
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Toronto
Description
This upload contains data and documentation for the Python analysis undertaken in Google Colab as part of Episode 1 of the webinar series, conducted by Sambodhi's Center for Health Systems Research and Implementation (CHSRI). You can find the link to the Google Colab notebook here.

All the data uploaded here is open data published by the Toronto Police Public Safety Data Portal and the Ontario Ministry of Health.
h
MNIST_small_colab-sample_data
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Martial Terran, MNIST_small_colab-sample_data [Dataset]. https://huggingface.co/datasets/MartialTerran/MNIST_small_colab-sample_data
Explore at:
Authors
Martial Terran
Description
The purpose of this copy of the MNIST small dataset [mnist_test.csv (20,000 samples) and mnist_train_small.csv (10,000 samples)] copied from sample_data folder in Google Colab is simply to illustrate how WEIRD and totally deformed/unrecognizable are the 1% to 2% test samples that are difficult for a competent Vision model to correctly classify. See for yourself (up to 4 misclassified test samples shown per training epoch) Vision_model_V2.1.py --- Hyperparameters --- [INFO] Loading datasets...… See the full description on the dataset page: https://huggingface.co/datasets/MartialTerran/MNIST_small_colab-sample_data.
Top Rated TV Shows
kaggle.com
zip
Updated Jan 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shreya Gupta (2025). Top Rated TV Shows [Dataset]. https://www.kaggle.com/datasets/shreyajii/top-rated-tv-shows
Explore at:
zip(314571 bytes)Available download formats
Dataset updated
Jan 5, 2025
Authors
Shreya Gupta
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset provides information about top-rated TV shows, collected from The Movie Database (TMDb) API. It can be used for data analysis, recommendation systems, and insights on popular television content.

Key Stats:

Total Pages: 109 Total Results: 2098 TV shows Data Source: TMDb API Sorting Criteria: Highest-rated by vote_average (average rating) with a minimum vote count of 200 Data Fields (Columns):

id: Unique identifier for the TV show name: Title of the TV show vote_average: Average rating given by users vote_count: Total number of votes received first_air_date: The date when the show was first aired original_language: Language in which the show was originally produced genre_ids: Genre IDs linked to the show's genres overview: A brief summary of the show popularity: Popularity score based on audience engagement poster_path: URL path for the show's poster image Accessing the Dataset via API (Python Example):

python Copy code import requests

api_key = 'YOUR_API_KEY_HERE' url = "https://api.themoviedb.org/3/discover/tv" params = { 'api_key': api_key, 'include_adult': 'false', 'language': 'en-US', 'page': 1, 'sort_by': 'vote_average.desc', 'vote_count.gte': 200 }

response = requests.get(url, params=params) data = response.json()

Display the first show

print(data['results'][0]) Dataset Use Cases:

Data Analysis: Explore trends in highly-rated TV shows. Recommendation Systems: Build personalized TV show suggestions. Visualization: Create charts to showcase ratings or genre distribution. Machine Learning: Predict show popularity using historical data. Exporting and Sharing the Dataset (Google Colab Example):

python Copy code import pandas as pd

Convert the API data to a DataFrame

df = pd.DataFrame(data['results'])

Save to CSV and upload to Google Drive

from google.colab import drive drive.mount('/content/drive') df.to_csv('/content/drive/MyDrive/top_rated_tv_shows.csv', index=False) Ways to Share the Dataset:

Google Drive: Upload and share a public link. Kaggle: Create a public dataset for collaboration. GitHub: Host the CSV file in a repository for easy sharing.
Recognition Without Implementation: Institutional Gaps and Forestry...
zenodo.org
bin
Updated Oct 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anon Anon; Anon Anon (2025). Recognition Without Implementation: Institutional Gaps and Forestry Expansion in Post-Girjas Swedish Sápmi - Dataset and Analysis [Dataset]. http://doi.org/10.5281/zenodo.17249309
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.17249309
Dataset updated
Oct 2, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Anon Anon; Anon Anon
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Sápmi
Description
Recognition Without Implementation: Institutional Gaps and Forestry Expansion in Post-Girjas Swedish Sápmi - Dataset and Analysis

Description

This deposit contains the dataset and analysis code supporting the research paper "Recognition Without Implementation: Institutional Gaps and Forestry Expansion in Post-Girjas Swedish Sápmi" by Stefan Holgersson and Scott Brown.

Research Overview: This study examines forestry permit trends in Swedish Sámi territories following the landmark 2020 Girjas Supreme Court ruling, which recognized exclusive Sámi rights over hunting and fishing in traditional lands. Using 432 region-year observations (1998-2024) from the Swedish Forest Agency, we document a 242% increase in clearcutting approvals during 2020-2024 compared to pre-2020 averages, with state/corporate actors showing 313% increases and private landowners 197%.

Key Findings:

Clearcutting intensified most in regions with strongest Sámi territorial claims (Västerbotten +369%, Norra Norrland +275%)

State actors exhibited greater intensification than private landowners despite public accountability mandates

Three institutional mechanisms correlate with continued extraction: legal non-integration of customary tenure, implementation deficits between judicial recognition and administrative enforcement, and ESG disclosure opacity

Important Limitation: We cannot isolate causal effects of the Girjas ruling from concurrent shocks including COVID-19 economic disruption, EU Taxonomy implementation, and commodity price volatility. The analysis documents institutional conditions and correlational patterns rather than establishing causation.

Dataset Contents:

Clearcut.xlsx: Swedish Forest Agency clearcutting permit data (1998-2024) disaggregated by region, ownership type, and year

SAMI.ipynb: Jupyter notebook containing Python code for descriptive statistics, time series analysis, and figure generation

How to Use These Files in Google Colab:

Download the files from this Zenodo deposit to your computer

Open Google Colab at https://colab.research.google.com

Upload the notebook:

Click "File" → "Upload notebook"

Select SAMI.ipynb from your downloads

Upload the data file:

In the Colab notebook, click the folder icon in the left sidebar

Click the upload button (page with up arrow)

Select Clearcut.xlsx from your downloads

The file will appear in the /content/ directory

Run the analysis:

Execute cells sequentially by pressing Shift+Enter

The notebook will automatically load Clearcut.xlsx from the current directory

All figures and statistics will generate inline

Alternative method (direct from Zenodo):

python

# Add this cell at the top of the notebook to download files directly !wget https://zenodo.org/record/[RECORD_ID]/files/Clearcut.xlsx

Replace [RECORD_ID] with the actual Zenodo record number after publication.

Requirements: The notebook uses standard Python libraries: pandas, numpy, matplotlib, seaborn. These are pre-installed in Google Colab. No additional setup required.

Methodology: Descriptive statistical analysis combined with institutional document review. Data covers eight administrative regions in northern Sweden with mountain-adjacent forests relevant to Sámi reindeer herding territories.

Policy Relevance: Findings inform debates on Indigenous land rights implementation, forestry governance reform, ESG disclosure requirements, and the gap between legal recognition and operational constraints in resource extraction contexts.

Keywords: Indigenous rights, Sámi, forestry governance, legal pluralism, Sweden, Girjas ruling, land tenure, corporate accountability, ESG disclosure

License: Creative Commons Attribution 4.0 International (CC BY 4.0)
h
flan-t5-small-embed-refinedweb
huggingface.co
Updated Jun 5, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
lauren (2023). flan-t5-small-embed-refinedweb [Dataset]. https://huggingface.co/datasets/crumb/flan-t5-small-embed-refinedweb
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 5, 2023
Authors
lauren
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
All of the data together is around 41GB. It's the last hidden states of 131,072 samples from refinedweb padded/truncated to 512 tokens on the left, fed through google/flan-t5-small. Structure: { "encoding": List, shaped (512, 512) aka (tokens, d_model), "text": String, the original text that was encoded, "attention_mask": List, binary mask to pass to your model with encoding to not attend to pad tokens }

just a tip, you cannot load this with the RAM in the free ver of google colab, not… See the full description on the dataset page: https://huggingface.co/datasets/crumb/flan-t5-small-embed-refinedweb.
Python Libraries used in tutorials.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated Dec 30, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Faye Orcales; Lucy Moctezuma Tan; Meris Johnson-Hagler; John Matthew Suntay; Jameel Ali; Kristiene Recto; Phelan Glenn; Pleuni Pennings (2024). Python Libraries used in tutorials. [Dataset]. http://doi.org/10.1371/journal.pcbi.1012579.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pcbi.1012579.t003
Dataset updated
Dec 30, 2024
Dataset provided by
PLOShttp://plos.org/
Authors
Faye Orcales; Lucy Moctezuma Tan; Meris Johnson-Hagler; John Matthew Suntay; Jameel Ali; Kristiene Recto; Phelan Glenn; Pleuni Pennings
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Antibiotic resistance is a global public health concern. Bacteria have evolved resistance to most antibiotics, which means that for any given bacterial infection, the bacteria may be resistant to one or several antibiotics. It has been suggested that genomic sequencing and machine learning (ML) could make resistance testing more accurate and cost-effective. Given that ML is likely to become an ever more important tool in medicine, we believe that it is important for pre-health students and others in the life sciences to learn to use ML tools. This paper provides a step-by-step tutorial to train 4 different ML models (logistic regression, random forests, extreme gradient-boosted trees, and neural networks) to predict drug resistance for Escherichia coli isolates and to evaluate their performance using different metrics and cross-validation techniques. We also guide the user in how to load and prepare the data used for the ML models. The tutorial is accessible to beginners and does not require any software to be installed as it is based on Google Colab notebooks and provides a basic understanding of the different ML models. The tutorial can be used in undergraduate and graduate classes for students in Biology, Public Health, Computer Science, or related fields.
T
duke_ultrasound
tensorflow.org
Updated Mar 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). duke_ultrasound [Dataset]. https://www.tensorflow.org/datasets/catalog/duke_ultrasound
Explore at:
Dataset updated
Mar 14, 2025
Description
DukeUltrasound is an ultrasound dataset collected at Duke University with a Verasonics c52v probe. It contains delay-and-sum (DAS) beamformed data as well as data post-processed with Siemens Dynamic TCE for speckle reduction, contrast enhancement and improvement in conspicuity of anatomical structures. These data were collected with support from the National Institute of Biomedical Imaging and Bioengineering under Grant R01-EB026574 and National Institutes of Health under Grant 5T32GM007171-44. A usage example is available here.

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('duke_ultrasound', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.
h
lsun_church_train
huggingface.co
Updated Oct 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Generative Landscape (2025). lsun_church_train [Dataset]. https://huggingface.co/datasets/tglcourse/lsun_church_train
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 2, 2025
Dataset authored and provided by
The Generative Landscape
Description
Dataset Card for "lsun_church_train"

Uploading lsun church train dataset for convenience I've split this into 119915 train and 6312 test but if you want the original test set see https://github.com/fyu/lsun Notebook that I used to download then upload this dataset: https://colab.research.google.com/drive/1_f-D2ENgmELNSB51L1igcnLx63PkveY2?usp=sharing More Information needed
m
Can the use of Google Colab improve female and male mathematical analytical...
data.mendeley.com
Updated Nov 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Luz Maria Alonso-Valerdi (2024). Can the use of Google Colab improve female and male mathematical analytical thinking in biomedical engineering students? [Dataset]. http://doi.org/10.17632/ym64wgr3n6.1
Explore at:
Unique identifier
https://doi.org/10.17632/ym64wgr3n6.1
Dataset updated
Nov 12, 2024
Authors
Luz Maria Alonso-Valerdi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Google Colab is a free product from Google Research, which allows programming in Python from a browser, and is primarily suitable for (1) machine learning, (2) data analysis, and (3) education. Google Colab is an online programming environment that requires no installation. It has basic math (Math), deep learning (TensorFlow), machine learning (Scikit-learn) and graphing (Matplotlib) tools. Like other Google online tools, it has collaborative properties that allow parallel programming by different users, whose work environments are stored in Google Drive. In addition, it is possible to document the codes with multimedia material, and to publish and import from GitHub.

Therefore, this project aims to use Google Colab as an assistance teaching tool that takes into account the interests and competencies of male and female biomedical engineering students to improve their experience and academic performance. The project proposes to use Google Colab in three ways: (1) implementing study cases in the health area with illustrative materials (e.g., images, sounds, web pages), (2) continuous monitoring by the teacher, and (3) asynchronous collaborative programming. For this purpose, a teacher's guide and a repository of example activities will be implemented in accordance with student feedback.

The project seeks to develop mathematical analytical thinking to quantitatively interpret measurements of biological systems. On the one hand, male students are expected to increase their interest in mathematical analysis through computational development (characteristic more preferred by men). On the other hand, female students are expected to have online mathematical counseling and study cases in the health area (characteristics more preferred by women).

The overall goal is to change the dynamics of teaching applied mathematics, which is an important factor of withdraw, mainly of women in engineering. Men and women have different interests and competencies in engineering, and the frequent ignoring of this fact in the teaching process could be a factor in the current gender gap in STEM (Science, Tech, Engineering and Math).

This proposal is scalable because Google Colab is a free, friendly and executable programming environment in the cloud. It does not involve economic, administrative or infrastructural costs. It is also transferable not only for other blocks and subjects of biomedical engineering, but also for any other engineering, where programming tools are indispensable. Google Colab is a simple and easy to learn environment.
z
The Cultural Resource Curse: How Trade Dependence Undermines Creative...
zenodo.org
bin, csv
Updated Aug 9, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anon Anon; Anon Anon (2025). The Cultural Resource Curse: How Trade Dependence Undermines Creative Industries [Dataset]. http://doi.org/10.5281/zenodo.16784974
Explore at:
csv, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.16784974
Dataset updated
Aug 9, 2025
Dataset provided by
Zenodo
Authors
Anon Anon; Anon Anon
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset accompanies the study The Cultural Resource Curse: How Trade Dependence Undermines Creative Industries. It contains country-year panel data for 2000–2023 covering both OECD economies and the ten largest Latin American countries by land area. Variables include GDP per capita (constant PPP, USD), trade openness, internet penetration, education indicators, cultural exports per capita, and executive constraints from the Polity V dataset.

The dataset supports a comparative analysis of how economic structure, institutional quality, and infrastructure shape cultural export performance across development contexts. Within-country fixed effects models show that trade openness constrains cultural exports in OECD economies but has no measurable effect in resource-dependent Latin America. In contrast, strong executive constraints benefit cultural industries in advanced economies while constraining them in extraction-oriented systems. The results provide empirical evidence for a two-stage development framework in which colonial extraction legacies create distinct constraints on creative industry growth.

All variables are harmonized to ISO3 country codes and aligned on a common panel structure. The dataset is fully reproducible using the included Jupyter notebooks (OECD.ipynb, LATAM+OECD.ipynb, cervantes.ipynb).

Contents:

GDPPC.csv — GDP per capita series from the World Bank.

explanatory.csv — Trade openness, internet penetration, and education indicators.

culture_exports.csv — UNESCO cultural export data.

p5v2018.csv — Polity V institutional indicators.

Jupyter notebooks for data processing and replication.

Potential uses: Comparative political economy, cultural economics, institutional development, and resource curse research.

How to Run This Dataset and Code in Google Colab

These steps reproduce the OECD vs. Latin America analyses from the paper using the provided CSVs and notebooks.

1) Open Colab and set up

Go to https://colab.research.google.com

Click File → New notebook.

(Optional) If your files are in Google Drive, mount it:

python

CopiarEditar

from google.colab import drive drive.mount('/content/drive')

2) Get the data files into Colab

You have two easy options:

A. Upload the 4 CSVs + notebooks directly

In the left sidebar, click the folder icon → Upload.

Upload: GDPPC.csv, explanatory.csv, culture_exports.csv, p5v2018.csv, and any .ipynb you want to run.

B. Use Google Drive

Put those files in a Drive folder.

After mounting Drive, refer to them with paths like /content/drive/MyDrive/your_folder/GDPPC.csv.
H
Advancing Open and Reproducible Water Data Science by Integrating Data...
hydroshare.org
beta.hydroshare.org
+1more
zip
Updated Jan 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeffery S. Horsburgh (2024). Advancing Open and Reproducible Water Data Science by Integrating Data Analytics with an Online Data Repository [Dataset]. https://www.hydroshare.org/resource/45d3427e794543cfbee129c604d7e865
Explore at:
zip(50.9 MB)Available download formats
Dataset updated
Jan 9, 2024
Dataset provided by
HydroShare
Authors
Jeffery S. Horsburgh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon nor mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analyses will be an enabler for transforming scientific inquiries. Building on the HydroShare repository’s established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS), loading of data into performant structures keyed to specific scientific data types and that integrate with existing visualization, analysis, and data science capabilities available in Python, and then writing analysis results back to HydroShare for sharing and eventual publication. These capabilities reduce the technical burden for scientists associated with creating a computational environment for executing analyses by installing and maintaining the packages within CUAHSI’s HydroShare-linked JupyterHub server. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows. The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed within a Python environment on any computer running Microsoft Windows, Apple MacOS, or Linux from the Python Package Index using the PIP utility. They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments like Google Collaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available in GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.

This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/
Legality Without Justice: Symbolic Governance, Institutional Denial, and the...
zenodo.org
bin, csv
Updated Nov 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Scott Brown; Scott Brown (2025). Legality Without Justice: Symbolic Governance, Institutional Denial, and the Ethical Foundations of Law [Dataset]. http://doi.org/10.5281/zenodo.16361108
Explore at:
csv, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.16361108
Dataset updated
Nov 6, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Scott Brown; Scott Brown
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description:
This dataset accompanies the empirical analysis in Legality Without Justice, a study examining the relationship between public trust in institutions and perceived governance legitimacy using data from the World Values Survey Wave 7 (2017–2022). It includes:

WVS_Cross-National_Wave_7_csv_v6_0.csv — World Values Survey Wave 7 core data.

GDP.csv — World Bank GDP per capita (current US$) for 2022 by country.

denial.ipynb — Fully documented Jupyter notebook with code for data merging, exploratory statistics, and ordinal logistic regression using OrderedModel. Includes GDP as a control for institutional trust and perceived governance.

All data processing and analysis were conducted in Python using FAIR reproducibility principles and can be replicated or extended on Google Colab.

DOI: 10.5281/zenodo.16361108
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Authors: Anon Annotator
Publication date: 2025-07-23
Language: English
Version: 1.0.0
Publisher: Zenodo
Programming language: Python

🔽 How to Download and Run on Google Colab

Step 1: Open Google Colab

Go to https://colab.research.google.com

Step 2: Upload Files

Click File > Upload notebook, and upload the denial.ipynb file.
Also upload the CSVs (WVS_Cross-National_Wave_7_csv_v6_0.csv and GDP.csv) using the file browser on the left sidebar.

Step 3: Adjust File Paths (if needed)

In denial.ipynb, ensure file paths match:

python

CopiarEditar

wvs = pd.read_csv('/content/WVS_Cross-National_Wave_7_csv_v6_0.csv') gdp = pd.read_csv('/content/GDP.csv')

Step 4: Run the Code

Execute the notebook cells from top to bottom. You may need to install required libraries:

python

CopiarEditar

!pip install statsmodels pandas numpy

The notebook performs:

Data cleaning

Merging WVS and GDP datasets

Summary statistics

Ordered logistic regression to test if confidence in courts/police (Q57, Q58) predicts belief that the country is governed in the interest of the people (Q183), controlling for GDP.

Facebook

Twitter

Click to copy link

Link copied

Cite

Muhammed Onur ulu (2025). Turkish_Basketball_Super_League_Dataset [Dataset]. https://huggingface.co/datasets/onurulu17/Turkish_Basketball_Super_League_Dataset

Turkish_Basketball_Super_League_Dataset

onurulu17/Turkish_Basketball_Super_League_Dataset

Explore at:

Dataset updated

Aug 10, 2025

Authors

Muhammed Onur ulu

License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically

Description

📥 Load Dataset in Python

To load this dataset in Google Colab or any Python environment:
!pip install huggingface_hub pandas openpyxl

from huggingface_hub import hf_hub_download import pandas as pd

repo_id = "onurulu17/Turkish_Basketball_Super_League_Dataset"

files = [ "leaderboard.xlsx", "player_data.xlsx", "team_data.xlsx", "team_matches.xlsx", "player_statistics.xlsx", "technic_roster.xlsx" ]

datasets = {}

for f in files: path =… See the full description on the dataset page: https://huggingface.co/datasets/onurulu17/Turkish_Basketball_Super_League_Dataset.

Clear search

Close search

Google apps

Main menu

Turkish_Basketball_Super_League_Dataset

COCO2017 Image Caption Train

Sample Park Analysis

Python Code for Visualizing COVID-19 data

NYC Jobs Dataset (Filtered Columns)

US Consumer Complaints Against Businesses

Nou Pa Bèt: Civic Substitution and Expressive Freedoms in Post-State...

Data for "Prediction of Phakic Intraocular Lens Vault Using Machine Learning...

connect data in your google drive

Change the path for the custom data

In this case, we used ICL vault prediction using preop measurement

optimal features (sorted by importance) :

1. ICL size 2. ICL power 3. LV 4. CLR 5. ACD 6. ATA

7. MSE 8.Age 9. Pupil size 10. WTW 11. CCT 12. ACW

Split the dataset to train and test data

For a simple validation test, we split data to 8:2

Optimal parameter search could be performed in this section

Toronto Emergency Response - Open Data

MNIST_small_colab-sample_data

Top Rated TV Shows

Display the first show

Convert the API data to a DataFrame

Save to CSV and upload to Google Drive

Recognition Without Implementation: Institutional Gaps and Forestry...

Recognition Without Implementation: Institutional Gaps and Forestry Expansion in Post-Girjas Swedish Sápmi - Dataset and Analysis

Description

flan-t5-small-embed-refinedweb

Python Libraries used in tutorials.

duke_ultrasound

lsun_church_train

Can the use of Google Colab improve female and male mathematical analytical...

The Cultural Resource Curse: How Trade Dependence Undermines Creative...

How to Run This Dataset and Code in Google Colab

1) Open Colab and set up

2) Get the data files into Colab

Advancing Open and Reproducible Water Data Science by Integrating Data...

Legality Without Justice: Symbolic Governance, Institutional Denial, and the...

🔽 How to Download and Run on Google Colab

Step 1: Open Google Colab

Step 2: Upload Files

Step 3: Adjust File Paths (if needed)

Step 4: Run the Code

Turkish_Basketball_Super_League_Dataset

onurulu17/Turkish_Basketball_Super_League_Dataset