45 datasets found
  1. Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus...

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Richard Ferrers; Speedtest Global Index (2023). Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus ALC - 2020, 2022 [Dataset]. http://doi.org/10.6084/m9.figshare.13621169.v24
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Richard Ferrers; Speedtest Global Index
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset compares FIXED-line broadband internet speeds for four international cities, plus Alice Springs:
    - Melbourne, AU
    - Bangkok, TH
    - Shanghai, CN
    - Los Angeles, US
    - Alice Springs, AU

    ERRATA:
    1. Data is for Q3 2020, but some files are labelled incorrectly as 02-20 or June 20. They should all read Sept 20 (09-20, i.e. Q3 20) rather than Q2. Will rename and reload. Amended in v7.
    2. LAX file named 0320 when it should be Q320. Amended in v8.

    * Lines of data for each geojson file; a line equates to a 600m^2 location, including total tests, devices used, and average upload and download speed:
    - MEL 16181 locations/lines => 0.85M speedtests (16.7 tests per 100 people)
    - SHG 31745 lines => 0.65M speedtests (2.5/100pp)
    - BKK 29296 lines => 1.5M speedtests (14.3/100pp)
    - LAX 15899 lines => 1.3M speedtests (10.4/100pp)
    - ALC 76 lines => 500 speedtests (2/100pp)

    Geojsons of these 2-degree by 2-degree extracts for MEL, BKK, SHG now added; LAX added in v6, Alice Springs in v15.

    This dataset unpacks, geospatially, data summaries provided in the Speedtest Global Index (linked below). See the Jupyter Notebook (*.ipynb) to interrogate the geo data. See link to install Jupyter.

    ** To Do: Will add Google Map versions so everyone can see without installing Jupyter.
    - Link to Google Map (BKK) added below. Key: green > 100Mbps (Superfast); black > 500Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook.
    - Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melbourne has 20% of locations at or above 100Mbps. Suggest plotting the top 20% on a map for the community. Google Map link now added (and tweet).

    ** Python (Lat/Lon bounding-box extracts):

        melb = au_tiles.cx[144:146, -39:-37]  # Lat/Lon extract
        shg = tiles.cx[120:122, 30:32]        # Lat/Lon extract
        bkk = tiles.cx[100:102, 13:15]        # Lat/Lon extract
        lax = tiles.cx[-118:-120, 33:35]      # Lat/Lon extract
        ALC = tiles.cx[132:134, -22:-24]      # Lat/Lon extract
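    For context, a minimal sketch of how such an extract could be reproduced with GeoPandas, assuming a local copy of a tiles GeoJSON (the file name is hypothetical; the column names follow the Ookla open-data layout and may differ):

    ```python
    import geopandas as gpd

    # Load a Speedtest tiles extract (hypothetical file name)
    au_tiles = gpd.read_file("mel_tiles_q3_2020.geojson")

    # .cx selects rows whose geometry intersects the Lon/Lat bounding box
    melb = au_tiles.cx[144:146, -39:-37]

    print(len(melb), "tile locations")
    # Column names assumed from the Ookla open data schema
    print(melb[["avg_d_kbps", "avg_u_kbps", "tests", "devices"]].head())
    ```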

    Histograms (v9) and data visualisations (v3, 5, 9, 11) will be provided. Data sourced from: this is an extract of Speedtest Open Data available at Amazon AWS (link below - opendata.aws).

    ** VERSIONS
    - v24. Add tweet and Google Map of top 20% (over 100Mbps locations) in MEL Q3 22. Add v1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below).
    - v23. Add graph of 2022 broadband distribution, and compare 2020-2022. Updated v1.4 Jupyter notebook.
    - v22. Add import ipynb; workflow-import-4cities.
    - v21. Add Q3 2022 data; five cities inc ALC. Geojson files. (2020: 4.3M tests; 2022: 2.9M tests)

    Melb 14784 lines Avg download speed 69.4M Tests 0.39M

    SHG 31207 lines Avg 233.7M Tests 0.56M

    ALC 113 lines Avg 51.5M Test 1092

    BKK 29684 lines Avg 215.9M Tests 1.2M

    LAX 15505 lines Avg 218.5M Tests 0.74M

    - v20. Speedtest - Five Cities inc ALC.
    - v19. Add ALC2.ipynb.
    - v18. Add ALC line graph.
    - v17. Added ipynb for ALC. Added ALC to title.
    - v16. Load Alice Springs data Q2 21 - csv. Added Google Map link of ALC.
    - v15. Load Melb Q1 2021 data - csv.
    - v14. Added Melb Q1 2021 data - geojson.
    - v13. Added Twitter link to pics.
    - v12. Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb).
    - v11. Add Line-Compare pic, plotting four cities on a graph.
    - v10. Add four histograms in one pic.
    - v9. Add histogram for four cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook).
    - v8. Renamed LAX file to Q3, rather than 03.
    - v7. Amended file names of BKK files to correctly label as Q3, not Q2 or 06.
    - v6. Added LAX file.
    - v5. Add screenshot of BKK Google Map.
    - v4. Add BKK Google Map (link below), and BKK csv mapping files.
    - v3. Replaced MEL map with big-key version. Previous key was very tiny in top right corner.
    - v2. Uploaded MEL, SHG, BKK data and Jupyter Notebook.
    - v1. Metadata record.

    ** LICENCE: The AWS licence on the Speedtest source data is "CC BY-NC-SA 4.0", so use of this data must be non-commercial (NC) and reuse must be share-alike (SA) (add the same licence). This restricts the standard CC BY Figshare licence.

    ** Other uses of Speedtest Open Data: see link at Speedtest below.

  2. JavaScript code for retrieval of MODIS Collection 6 NDSI snow cover at...

    • beta.hydroshare.org
    • hydroshare.org
    • +1more
    zip
    Updated Feb 11, 2022
    Cite
    Irene Garousi-Nejad; David Tarboton (2022). JavaScript code for retrieval of MODIS Collection 6 NDSI snow cover at SNOTEL sites and a Jupyter Notebook to merge/reprocess data [Dataset]. http://doi.org/10.4211/hs.d287f010b2dd48edb0573415a56d47f8
    Available download formats: zip (52.2 KB)
    Dataset updated
    Feb 11, 2022
    Dataset provided by
    HydroShare
    Authors
    Irene Garousi-Nejad; David Tarboton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    This JavaScript code has been developed to retrieve NDSI_Snow_Cover from MODIS version 6 for SNOTEL sites using the Google Earth Engine platform. To successfully run the code, you should have a Google Earth Engine account. An input file, called NWM_grid_Western_US_polygons_SNOTEL_ID.zip, is required to run the code. This input file includes the 1 km grid cells of the NWM containing SNOTEL sites. You need to upload this input file to the Assets tab in the Google Earth Engine code editor. You also need to import the MOD10A1.006 Terra Snow Cover Daily Global 500m collection into the Google Earth Engine code editor. You may do this by searching for the product name in the search bar of the code editor.

    The JavaScript works for a specified time range. We found that the best period is a month, which is the maximum allowable time range for doing the computation for all SNOTEL sites on Google Earth Engine. The script consists of two main loops. The first loop retrieves data from the first day of a month up to day 28 across five periods. The second loop retrieves data from day 28 to the beginning of the next month. The results are shown as graphs on the right-hand side of the Google Earth Engine code editor under the Console tab. To save results as CSV files, open each time series by clicking on the button located at each graph's top right corner. From the new web page, you can click on the Download CSV button at the top.
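    The published script is JavaScript for the GEE Code Editor; a rough Python sketch of a similar retrieval using the earthengine-api (asset path, dates, and the exact reduction are assumptions, not the authors' code) could look like:

    ```python
    import ee

    ee.Initialize()  # requires a prior ee.Authenticate() / configured credentials

    # MODIS Terra daily snow cover, Collection 6, one month at a time (per the description)
    snow = (ee.ImageCollection("MODIS/006/MOD10A1")
            .filterDate("2019-01-01", "2019-02-01")
            .select("NDSI_Snow_Cover"))

    # Placeholder asset path for the uploaded SNOTEL grid-cell polygons
    cells = ee.FeatureCollection("users/<your_user>/NWM_grid_Western_US_polygons_SNOTEL_ID")

    def sample_image(img):
        # Mean NDSI_Snow_Cover per grid cell, tagged with the acquisition date
        return (img.reduceRegions(collection=cells,
                                  reducer=ee.Reducer.mean(),
                                  scale=500)
                   .map(lambda f: f.set("date", img.date().format("YYYY-MM-dd"))))

    samples = snow.map(sample_image).flatten()
    print(samples.first().getInfo())
    ```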

    Here is the link to the script path: https://code.earthengine.google.com/?scriptPath=users%2Figarousi%2Fppr2-modis%3AMODIS-monthly

    Then run the Jupyter Notebook (merge_downloaded_csv_files.ipynb) to merge the downloaded CSV files (stored, for example, in a folder called output/from_GEE) into one single CSV file, merged.csv. The notebook then applies some preprocessing steps; the final output is NDSI_FSCA_MODIS_C6.csv.
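    A minimal sketch of that merge step with pandas (the folder name is from the description; any column harmonisation done in the notebook is omitted):

    ```python
    import glob
    import pandas as pd

    # Concatenate all per-month CSVs exported from Google Earth Engine
    files = sorted(glob.glob("output/from_GEE/*.csv"))
    merged = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
    merged.to_csv("merged.csv", index=False)
    ```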

  3. Update CSV item in ArcGIS

    • anrgeodata.vermont.gov
    Updated Mar 18, 2022
    Cite
    ArcGIS Survey123 (2022). Update CSV item in ArcGIS [Dataset]. https://anrgeodata.vermont.gov/documents/dc69467c3e7243719c9125679bbcee9b
    Dataset updated
    Mar 18, 2022
    Dataset authored and provided by
    ArcGIS Survey123
    Description

    ArcGIS Survey123 utilizes CSV data in several workflows, including external choice lists, the search() appearance, and pulldata() calculations. When you need to periodically update the CSV content used in a survey, a useful method is to upload the CSV files to your ArcGIS organization and link the CSV items to your survey. Once linked, any updates to the CSV items will automatically pull through to your survey without the need to republish the survey. To learn more about linking items to a survey, see Linked content.

    This notebook demonstrates how to automate updating a CSV item in your ArcGIS organization.

    Note: It is recommended to run this notebook on your computer in Jupyter Notebook or ArcGIS Pro, as that will provide the best experience when reading locally stored CSV files. If you intend to schedule this notebook in ArcGIS Online or ArcGIS Notebook Server, additional configuration may be required to read CSV files from online file storage, such as Microsoft OneDrive or Google Drive.
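    A minimal sketch of the update step with the ArcGIS API for Python, assuming the CSV item already exists (credentials, item ID, and file path are placeholders, not values from this dataset):

    ```python
    from arcgis.gis import GIS

    # Sign in to the ArcGIS organization (placeholder credentials)
    gis = GIS("https://www.arcgis.com", "username", "password")

    # Fetch the existing CSV item by its item ID (placeholder ID)
    csv_item = gis.content.get("1234567890abcdef1234567890abcdef")

    # Overwrite the item's data with a fresh local CSV; linked surveys pick up the change
    csv_item.update(data="choices.csv")
    ```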

  4. Sample Park Analysis

    • figshare.com
    zip
    Updated Nov 2, 2025
    Cite
    Eric Delmelle (2025). Sample Park Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.30509021.v1
    Dataset updated
    Nov 2, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Eric Delmelle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    README – Sample Park Analysis

    ## Overview
    This repository contains a Google Colab / Jupyter notebook and accompanying dataset used for analyzing park features and associated metrics. The notebook demonstrates data loading, cleaning, and exploratory analysis of the Hope_Park_original.csv file.

    ## Contents
    - sample park analysis.ipynb — The main analysis notebook (Colab/Jupyter format)
    - Hope_Park_original.csv — Source dataset containing park information
    - README.md — Documentation for the contents and usage

    ## Usage
    1. Open the notebook in Google Colab or Jupyter.
    2. Upload the Hope_Park_original.csv file to the working directory (or adjust the file path in the notebook).
    3. Run each cell sequentially to reproduce the analysis.

    ## Requirements
    The notebook uses standard Python data science libraries:

    ```python
    pandas
    numpy
    matplotlib
    seaborn
    ```
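    A minimal sketch of the loading step under those requirements (the columns of Hope_Park_original.csv are not documented here, so only generic inspection is shown):

    ```python
    import pandas as pd

    # Load the source dataset shipped with the notebook
    parks = pd.read_csv("Hope_Park_original.csv")

    # Quick structural checks before any cleaning
    print(parks.shape)
    print(parks.dtypes)
    print(parks.describe(include="all").T.head())
    ```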

  5. Reporting behavior from WHO COVID-19 public data

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Jul 14, 2025
    Cite
    Auss Abbood (2025). Reporting behavior from WHO COVID-19 public data [Dataset]. http://doi.org/10.5061/dryad.9s4mw6mmb
    Dataset updated
    Jul 14, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Auss Abbood
    Time period covered
    Dec 16, 2022
    Description

    Objective: Daily COVID-19 data reported by the World Health Organization (WHO) may provide the basis for political ad hoc decisions including travel restrictions. Data reported by countries, however, is heterogeneous and metrics to evaluate its quality are scarce. In this work, we analyzed COVID-19 case counts provided by WHO and developed tools to evaluate country-specific reporting behaviors.

    Methods: In this retrospective cross-sectional study, COVID-19 data reported daily to WHO from 3rd January 2020 until 14th June 2021 were analyzed. We proposed the concepts of binary reporting rate and relative reporting behavior and performed descriptive analyses for all countries with these metrics. We developed a score to evaluate the consistency of incidence and binary reporting rates. Further, we performed spectral clustering of the binary reporting rate and relative reporting behavior to identify salient patterns in these metrics.

    Results: Our final analysis included 222 countries and regions....

    Data collection: COVID-19 data was downloaded from WHO. Using a public repository, we added the countries' full names to the WHO data set, using the two-letter abbreviations for each country to merge both data sets. The provided COVID-19 data covers January 2020 until June 2021. We uploaded the final data set used for the analyses of this paper.

    Data processing: We processed data using a Jupyter Notebook with a Python kernel and publicly available external libraries. This upload contains the required Jupyter Notebook (reporting_behavior.ipynb) with all analyses and some additional work, a README, and the conda environment yml (env.yml).

    Any text editor, including Microsoft Excel and its free alternatives, can open the uploaded CSV file. Any web browser and some code editors (like the freely available Visual Studio Code) can show the uploaded Jupyter Notebook if the required Python environment is set up correctly.
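    The paper's exact metric definitions live in reporting_behavior.ipynb; as an illustration only, one plausible reading of a "binary reporting rate" (share of days on which a country reported any new cases) could be computed like this, with WHO column names assumed:

    ```python
    import pandas as pd

    # Daily WHO data; column names (Date_reported, Country, New_cases) are assumed
    who = pd.read_csv("WHO-COVID-19-global-data.csv", parse_dates=["Date_reported"])

    def binary_reporting_rate(group: pd.DataFrame) -> float:
        # Fraction of days in the period with a non-zero reported case count
        return float((group["New_cases"] > 0).mean())

    rates = who.groupby("Country").apply(binary_reporting_rate).rename("binary_reporting_rate")
    print(rates.sort_values().head())
    ```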

  6. UWB Motion Detection Data Set

    • data.niaid.nih.gov
    Updated Feb 11, 2022
    Cite
    Klemen Bregar; Andrej Hrovat; Mihael Mohorčič (2022). UWB Motion Detection Data Set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4613124
    Dataset updated
    Feb 11, 2022
    Dataset provided by
    Institut Jožef Stefan
    Authors
    Klemen Bregar; Andrej Hrovat; Mihael Mohorčič
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    This data set includes a collection of measurements using DecaWave DW1000 UWB radios in two indoor environments, used for motion detection functionality. Measurements include channel impulse response (CIR) samples in the form of power delay profiles (PDP), with corresponding timestamps, for three channels in each indoor environment.

    The data set includes Python code and Jupyter notebooks for data loading and analysis, and to reproduce the results of the paper entitled "UWB Radio Based Motion Detection System for Assisted Living" submitted to MDPI Sensors.

    The data set will require around 10 GB of total free space after extraction.

    The code included in the data set is written and tested on Linux (Ubuntu 20.04) and requires 16 GB of RAM plus an additional SWAP partition to run properly. The code can be modified to consume less memory, but that requires unnecessary additional work. If the .npy format is compatible with your numpy version, you won't need to regenerate the .npy data from the .csv files.

    Data Set Structure

    The resulting folder after extracting the uwb_motion_detection.zip file is organized as follows:

    data subfolder: contains all original .csv and intermediate .npy data files.

    models

    pdp: this folder contains 4 .csv files with raw PDP measurements (timestamp + PDP). The data format will be discussed in the following section.

    pdp_diff: this folder contains .npy files with PDP samples and .npy files with timestamps. Those files are generated by running the generate_pdp_diff.py script.

    generate_pdp_diff.py

    validation subfolder: contains data for motion detection validation

    events: contains .npy files with motion events for validation. The .npy files are generated using generate_event_x.py files or notebooks inside the /Process/validation folder.

    pdp: this folder contains raw PDP measurements in .csv format.

    pdp_diff: this folder contains .npy files with PDP samples and .npy files with timestamps. Those files are generated by running the generate_pdp_diff.py script.

    generate_events_0.py

    generate_events_1.py

    generate_events_2.py

    generate_pdp_diff.py

    figures subfolder: contains all figures generated in Jupyter notebooks inside the "Process" folder.

    Process subfolder: contains Jupyter notebooks with data processing and motion detection code.

    MotionDetection: contains notebook comparing standard score motion detection with windowed standard score motion detection

    OnlineModels: presents the development process of online models definitions

    PDP_diff: presents the basic principle of PDP differences used in the motion detection

    Validation: presents a motion detection validation process

    Raw data structure

    All .csv files in the data folder contain raw PDP measurements with a timestamp for each PDP sample. The structure of each file is as follows:

    unix timestamp, cir0 [dBm], cir1 [dBm], cir2 [dBm], ..., cir149 [dBm]
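    A minimal sketch for loading one raw PDP file with that structure (the file name is a placeholder, and whether the CSVs carry a header row is an assumption):

    ```python
    import numpy as np
    import pandas as pd

    # One raw PDP file: a unix timestamp followed by 150 CIR power values in dBm
    cols = ["timestamp"] + [f"cir{i}" for i in range(150)]
    pdp = pd.read_csv("data/pdp/example_pdp.csv", names=cols, header=None)  # header handling assumed

    timestamps = pdp["timestamp"].to_numpy()
    power = pdp[[f"cir{i}" for i in range(150)]].to_numpy(dtype=np.float32)
    print(power.shape)  # (n_samples, 150)
    ```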

  7. Data from: Data and code from: Cultivation and dynamic cropping processes...

    • agdatacommons.nal.usda.gov
    • catalog.data.gov
    zip
    Updated Nov 21, 2025
    + more versions
    Cite
    Lucas J. Heintzman; Nancy E. McIntyre; Eddy J. Langendoen; Quentin D. Read (2025). Data and code from: Cultivation and dynamic cropping processes impart land-cover heterogeneity within agroecosystems: a metrics-based case study in the Yazoo-Mississippi Delta (USA) [Dataset]. http://doi.org/10.15482/USDA.ADC/1529589
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Lucas J. Heintzman; Nancy E. McIntyre; Eddy J. Langendoen; Quentin D. Read
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Area covered
    Mississippi Delta, Mississippi, United States
    Description

    This dataset contains data and code from the manuscript: Heintzman, L.J., McIntyre, N.E., Langendoen, E.J., & Read, Q.D. (2024). Cultivation and dynamic cropping processes impart land-cover heterogeneity within agroecosystems: a metrics-based case study in the Yazoo-Mississippi Delta (USA). Landscape Ecology 39, 29 (2024). https://doi.org/10.1007/s10980-024-01797-0

    There are 14 rasters of land use and land cover data for the study region, in .tif format with associated auxiliary files, two shapefiles with county boundaries and study area extent, a CSV file with summary information derived from the rasters, and a Jupyter notebook containing Python code. The rasters included here represent an intermediate data product. Original unprocessed rasters from NASS CropScape are not included here, nor is the code to process them.

    List of files:
    - MS_Delta_maps.zip
      - MSDeltaCounties_UTMZone15N.shp: Depiction of the 19 counties (labeled) that intersect the Mississippi Alluvial Plain in western Mississippi.
      - MS_Delta_MAP_UTMZone15N.shp: Depiction of the study area extent.
    - mf8h_20082021.zip
      - mf8h_XXXX.tif: Yearly, reclassified and majority-filtered LULC data used to build comboall1.csv, derived from USDA NASS CropScape. There are 14 .tif files total for years 2008-2021. Each .tif file includes auxiliary files with the same file name and the following extensions: .tfw, .tif.aux.xml, .tif.ovr, .tif.vat.cpg, .tif.vat.dbf.
    - comboall1.csv: Combined dataset of LULC information for all 14 years in the study period.
    - analysis.ipynb_.txt: Jupyter Notebook used to analyze comboall1.csv. Convert to .ipynb format to open with Jupyter.

    This research was conducted under USDA Agricultural Research Service, National Program 211 (Water Availability and Watershed Management).
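    A minimal sketch of the conversion and loading steps described above (comboall1.csv's column names are not documented here, so only generic inspection is shown):

    ```python
    from pathlib import Path
    import pandas as pd

    # Restore the notebook extension so Jupyter can open it
    nb = Path("analysis.ipynb_.txt")
    if nb.exists():
        nb.rename("analysis.ipynb")

    # Load the combined LULC summary table
    combo = pd.read_csv("comboall1.csv")
    print(combo.shape)
    print(combo.head())
    ```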

  8. Legality Without Justice: Symbolic Governance, Institutional Denial, and the...

    • zenodo.org
    bin, csv
    Updated Nov 6, 2025
    Cite
    Scott Brown; Scott Brown (2025). Legality Without Justice: Symbolic Governance, Institutional Denial, and the Ethical Foundations of Law [Dataset]. http://doi.org/10.5281/zenodo.16361108
    Dataset updated
    Nov 6, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Scott Brown; Scott Brown
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description:
    This dataset accompanies the empirical analysis in Legality Without Justice, a study examining the relationship between public trust in institutions and perceived governance legitimacy using data from the World Values Survey Wave 7 (2017–2022). It includes:

    • WVS_Cross-National_Wave_7_csv_v6_0.csv — World Values Survey Wave 7 core data.

    • GDP.csv — World Bank GDP per capita (current US$) for 2022 by country.

    • denial.ipynb — Fully documented Jupyter notebook with code for data merging, exploratory statistics, and ordinal logistic regression using OrderedModel. Includes GDP as a control for institutional trust and perceived governance.

    All data processing and analysis were conducted in Python using FAIR reproducibility principles and can be replicated or extended on Google Colab.

    DOI: 10.5281/zenodo.16361108
    License: Creative Commons Attribution 4.0 International (CC BY 4.0)
    Authors: Anon Annotator
    Publication date: 2025-07-23
    Language: English
    Version: 1.0.0
    Publisher: Zenodo
    Programming language: Python

    🔽 How to Download and Run on Google Colab

    Step 1: Open Google Colab

    Go to https://colab.research.google.com

    Step 2: Upload Files

    Click File > Upload notebook, and upload the denial.ipynb file.
    Also upload the CSVs (WVS_Cross-National_Wave_7_csv_v6_0.csv and GDP.csv) using the file browser on the left sidebar.

    Step 3: Adjust File Paths (if needed)

    In denial.ipynb, ensure file paths match:

    wvs = pd.read_csv('/content/WVS_Cross-National_Wave_7_csv_v6_0.csv')
    gdp = pd.read_csv('/content/GDP.csv')

    Step 4: Run the Code

    Execute the notebook cells from top to bottom. You may need to install required libraries:

    !pip install statsmodels pandas numpy

    The notebook performs:

    • Data cleaning

    • Merging WVS and GDP datasets

    • Summary statistics

    • Ordered logistic regression to test if confidence in courts/police (Q57, Q58) predicts belief that the country is governed in the interest of the people (Q183), controlling for GDP.
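    A minimal sketch of that final step with statsmodels' OrderedModel; this is not the notebook's code, and the merge keys, GDP column name, and missing-value handling are assumptions:

    ```python
    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    wvs = pd.read_csv("/content/WVS_Cross-National_Wave_7_csv_v6_0.csv")
    gdp = pd.read_csv("/content/GDP.csv")

    # Assumed keys: WVS ISO3 code column vs. World Bank "Country Code"; GDP column name hypothetical
    df = wvs.merge(gdp, left_on="B_COUNTRY_ALPHA", right_on="Country Code", how="left")

    # WVS uses negative codes for missing/declined answers; keep valid responses only
    df = df[(df[["Q57", "Q58", "Q183"]] >= 0).all(axis=1)].dropna(subset=["gdp_2022"])

    # Ordered logit: perceived governance (Q183) on trust in courts/police, controlling for GDP
    model = OrderedModel(df["Q183"].astype(int),
                         df[["Q57", "Q58", "gdp_2022"]],
                         distr="logit")
    print(model.fit(method="bfgs", disp=False).summary())
    ```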

  9. Cognitive Fatigue

    • figshare.com
    csv
    Updated Nov 5, 2025
    Cite
    Rui Varandas; Inês Silveira; Hugo Gamboa (2025). Cognitive Fatigue [Dataset]. http://doi.org/10.6084/m9.figshare.28188143.v3
    Dataset updated
    Nov 5, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rui Varandas; Inês Silveira; Hugo Gamboa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    1. Cognitive Fatigue

    While executing the proposed tasks, the participants' physiological signals were monitored using two biosignalsplux devices from PLUX Wireless Biosignals, Lisbon, Portugal, with a sampling frequency of 100 Hz and a resolution of 16 bits (24 bits in the case of fNIRS). Six different sensors were used: EEG and fNIRS positioned around F7 and F8 of the 10-20 system (the dorsolateral prefrontal cortex is often used to assess CW and fatigue as well as cognitive states); ECG monitored an approximation of Lead I of the Einthoven system; EDA was placed on the palm of the non-dominant hand; ACC was positioned on the right side of the head to measure head movement and overall posture changes; and the RIP sensor was attached to the upper-abdominal area to measure respiration cycles. The combination of the three allows inference about the response of the Autonomic Nervous System (ANS) of the human body, namely the response of the sympathetic and parasympathetic nervous systems.

    2.1. Experimental design
    Cognitive fatigue (CF) is a phenomenon that arises following prolonged engagement in mentally demanding cognitive tasks. Thus, we developed an experimental procedure that involved three demanding tasks: a digital lesson in Jupyter Notebook format, three repetitions of the Corsi-Block task, and two repetitions of a concentration test.
    Before the Corsi-Block task and after the concentration task there were baseline periods of two minutes. In our analysis, the first baseline period, although not explicitly present in the dataset, was designated as representing no CF, whereas the final baseline period was designated as representing the presence of CF. Between repetitions of the Corsi-Block task, there were baseline periods of 15 s after the task and of 30 s before the beginning of each repetition of the task.

    2.2. Data recording
    A data sample of 10 volunteer participants (4 females) aged between 22 and 48 years old (M = 28.2, SD = 7.6) took part in this study. All volunteers were recruited at NOVA School of Science and Technology, fluent in English, and right-handed; none reported suffering from psychological disorders, and none reported taking regular medication. Written informed consent was obtained before participating, and all ethical procedures approved by the Ethics Committee of NOVA University of Lisbon were thoroughly followed.
    In this study, we omitted the data from one participant due to the insufficient duration of data acquisition.

    2.3. Data labelling
    The labels easy, difficult, very difficult, and repeat found in the ECG_lesson_answers.txt files represent the subjects' opinion of the content read in the ECG lesson. The repeat label represents the most difficult level; it is called repeat because when it is pressed, the answer to the question is shown again. This system is based on the Anki system, which has been proposed and used to memorise information effectively. In addition, the PB description JSON files include timestamps indicating the start and end of cognitive tasks, baseline periods, and other events, which are useful for defining CF states as described in 2.1.

    2.4. Data description
    Biosignals include EEG, fNIRS (not converted to oxy- and deoxy-Hb), ECG, EDA, respiration (RIP), accelerometer (ACC), and push-button (PB) data. All signals have already been converted to physical units. In each biosignal file, the first column corresponds to the timestamps.
    HCI features encompass keyboard, mouse, and screenshot data. Below is a Python code snippet for extracting screenshot files from the screenshots CSV file:

        import base64
        from os import makedirs
        from os.path import join

        file = '...'  # path to the screenshots CSV file

        with open(file, 'r') as f:
            lines = f.readlines()

        makedirs('screenshot', exist_ok=True)  # create the output folder once

        for line in lines[1:]:
            timestamp = line.split(',')[0]
            code = line.split(',')[-1][:-2]   # base64-encoded image payload
            imgdata = base64.b64decode(code)
            filename = str(timestamp) + '.jpeg'
            with open(join('screenshot', filename), 'wb') as out:
                out.write(imgdata)

    A characterization file containing age and gender information for all subjects in each dataset is provided within the respective dataset folder (e.g., D2_subject-info.csv). Other complementary files include (i) a description of the pushbuttons to help segment the signals (e.g., D2_S2_PB_description.json) and (ii) labelling (e.g., D2_S2_ECG_lesson_results.txt). The files D2_Sx_results_corsi-block_board_1.json and D2_Sx_results_corsi-block_board_2.json show the results for the first and second iterations of the Corsi-Block task, where, for example, row_0_1 = 12 means that the subject got 12 pairs right in the first row of the first board, and row_0_2 = 12 means that the subject got 12 pairs right in the first row of the second board.
  10. Using HydroShare Buckets to Access Resource Files

    • search.dataone.org
    Updated Aug 9, 2025
    + more versions
    Cite
    Pabitra Dash (2025). Using HydroShare Buckets to Access Resource Files [Dataset]. https://search.dataone.org/view/sha256%3Ab25a0f5e5d62530d70ecd6a86f1bd3fa2ab804a8350dc7ba087327839fcb1fb1
    Dataset updated
    Aug 9, 2025
    Dataset provided by
    Hydroshare
    Authors
    Pabitra Dash
    Description

    This resource contains a draft Jupyter Notebook with example code snippets showing how to access HydroShare resource files using HydroShare S3 buckets. The user_account.py file is a utility to read the user's cached HydroShare account information in any of the JupyterHub instances that HydroShare has access to. The example notebook uses this utility so that you don't have to enter your HydroShare account information in order to access HydroShare buckets.

    Here are the 3 notebooks in this resource:

    • hydroshare_s3_bucket_access_examples.ipynb:

    The above notebook has examples showing how to upload/download resource files from the resource bucket. It also contains examples of how to list the files and folders of a resource in a bucket.

    • python-modules-direct-read-from-bucket/hs_bucket_access_gdal_example.ipynb:

    The above notebook has examples of reading raster and shapefile data from a bucket using GDAL, without needing to download the files from the bucket to local disk.

    • python-modules-direct-read-from-bucket/hs_bucket_access_non_gdal_example.ipynb:

    The above notebook has examples of using h5netcdf and xarray to read a NetCDF file directly from a bucket. It also contains examples of using rioxarray to read a raster file, and pandas to read a CSV file, from HydroShare buckets.
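    A minimal sketch of generic S3-style bucket access with boto3; the endpoint URL, credentials, bucket name, and key are placeholders (the resource's own notebooks obtain these via the user_account.py utility instead):

    ```python
    import boto3

    # Placeholder endpoint/credentials; HydroShare's actual values come from user_account.py
    s3 = boto3.client(
        "s3",
        endpoint_url="https://example-hydroshare-s3-endpoint.org",
        aws_access_key_id="YOUR_KEY",
        aws_secret_access_key="YOUR_SECRET",
    )

    bucket = "your-user-bucket"              # placeholder bucket name
    prefix = "resource-id/data/contents/"    # placeholder resource path

    # List files under the resource prefix
    for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
        print(obj["Key"], obj["Size"])

    # Download one file to local disk
    s3.download_file(bucket, prefix + "example.csv", "example.csv")
    ```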

  11. OpenOrca

    • kaggle.com
    • opendatalab.com
    • +1more
    zip
    Updated Nov 22, 2023
    Cite
    The Devastator (2023). OpenOrca [Dataset]. https://www.kaggle.com/datasets/thedevastator/open-orca-augmented-flan-dataset/versions/2
    Available download formats: zip (2548102631 bytes)
    Dataset updated
    Nov 22, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Open-Orca Augmented FLAN Dataset

    Unlocking Advanced Language Understanding and ML Model Performance

    By Huggingface Hub [source]

    About this dataset

    The Open-Orca Augmented FLAN Collection is a revolutionary dataset that unlocks new levels of language understanding and machine learning model performance. This dataset was created to support research on natural language processing, machine learning models, and language understanding through leveraging the power of reasoning trace-enhancement techniques. By enabling models to understand complex relationships between words, phrases, and even entire sentences in a more robust way than ever before, this dataset provides researchers expanded opportunities for furthering the progress of linguistics research. With its unique combination of features including system prompts, questions from users and responses from systems, this dataset opens up exciting possibilities for deeper exploration into the cutting edge concepts underlying advanced linguistics applications. Experience a new level of accuracy and performance - explore Open-Orca Augmented FLAN Collection today!


    How to use the dataset

    This guide provides an introduction to the Open-Orca Augmented FLAN Collection dataset and outlines how researchers can utilize it for their language understanding and natural language processing (NLP) work. The Open-Orca dataset includes system prompts, questions posed by users, and responses from the system.

    Getting Started: The first step is to download the data set from Kaggle at https://www.kaggle.com/openai/open-orca-augmented-flan and save it in a project directory of your choice on your computer or cloud storage space. Once you have downloaded the data set, launch the 'Jupyter Notebook' or 'Google Colab' program with which you want to work on this data set.

    Exploring & Preprocessing Data: To get a better understanding of the features in this dataset, import them into a Pandas DataFrame as shown below. You can use other libraries as per your need:

        import pandas as pd  # library used for importing datasets into Python

        df = pd.read_csv('train.csv')  # imports the train.csv file into a Pandas DataFrame

        df[['system_prompt','question','response']].head()  # views the top 5 rows of the columns 'system_prompt', 'question', 'response'

    After importing, check each feature using basic descriptive statistics such as a Pandas groupby or value_counts statement. These give greater clarity over the variables present in each feature. The command below shows counts of each element in the system_prompt column of the train CSV file:

        df['system_prompt'].value_counts().head()  # shows the count of each element present in the 'system_prompt' column
        # Example output:
        # User says hello guys         587
        # System asks How are you?     555
        # User says I am doing good    487
        # ...and so on

    Data Transformation: After inspecting and exploring the different features, you may want certain changes that best suit your needs from this dataset before training modelling algorithms on it. Common transformation steps include removing punctuation marks: since punctuation marks may not add any value to computation operations, we can remove them with a regex replacement such as .str.replace(r'[^A-Za-z ]+', '', regex=True).
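    A minimal sketch of that cleaning step applied to the columns named above (a rough completion of the truncated example, not the dataset author's exact code):

    ```python
    import pandas as pd

    df = pd.read_csv("train.csv")

    # Strip punctuation from the text columns, keeping letters and spaces only
    for col in ["system_prompt", "question", "response"]:
        df[col] = df[col].astype(str).str.replace(r"[^A-Za-z ]+", "", regex=True)

    print(df[["system_prompt", "question", "response"]].head())
    ```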

    Research Ideas

    • Automated Question Answering: Leverage the dataset to train and develop question answering models that can provide tailored answers to specific user queries while retaining language understanding abilities.
    • Natural Language Understanding: Use the dataset as an exploratory tool for fine-tuning natural language processing applications, such as sentiment analysis, document categorization, parts-of-speech tagging and more.
    • Machine Learning Optimizations: The dataset can be used to build highly customized machine learning pipelines that allow users to harness the power of conditioning data with pre-existing rules or models for improved accuracy and performance in automated tasks

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. [See Other Information](ht...

  12. Can Developers Prompt? A Controlled Experiment for Code Documentation...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 11, 2024
    + more versions
    Cite
    Kruse, Hans-Alexander; Puhlfürß, Tim; Maalej, Walid (2024). Can Developers Prompt? A Controlled Experiment for Code Documentation Generation [Replication Package] [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13127237
    Dataset updated
    Sep 11, 2024
    Dataset provided by
    Universität Hamburg
    Authors
    Kruse, Hans-Alexander; Puhlfürß, Tim; Maalej, Walid
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Summary of Artifacts

    This is the replication package for the paper titled 'Can Developers Prompt? A Controlled Experiment for Code Documentation Generation' that is part of the 40th IEEE International Conference on Software Maintenance and Evolution (ICSME), from October 6 to 11, 2024, located in Flagstaff, AZ, USA.

    Full Abstract

    Large language models (LLMs) bear great potential for automating tedious development tasks such as creating and maintaining code documentation. However, it is unclear to what extent developers can effectively prompt LLMs to create concise and useful documentation. We report on a controlled experiment with 20 professionals and 30 computer science students tasked with code documentation generation for two Python functions. The experimental group freely entered ad-hoc prompts in a ChatGPT-like extension of Visual Studio Code, while the control group executed a predefined few-shot prompt. Our results reveal that professionals and students were unaware of or unable to apply prompt engineering techniques. Especially students perceived the documentation produced from ad-hoc prompts as significantly less readable, less concise, and less helpful than documentation from prepared prompts. Some professionals produced higher quality documentation by just including the keyword Docstring in their ad-hoc prompts. While students desired more support in formulating prompts, professionals appreciated the flexibility of ad-hoc prompting. Participants in both groups rarely assessed the output as perfect. Instead, they understood the tools as support to iteratively refine the documentation. Further research is needed to understand which prompting skills and preferences developers have and which support they need for certain tasks.

    Author Information

    Name Affiliation Email

    Hans-Alexander Kruse Universität Hamburg hans-alexander.kruse@studium.uni-hamburg.de

    Tim Puhlfürß Universität Hamburg tim.puhlfuerss@uni-hamburg.de

    Walid Maalej Universität Hamburg walid.maalej@uni-hamburg.de

    Citation Information

    @inproceedings{kruse-icsme-2024, author={Kruse, Hans-Alexander and Puhlf{\"u}r{\ss}, Tim and Maalej, Walid}, booktitle={2024 IEEE International Conference on Software Maintenance and Evolution}, title={Can Developers Prompt? A Controlled Experiment for Code Documentation Generation}, year={2024}, doi={tba}}

    Artifacts Overview

    1. Preprint

    The file kruse-icsme-2024-preprint.pdf is the preprint version of the official paper. You should read the paper in detail to understand the study, especially its methodology and results.

    2. Results

    The folder results includes two subfolders, explained in the following.

    Demographics RQ1 RQ2

    The subfolder Demographics RQ1 RQ2 provides Jupyter Notebook file evaluation.ipynb for analyzing (1) the experiment participants' submissions of the digital survey and (2) the ad-hoc prompts that the experimental group entered into their tool. Hence, this file provides demographic information about the participants and results for the research questions 1 and 2. Please refer to the README file inside this subfolder for installation steps of the Jupyter Notebook file.

    RQ2

    The subfolder RQ2 contains further subfolders with Microsoft Excel files specific to the results of research question 2:

    The subfolder UEQ contains three times the official User Experience Questionnaire (UEQ) analysis Excel tool, with data entered from all participants/students/professionals.

    The subfolder Open Coding contains three Excel files with the open-coding results for the free-text answers that participants could enter at the end of the survey to state additional positive and negative comments about their experience during the experiment. The Consensus file provides the finalized version of the open coding process.

    3. Extension

    The folder extension contains the code of the Visual Studio Code (VS Code) extension developed in this study to generate code documentation with predefined prompts. Please refer to the README file inside the folder for installation steps. Alternatively, you can install the deployed version of this tool, called Code Docs AI, via the VS Code Marketplace.

    You can install the tool to generate code documentation with ad-hoc prompts directly via the VS Code Marketplace. We did not include the code of this extension in this replication package due to license conflicts (GNUv3 vs. MIT).

    4. Survey

    The folder survey contains PDFs of the digital survey in two versions:

    The file Survey.pdf contains the rendered version of the survey (how it was presented to participants).

    The file SurveyOptions.pdf is an export of the LimeSurvey web platform. Its main purpose is to provide the technical answer codes, e.g., AO01 and AO02, that refer to the rendered answer texts, e.g., Yes and No. This can help you if you want to analyze the CSV files inside the results folder (instead of using the Jupyter Notebook file), as the CSVs contain the answer codes, not the answer texts. Please note that an export issue caused page 9 to be almost blank. However, this problem is negligible as the question on this page only contained one free-text answer field.

    5. Appendix

    The folder appendix provides additional material about the study:

    The subfolder tool_screenshots contains screenshots of both tools.

    The file few_shots.txt lists the few shots used for the predefined prompt tool.

    The file test_functions.py lists the functions used in the experiment.

    Revisions

    Version Changelog

    1.0.0 Initial upload

    1.1.0 Add paper preprint. Update abstract.

    1.2.0 Update replication package based on ICSME Artifact Track reviews

    License

    See LICENSE file.

  13. Pulsar Voices

    • figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Richard Ferrers; Anderson Murray; Ben Raymond; Gary Ruben; CHRISTOPHER RUSSELL; Sarath Tomy; Michael Walker (2023). Pulsar Voices [Dataset]. http://doi.org/10.6084/m9.figshare.3084748.v2
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Richard Ferrers; Anderson Murray; Ben Raymond; Gary Ruben; CHRISTOPHER RUSSELL; Sarath Tomy; Michael Walker
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data is sourced from CSIRO Parkes ATNF, e.g. http://www.atnf.csiro.au/research/pulsar/psrcat/

    Feel the pulse of the universe. We're taking signal data from astronomical "pulsar" sources and creating a way to listen to their signals audibly. Pulsar data is available from ATNF at CSIRO.au. Our team at #SciHackMelb has been working on a #datavis to give researchers and others a novel way to explore the pulsar corpus, especially through the sound of the frequencies at which the pulsars emit pulses.

    Link to project page at #SciHackMelb: http://www.the-hackfest.com/events/melbourne-science-hackfest/projects/pulsar-voices/

    The files attached here include: source data, project presentation, data as used in the website (final_pulsar.sql), and other methodology documentation. Importantly, see the Github link, which contains data manipulation code, HTML code to present the data and render it audibly, and an iPython Notebook to process single-pulsar data into an audible waveform file. Together all these resources are the Pulsar Voices activity and resulting data.

    Source data:
    * RA - east/west coordinates (0 - 24 hrs, roughly equates to longitude) [theta transforms RA to 0 - 360 degrees]
    * Dec - north/south coordinates (-90, +90; roughly equates to latitude, i.e. 90 is above the north pole and -90 the south pole)
    * P0 - the time in seconds that a pulsar repeats its signal
    * f - 1/P0, which ranges from 700 cycles per second down to pulses which occur every few seconds
    * kps - distance from Earth in kilo-parsecs. 1 kps = 3,000 light years. The furthest data is 30 kps. The galactic centre is about 25,000 light years away, i.e. about 8 kps.

    Files:
    * psrcatShort.csv - 2,295 pulsars (all known pulsars) with the above fields: RA, Dec, Theta
    * psrcatMedium.csv - adds P0 and kps; only 1428 lines, i.e. not available for all 2,295 datapoints
    * psrcatSparse.csv - adds P0 and kps, blanks if n/a; 2,295 lines
    * short.txt - important pulsars with high levels of observation (** even more closely examined)
    * pulsar.R - code contributed by Ben Raymond to visualise pulsar frequency and period in a histogram
    * pulsarVoices_authors.JPG - added photo of authors from SciHackMelb

    Added to the raw data:
    - Coordinates to map RA, Dec to screen width (y) / height (x): y = RA[Theta]*width/360; x = (Dec + 90)*height/180
    - Audible frequency converted from pulsar frequency (1/P0). Formula for 1/P0 (x) -> Hz (y): y = 10 ^ (0.5 log(x) + 2.8). Explanation in text file Convert1/P0toHz.txt. Tone generator from: http://www.softsynth.com/webaudio/tone.php
    - Detailed audible waveform file converted from pulsar signal data, and waveform image (a python notebook to generate these is available).

    The project source is hosted on github at: https://github.com/gazzar/pulsarvoices

    An IPython/Jupyter notebook contains code and a rough description of the method used to process a psrfits .sf file downloaded via the CSIRO Data Access Portal at http://doi.org/10.4225/08/55940087706E1. The notebook contains experimental code to read one of these .sf files and access the contained spectrogram data, processing it to generate an audible signal. It also reads the .txt files containing columnar pulse phase data (which is also contained in the .sf files) and processes these by frequency modulating the signal with an audible carrier. This is the method used to generate the .wav and .png files used in the web interface: https://github.com/gazzar/pulsarvoices/blob/master/ipynb/hackfest1.ipynb

    A standalone python script that does the .txt to .png and .wav signal processing was used to process 15 more pulsar data examples. These can be reproduced by running the script: https://github.com/gazzar/pulsarvoices/blob/master/data/pulsarvoices.py

    Processed files at: https://github.com/gazzar/pulsarvoices/tree/master/web, e.g. https://github.com/gazzar/pulsarvoices/blob/master/web/J0437-4715.png (J0437-4715.wav | J0437-4715.png).

    #Datavis online at: http://checkonline.com.au/tooltip.php. Code at the Github link above; see especially https://github.com/gazzar/pulsarvoices/blob/master/web/index.php, particularly lines 314 - 328 (or search: "SELECT * FROM final_pulsar";), which loads pulsar data from the DB and pushes it to the screen with Hz on mouseover.

    Pulsar Voices webpage functions:
    1. There is sound when you run the mouse across the pulsars. We plot all known pulsars (N=2,295) and play a tone for pulsars where we had frequency data, i.e. about 75%.
    2. In the bottom left corner, a more detailed pulsar sound and wave image pop up when you click the star icon. Two of the team worked exclusively on turning a single pulsar's waveform into an audible wav file. They created 16 of these files, and a workflow, but the team only had time to load one waveform. With more time, it would be great to load these files.
    3. If you leave the mouse over a pulsar, a little data description pops up, with location (RA, Dec), distance (kilo-parsecs; 1 = 3,000 light years), and frequency of rotation (and Hz converted to human hearing).
    4. If you click on a pulsar, other pulsars with similar frequency are highlighted in white. With more time it would be interesting to see if there are harmonics between pulsars, i.e. related frequencies.

    The Team:
    - Michael Walker: orcid.org/0000-0003-3086-6094; Biosciences PhD student, Unimelb, Melbourne.
    - Richard Ferrers: orcid.org/0000-0002-2923-9889; ANDS Research Data Analyst, Innovation/Value Researcher, Melbourne.
    - Sarath Tomy: http://orcid.org/0000-0003-4301-0690; La Trobe PhD Comp Sci, Melbourne.
    - Gary Ruben: http://orcid.org/0000-0002-6591-1820; CSIRO Postdoc at Australian Synchrotron, Melbourne.
    - Christopher Russell: Data Manager, CSIRO, Sydney. https://wiki.csiro.au/display/ASC/Chris+Russell
    - Anderson Murray: orcid.org/0000-0001-6986-9140; Physics Honours, Monash, Melbourne.

    Contact: richard.ferrers@ands.org.au for more information.

    What is still left to do?
    * Load data, description, images fileset to figshare :: DOI; DONE except DOI
    * Add overview images as an option, e.g. frequency bi-modal histogram
    * Colour code pulsars by distance; DONE
    * Add pulsar detail sound to top three observants; 16 pulsars processed but not loaded
    * Add tones to pulsars to indicate f; DONE
    * Add tooltips to show location, distance, frequency, name; DONE
    * Add title and description; DONE
    * Project data onto a planetarium dome with interaction to play pulsar frequencies. DONE - see youtube video at https://youtu.be/F119gqOKJ1U
    * Zoom into parts of sky to get separation between close data points - see youtube; function in Google Earth #datavis of dataset. Link at youtube.
    * Set upper and lower tone boundaries, so tones aren't annoying
    * Colour code pulsars by frequency bins e.g. >100 Hz, 10 - 100, 1 - 10,
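    A small sketch of the stated 1/P0 -> audible Hz mapping (y = 10 ^ (0.5 log(x) + 2.8)); the example period is an approximate published value for J0437-4715, used only for illustration:

    ```python
    import math

    def pulse_frequency_hz(p0_seconds: float) -> float:
        """Pulsar rotation frequency from period P0."""
        return 1.0 / p0_seconds

    def audible_hz(pulse_hz: float) -> float:
        """Map pulsar frequency to an audible tone: y = 10 ** (0.5 * log10(x) + 2.8)."""
        return 10 ** (0.5 * math.log10(pulse_hz) + 2.8)

    # Example: a millisecond pulsar with P0 of roughly 0.00575 s (approx. J0437-4715)
    f = pulse_frequency_hz(0.00575)
    print(round(f, 1), "Hz ->", round(audible_hz(f), 1), "Hz audible")
    ```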

  14. The Cultural Resource Curse: How Trade Dependence Undermines Creative...

    • zenodo.org
    bin, csv
    Updated Aug 9, 2025
    Cite
    Anon Anon; Anon Anon (2025). The Cultural Resource Curse: How Trade Dependence Undermines Creative Industries [Dataset]. http://doi.org/10.5281/zenodo.16784974
    Dataset updated
    Aug 9, 2025
    Dataset provided by
    Zenodo
    Authors
    Anon Anon; Anon Anon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset accompanies the study The Cultural Resource Curse: How Trade Dependence Undermines Creative Industries. It contains country-year panel data for 2000–2023 covering both OECD economies and the ten largest Latin American countries by land area. Variables include GDP per capita (constant PPP, USD), trade openness, internet penetration, education indicators, cultural exports per capita, and executive constraints from the Polity V dataset.

    The dataset supports a comparative analysis of how economic structure, institutional quality, and infrastructure shape cultural export performance across development contexts. Within-country fixed effects models show that trade openness constrains cultural exports in OECD economies but has no measurable effect in resource-dependent Latin America. In contrast, strong executive constraints benefit cultural industries in advanced economies while constraining them in extraction-oriented systems. The results provide empirical evidence for a two-stage development framework in which colonial extraction legacies create distinct constraints on creative industry growth.
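    As a rough illustration of a within-country fixed-effects setup of the kind described here (this is not the authors' notebook code; the linearmodels package, the combined panel file, and all column names are assumptions):

    ```python
    import pandas as pd
    from linearmodels.panel import PanelOLS

    # Country-year panel assembled from the CSVs listed below; file and column names assumed
    panel = pd.read_csv("panel.csv").set_index(["iso3", "year"])

    dep = panel["cultural_exports_pc"]
    exog = panel[["trade_openness", "gdp_pc_ppp", "internet_penetration", "exec_constraints"]]

    # Within-country (entity) fixed effects with country-clustered standard errors
    fe = PanelOLS(dep, exog, entity_effects=True).fit(cov_type="clustered", cluster_entity=True)
    print(fe.summary)
    ```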

    All variables are harmonized to ISO3 country codes and aligned on a common panel structure. The dataset is fully reproducible using the included Jupyter notebooks (OECD.ipynb, LATAM+OECD.ipynb, cervantes.ipynb).

    Contents:

    • GDPPC.csv — GDP per capita series from the World Bank.

    • explanatory.csv — Trade openness, internet penetration, and education indicators.

    • culture_exports.csv — UNESCO cultural export data.

    • p5v2018.csv — Polity V institutional indicators.

    • Jupyter notebooks for data processing and replication.

    Potential uses: Comparative political economy, cultural economics, institutional development, and resource curse research.

    How to Run This Dataset and Code in Google Colab

    These steps reproduce the OECD vs. Latin America analyses from the paper using the provided CSVs and notebooks.

    1) Open Colab and set up

    1. Go to https://colab.research.google.com

    2. Click File → New notebook.

    3. (Optional) If your files are in Google Drive, mount it:

    from google.colab import drive
    drive.mount('/content/drive')

    2) Get the data files into Colab

    You have two easy options:

    A. Upload the 4 CSVs + notebooks directly

    • In the left sidebar, click the folder icon → Upload.

    • Upload: GDPPC.csv, explanatory.csv, culture_exports.csv, p5v2018.csv, and any .ipynb you want to run.

    B. Use Google Drive

    • Put those files in a Drive folder.

    • After mounting Drive, refer to them with paths like /content/drive/MyDrive/your_folder/GDPPC.csv.

  15. Data from: Code4ML: a Large-scale Dataset of annotated Machine Learning Code...

    • zenodo.org
    csv
    Updated Sep 15, 2023
    Cite
    Anonymous authors; Anonymous authors (2023). Code4ML: a Large-scale Dataset of annotated Machine Learning Code [Dataset]. http://doi.org/10.5281/zenodo.6607065
    Dataset updated
    Sep 15, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous authors; Anonymous authors
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present Code4ML: a Large-scale Dataset of annotated Machine Learning Code, a corpus of Python code snippets, competition summaries, and data descriptions from Kaggle.

    The data is organized in a table structure. Code4ML includes several main objects: competition information, raw code blocks collected from Kaggle, and manually marked-up snippets. Each table has a .csv format.

    Each competition has a text description and metadata reflecting the characteristics of the competition and the dataset used, as well as the evaluation metrics (competitions.csv). The corresponding datasets can be loaded using the Kaggle API and data sources.

    The code blocks themselves and their metadata are collected into data frames according to the publishing year of the initial kernels. The current version of the corpus includes two code block files: snippets from kernels up to the year 2020 (code_blocks_upto_20.csv) and those from the year 2021 (code_blocks_21.csv), with corresponding metadata. The corpus consists of 2 743 615 ML code blocks collected from 107 524 Jupyter notebooks.

    Marked-up code blocks have the following metadata: anonymized id, the format of the used data (for example, table or audio), the id of the semantic type, a flag for code errors, the estimated relevance to the semantic class (from 1 to 5), the id of the parent notebook, and the name of the competition. The current version of the corpus has ~12 000 labeled snippets (markup_data_20220415.csv).

    As the marked-up code block data contains the numeric id of each code block's semantic type, we also provide a mapping from this number to the semantic type and subclass (actual_graph_2022-06-01.csv).
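    A minimal sketch of joining the labeled snippets with that semantic-type mapping (file names are from the description; the join-column names are guesses and will likely need adjusting to the actual headers):

    ```python
    import pandas as pd

    snippets = pd.read_csv("markup_data_20220415.csv")
    types = pd.read_csv("actual_graph_2022-06-01.csv")

    # Attach human-readable semantic type / subclass to each labeled snippet (join keys assumed)
    labelled = snippets.merge(types, left_on="semantic_type_id", right_on="id", how="left")
    print(labelled.head())
    ```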

    The dataset can help solve various problems, including code synthesis from a prompt in natural language, code autocompletion, and semantic code classification.

  16. Population Distribution Workflow using Census API in Jupyter Notebook:...

    • openicpsr.org
    delimited
    Updated Jul 23, 2020
    + more versions
    Cite
    Cooper Goodman; Nathanael Rosenheim; Wayne Day; Donghwan Gu; Jayasaree Korukonda (2020). Population Distribution Workflow using Census API in Jupyter Notebook: Dynamic Map of Census Tracts in Boone County, KY, 2000 [Dataset]. http://doi.org/10.3886/E120382V1
    Dataset updated
    Jul 23, 2020
    Dataset provided by
    Texas A&M University
    Authors
    Cooper Goodman; Nathanael Rosenheim; Wayne Day; Donghwan Gu; Jayasaree Korukonda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2000
    Area covered
    Boone County
    Description

    This archive reproduces a figure titled "Figure 3.2 Boone County population distribution" from Wang and vom Hofe (2007, p.60). The archive provides a Jupyter Notebook that uses Python and can be run in Google Colaboratory. The workflow uses the Census API to retrieve data, reproduce the figure, and ensure reproducibility for anyone accessing this archive.

    The Python code was developed in Google Colaboratory (Google Colab for short), which is an Integrated Development Environment (IDE) of JupyterLab and streamlines package installation, code collaboration, and management. The Census API is used to obtain population counts from the 2000 Decennial Census (Summary File 1, 100% data). Shapefiles are downloaded from the TIGER/Line FTP Server. All downloaded data are maintained in the notebook's temporary working directory while in use. The data and shapefiles are stored separately with this archive. The final map is also stored as an HTML file.

    The notebook features extensive explanations, comments, code snippets, and code output. The notebook can be viewed in PDF format or downloaded and opened in Google Colab. References to external resources are also provided for the various functional components. The notebook features code that performs the following functions:
    - install/import necessary Python packages
    - download the Census Tract shapefile from the TIGER/Line FTP Server
    - download Census data via the Census API
    - manipulate Census tabular data
    - merge Census data with the TIGER/Line shapefile
    - apply a coordinate reference system
    - calculate land area and population density
    - map and export the map to HTML
    - export the map to ESRI shapefile
    - export the table to CSV

    The notebook can be modified to perform the same operations for any county in the United States by changing the State and County FIPS code parameters for the TIGER/Line shapefile and Census API downloads. The notebook can be adapted for use in other environments (i.e., Jupyter Notebook) as well as reading and writing files to a local or shared drive, or cloud drive (i.e., Google Drive).
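    A minimal sketch of the Census API step for Boone County, KY (FIPS 21/015) tract populations from the 2000 SF1 endpoint; the exact variables, parameters, and API key handling used in the archived notebook may differ:

    ```python
    import pandas as pd
    import requests

    # 2000 Decennial Census, Summary File 1: total population (P001001) by tract, Boone County, KY
    url = ("https://api.census.gov/data/2000/dec/sf1"
           "?get=P001001&for=tract:*&in=state:21%20county:015")

    rows = requests.get(url, timeout=30).json()   # first row is the header
    tracts = pd.DataFrame(rows[1:], columns=rows[0])
    tracts["P001001"] = tracts["P001001"].astype(int)
    print(tracts.head())
    ```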

  17. Data, scripts and simulations for ProxyOH-[OH] analysis using ATom data and...

    • data.niaid.nih.gov
    Updated Jul 15, 2024
    Cite
    Colleen Baublitz; Sarah M. Ludwig; Julie M. Nicely; Glenn M. Wolfe (2024). Data, scripts and simulations for ProxyOH-[OH] analysis using ATom data and F0AM and AM3 simulations [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7512700
    Explore at:
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Columbia University, New York, NY, USA; Lamont-Doherty Earth Observatory, Palisades, NY, USA
    Earth System Science Interdisciplinary Center, University of Maryland, College Park, MD, USA; National Aeronautics and Space Administration Goddard Space Flight Center, Greenbelt, MD, USA
    Columbia University, New York, NY, USA; US Environmental Protection Agency, Durham, NC
    National Aeronautics and Space Administration Goddard Space Flight Center, Greenbelt, MD, USA
    Authors
    Colleen Baublitz; Sarah M. Ludwig; Julie M. Nicely; Glenn M. Wolfe
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides the simulations and analysis code used in Baublitz et al., An observation-based, reduced-form model for oxidation in the remote marine troposphere, Proceedings of the National Academy of Sciences, 120.

    Code, package versions

    The code is generally written in Python and saved to a Jupyter Notebook (.ipynb) format, except for the component developing the Bayesian regressions, which is written in R. For improved accessibility, the code has also been printed to PDF format so that it may be readable without requiring access to Jupyter. The code used to create the main text figures is specified in the file names. When the primary focus of a script is to create supplemental figures, the figure names have also been specified in the script file name. The code for creating other supplemental figures is also available in the script corresponding to the section where that figure is referenced.

    The following packages and package versions were used to develop this analysis:

    Python (v3.10.0)

    anaconda 4.13.0

    collections (native to anaconda installation)

    datetime

    os

    random

    jupyter 1.0.0, jupyter-core 4.9.1

    matplotlib (visualization) 3.5.1

    notebook 6.4.6

    numpy 1.21.4

    pandas 1.3.4

    scipy 1.7.3

    seaborn (figure formatting) 0.11.2

    The full environment is specified in the YAML file "atom_env.yml." Anaconda users (not tested, potentially restricted to Windows) may load this environment with this file and the following command:

    $ conda env create -f atom_env.yml

    R (v4.1.2, includes package parallel)

    tidyverse 1.3.1

    rjags 4-12

    runjags 2.2.0-3

    lattice 0.20-45

    lme4 1.1-31

    loo 2.4.1

    ggpubr 0.4.0

    matrixStats 0.61.0

    Zipped directory contents

    The full set of global, hourly AM3 model simulations developed for this project is included in this repository (AM3_hourly_simulations_global_ATom1-4.zip) for reference and potential future application, though these files are not used in the code. They are described here rather than listed individually. They span the dates of each campaign leg and are broken into four variable categories: concentrations and meteorological fields ('stp_conc_v2'), individual reaction rates ('ind_rate'), integrated reaction rates ('all_rate'), and deposition velocities or photolysis rates ('dep_jval'). Some of these files include all days in the range, while others include only the days on which the campaign took measurements.

    In addition, a subset of the AM3 simulations is included that contains the variables used in the manuscript analysis, sampled along the ATom flight tracks, together with the 10 s ATom merge data (AM3_model_simulations_sampled.zip). This is the file that should be downloaded to reproduce the analysis in the manuscript.

    AM3_model_simulations_sampled.zip

    atom1_10s_ss_030122.csv

    atom2_10s_ss_030122.csv

    atom3_10s_ss_030122.csv

    atom4_10s_ss_030122.csv

    bayes_data.zip

    bayes_atom_10s_model_122022.csv

    bayes_ats_10s_remNOlsth2sigma_highlogNO_emulate_122022.csv

    bayes_ats_10s_remNOlsth2sigma_highlogNO_emulate_allPOH_030723.csv

    base/

    .Rhistory

    atom_jags_010723.R

    atom_lmer_model_122122.R

    atom_sens_030723.R

    dat1_bins.csv

    dat1_OH.csv

    gelman_list_base.csv

    levels.csv

    log_pd.csv

    model_b0.csv

    model_b1.csv

    p.fit.csv

    p.mu.csv

    p.sd.csv

    r_prx_ytrue.pkl

    rjmt_B0.csv

    rjmt_B1.csv

    rjmt_proxy.csv

    rjmt_y_true.csv

    CH4_CO_HCHO_MHP/

    .Rhistory

    atom_altCH4_CO_HCHO_MHP_031423.R

    atom_jags_altCH4_CO_HCHO_MHP_031423.R

    dat1_bins.csv

    dat1_OH.csv

    dat1_proxy_ch4_co_hcho_mhp.csv

    levels.csv

    log_pd.csv

    p.fit.csv

    p.mu.csv

    p.sd.csv

    r_prx_ytrue.pkl

    rjmt_B0.csv

    rjmt_B1.csv

    rjmt_proxy_ch4_co_hcho_mhp.csv

    rjmt_y_true.csv

    CO_HCHO/

    .Rhistory

    atom_altCO_HCHO_031423.R

    atom_jags_altCO_HCHO_031423.R

    dat1_bins.csv

    dat1_OH.csv

    dat1_proxy_CO_HCHO.csv

    levels.csv

    r_prx_ytrue.pkl

    rjmt_B0.csv

    rjmt_B1.csv

    rjmt_proxy_co_hcho.csv

    rjmt_y_true.csv

    CO_HCHO_MHP/

    atom_altCO_HCHO_MHP_031423.R

    atom_jags_altCO_HCHO_MHP_031423.R

    dat1_bins.csv

    dat1_OH.csv

    dat1_proxy_CO_HCHO_MHP.csv

    levels.csv

    r_prx_ytrue.pkl

    rjmt_B0.csv

    rjmt_B1.csv

    rjmt_proxy_CO_HCHO_MHP.csv

    rjmt_y_true.csv

    H2O2_O3_CH4_CO_HCHO_MHP/

    .Rhistory

    atom_altH2O2_O2_CH4_CO_HCHO_MHP_122122.R

    atom_jags_altH2O2_O3_CH4_CO_HCHO_MHP_030723.R

    dat1_bins.csv

    dat1_OH.csv

    dat1_proxy_h2o2_o3_ch4_co_hcho_mhp.csv

    levels.csv

    r_prx_ytrue.pkl

    rjmt_B0.csv

    rjmt_B1.csv

    rjmt_proxy_h2o2_o3_ch4_co_hcho_mhp.csv

    rjmt_y_true.csv

    HCHO/

    atom_altHCHO_031423.R

    atom_jags_altHCHO_031423.R

    dat1_bins.csv

    dat1_OH.csv

    dat1_proxy_HCHO.csv

    levels.csv

    r_prx_ytrue.pkl

    rjmt_B0.csv

    rjmt_B1.csv

    rjmt_proxy_HCHO.csv

    rjmt_y_true.csv

    MHP/

    atom_altMHP_122122.R

    atom_jags_altMHP_122122.R

    dat1_bins.csv

    dat1_OH.csv

    dat1_proxy_MHP.csv

    levels.csv

    r_prx_ytrue.pkl

    rjmt_B0.csv

    rjmt_B1.csv

    rjmt_proxy_MHP.csv

    rjmt_y_true.csv

    F0AMv3.2.zip

    mean_ratio_OH_loss_bins_oce.npy

    mean_ratio_OH_prod_bins_oce.npy

    mean_ratio_OH_prod_loss_bins_oce.npy

    Data/

    atom1/

    atom1_output_alt.cs

    atom1_output_CO.csv

    atom1_output_H2O.csv

    atom1_output_lat.csv

    atom1_output_lon.csv

    atom1_output_lossOH_ppt_lump15.csv

    atom1_output_M.csv

    atom1_output_NO.csv

    atom1_output_OH.csv

    atom1_output_prodOH_ppt_lump15.csv

    atom1_output_startTime.csv

    atom1_output_sza.csv

    atom2/

    atom2_output_alt.csv

    atom2_output_CO.csv

    atom2_output_H2O.csv

    atom2_output_lat.csv

    atom2_output_lon.csv

    atom2_output_lossOH_ppt_lump15.csv

    atom2_output_M.csv

    atom2_output_NO.csv

    atom2_output_OH.csv

    atom2_output_prodOH_ppt_lump15.csv

    atom2_output_startTime.csv

    atom2_output_sza.csv

    atom3/

    atom3_output_alt.csv

    atom3_output_CO.csv

    atom3_output_H2O.csv

    atom3_output_lat.csv

    atom3_output_lon.csv

    atom3_output_lossOH_ppt_lump15.csv

    atom3_output_M.csv

    atom3_output_NO.csv

    atom3_output_OH.csv

    atom3_output_prodOH_ppt_lump15.csv

    atom3_output_startTime.csv

    atom3_output_sza.csv

    atom4/

    atom4_output_alt.cs

    atom4_output_CO.csv

    atom4_output_H2O.csv

    atom4_output_lat.cs

    atom4_output_lon.cs

    atom4_output_lossOH_ppt_lump15.csv

    atom4_output_M.csv

    atom4_output_NO.csv

    atom4_output_OH.csv

    atom4_output_prodOH_ppt_lump15.csv

    atom4_output_startTime.csv

    atom4_output_sza.csv

    For any further questions on the model simulations or code included here, please contact the corresponding author (Colleen Baublitz, cbb2158@columbia.edu).

  18. FakeCovid Fact-Checked News Dataset

    • kaggle.com
    zip
    Updated Feb 1, 2023
    Cite
    The Devastator (2023). FakeCovid Fact-Checked News Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/fakecovid-fact-checked-news-dataset
    Explore at:
    zip(19911252 bytes)Available download formats
    Dataset updated
    Feb 1, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    FakeCovid Fact-Checked News Dataset

    International Coverage of COVID-19 in 40 Languages from 105 Countries

    By [source]

    About this dataset

    The FakeCovid dataset is a compilation of 7623 fact-checked news articles related to COVID-19, obtained from 92 fact-checking websites located in 105 countries. The collection covers a wide range of sources and languages, including locations across Africa, Europe, Asia, the Americas and Oceania. With data gathered from references on Poynter and Snopes, the dataset is a resource for researching the accuracy of global news related to the pandemic. It offers insight into the international nature of COVID-19 information: its columns cover the countries involved; categories such as coronavirus health updates or political interference during the pandemic; URLs for referenced articles; the verifiers employed by the websites; article classes ranging from true to false or mixed evaluations; publication dates; article sources with credibility verification; and the article text with language standardization. The dataset supports understanding of the global flow of information around COVID-19 while offering transparency into whose interests guide it.


    How to use the dataset

    The FakeCovid dataset is a multilingual cross-domain collection of 7623 fact-checked news articles related to COVID-19. It is collected from 92 fact-checking websites and covers a wide range of sources and countries, including locations in Africa, Asia, Europe, The Americas, and Oceania. This dataset can be used for research related to understanding the truth and accuracy of news sources related to COVID-19 in different countries and languages.

    To use this dataset effectively, you will need basic knowledge of data science principles such as data manipulation with pandas, or Python libraries such as NumPy or scikit-learn. The data is in CSV (comma-separated values) format, which can be read by most spreadsheet applications or a text editor like Notepad++. Here are some steps to get started:

    1. Access the FakeCovid Fact-Checked News Dataset from Kaggle: https://www.kaggle.com/c/fakecovidfactcheckednewsdataset/data
    2. Download the provided CSV file containing all fact-checked news articles and place it in your desired folder location
    3. Load the CSV file into your preferred software application, such as Jupyter Notebook or RStudio
    4. Explore the dataset using built-in functions within data science libraries such as pandas and matplotlib; find meaningful information through statistical analysis and/or create visualizations
    5. Modify parameters within the CSV file if required, and save
    6. Share your creative projects through the Gitter chatroom #fakecovidauthors
    7. Publish any interesting discoveries you find in open-source repositories like GitHub
    8. Engage with the Hangouts group #FakeCoviDFactCheckersClub
    9. Show off fun graphics via the Twitter hashtag #FakeCovidiauthors
    10. Reach out with further questions via email (contactfakecovidadatateam)
    11. Stay connected by joining the mailing list #FakeCoviDAuthorsGroup

    We hope this guide helps you better understand how to use our FakeCoviD Fact Checked News Dataset for generating meaningful insights relating to COVID-19 news articles worldwide!
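
    To make steps 3 and 4 above concrete, here is a minimal pandas sketch. The file name and the column names ("lang", "class") are assumptions; inspect the real header after downloading from Kaggle.

    import pandas as pd

    # Hypothetical file name; use the CSV actually provided on Kaggle.
    df = pd.read_csv("FakeCovid_fact_checked_news.csv")

    print(df.shape)               # expect roughly 7,623 rows
    print(df.columns.tolist())    # confirm the real column names first

    # Example exploration: count of fact-check verdicts per language.
    summary = df.groupby(["lang", "class"]).size().unstack(fill_value=0)
    print(summary.head(10))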

    Research Ideas

    • Developing an automated algorithm to detect fake news related to COVID-19 by leveraging the fact-checking labels and other fields included in this dataset for machine learning and natural language processing tasks.
    • Training a sentiment analysis model on the data to categorize articles by sentiment, which can support investigations into why certain news topics or countries show particular outcomes, motivations, or behaviours due to content relatedness or author bias (if any).
    • Using unsupervised clustering techniques to identify discrepancies between news circulated in different populations and countries (languages and regions), so that publicists can focus on providing factual information rather than spreading false rumours or misinformation about the pandemic.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    **License: [CC0 1.0 Universal (CC0 1.0) - Public Do...

  19. Speedtest Open Data - Australia(NZ) 2020-2025; Q220 - Q325 extract by Qtr

    • figshare.com
    txt
    Updated Oct 24, 2025
    + more versions
    Cite
    Richard Ferrers; Speedtest Global Index (2025). Speedtest Open Data - Australia(NZ) 2020-2025; Q220 - Q325 extract by Qtr [Dataset]. http://doi.org/10.6084/m9.figshare.13370504.v43
    Explore at:
    txtAvailable download formats
    Dataset updated
    Oct 24, 2025
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Richard Ferrers; Speedtest Global Index
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New Zealand, Australia
    Description

    This is an Australian extract of Speedtest Open data available at Amazon WS (link below - opendata.aws). AWS data licence is "CC BY-NC-SA 4.0", so use of this data must be: - non-commercial (NC) - reuse must be share-alike (SA) (add same licence). This restricts the standard CC-BY Figshare licence.

    A world speedtest open data file was downloaded (>400Mb, 7M lines of data). An extract of Australia's location (lat, long) revealed 88,000 lines of data (attached as csv). A Jupyter notebook of the extract process is attached. See Binder version at Github - https://github.com/areff2000/speedtestAU. +> Install: 173 packages | Downgrade: 1 package | Total download: 432MB. Build container time: approx - load time 25secs. => Error: Times out - UNABLE TO LOAD GLOBAL DATA FILE (6.6M lines). => Error: Overflows the 8GB RAM container provided with the global data file (3GB). => On local JupyterLab M2 MBP; loads in 6 mins. Added Binder from ARDC service: https://binderhub.rc.nectar.org.au Docs: https://ardc.edu.au/resource/fair-for-jupyter-notebooks-a-practical-guide/ A link to a Twitter thread of outputs is provided. A link to a Data tutorial is provided (GitHub), including a Jupyter Notebook to analyse World Speedtest data, selecting one US State.

    Data Shows (Q220): - 3.1M speedtests | 762,000 devices | - 88,000 grid locations (600m * 600m), summarised as a point - average speed 33.7Mbps (down), 12.4Mbps (up) | Max speed 724Mbps - data is for 600m * 600m grids, showing average speed up/down, number of tests, and number of users (IP). Added centroid, and now lat/long. See tweet of image of centroids also attached.

    NB: Discrepancy Q2-21: Speedtest Global shows the June AU average speedtest at 80Mbps, whereas the Q2 mean here is 52Mbps (v17; Q1 45Mbps; v14). Dec 20 Speedtest Global has AU at 59Mbps. This could be a timing difference, or spatial anonymising masking the highest speeds, or the data may be inconsistent between the national average and the geospatial detail. Check in upcoming quarters.

    NextSteps: Histogram - compare Q220, Q121, Q122, per v1.4.ipynb.

    Versions:
    v43. Added revised NZ vs AUS graph for Q325 (NZ; Q2 25), since NZ data was available from Github (link below). Calc using PlayNZ.ipynb notebook. See images in Twitter - https://x.com/ValueMgmt/status/1981607615496122814
    v42. Added AUS Q325 (97.6k lines, avg d/l 165.5 Mbps (median d/l 150.8 Mbps), u/l 28.08 Mbps). Imported using v2 Jupyter notebook (MBP 16Gb). Mean tests: 24.5. Mean devices: 6.02. Download, extract and publish: UNK - not measured. Download avg is double Q423. Note: NBN increased d/l speeds from Sept '25 (100 -> 500, 250 -> 750). For 1Gbps, upload speed only increased from 50Mbps to 100Mbps. New 2Gbps services introduced on FTTP and HFC networks.
    v41. Added AUS Q225 (96k lines, avg d/l 130.5 Mbps (median d/l 108.4 Mbps), u/l 22.45 Mbps). Imported using v2 Jupyter notebook (MBP 16Gb). Mean tests: 17.2. Mean devices: 5.11. Download, extract and publish: 20 mins. Download avg is double Q422.
    v40. Added AUS Q125 (93k lines, avg d/l 116.6 Mbps, u/l 21.35 Mbps). Imported using v2 Jupyter notebook (MBP 16Gb). Mean tests: 16.9. Mean devices: 5.13. Download, extract and publish: 14 mins.
    v39. Added AUS Q424 (95k lines, avg d/l 110.9 Mbps, u/l 21.02 Mbps). Imported using v2 Jupyter notebook (MBP 16Gb). Mean tests: 17.2. Mean devices: 5.24. Download, extract and publish: 14 mins.
    v38. Added AUS Q324 (92k lines, avg d/l 107.0 Mbps, u/l 20.79 Mbps). Imported using v2 Jupyter notebook (iMac 32Gb). Mean tests: 17.7. Mean devices: 5.33. Added github speedtest-workflow-importv2vis.ipynb; Jupyter datavis code added to colour-code the national map (per Binder on Github; link below).
    v37. Added AUS Q224 (91k lines, avg d/l 97.40 Mbps, u/l 19.88 Mbps). Imported using speedtest-workflow-importv2 Jupyter notebook. Mean tests: 18.1. Mean devices: 5.4.
    v36. Load UK data, Q1-23, and compare to AUS and NZ Q123 data. Add compare image (au-nz-ukQ123.png), calc PlayNZUK.ipynb, data load import-UK.ipynb. UK data is a bit rough and ready as it uses a rectangle to mark out the UK, which includes some EIRE and FR; indicative only, and would need a geo-clean to exclude neighbouring countries.
    v35. Load Melb geo-maps of speed quartiles (0-25, 25-50, 50-75, 75-100, 100-). Avg in 2020: 41Mbps. Avg in 2023: 86Mbps. MelbQ323.png, MelbQ320.png. Calc with Speedtest-incHist.ipynb code. Needed to install conda mapclassify. ax=melb.plot(column=...dict(bins[25,50,75,100]))
    v34. Added AUS Q124 (93k lines, avg d/l 87.00 Mbps, u/l 18.86 Mbps). Imported using speedtest-workflow-importv2 Jupyter notebook. Mean tests: 18.3. Mean devices: 5.5.
    v33. Added AUS Q423 (92k lines, avg d/l 82.62 Mbps). Imported using speedtest-workflow-importv2 Jupyter notebook. Mean tests: 18.0. Mean devices: 5.6. Added link to Github.
    v32. Recalc Au vs NZ for upload performance; added image, using PlayNZ Jupyter. NZ approx 40% locations at or above 100Mbps. Aus
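
    For context, a minimal sketch of the extract step described above is shown here; the file name, the rough bounding box, and the avg_d_kbps column are assumptions, and the attached notebook defines the actual workflow.

    import geopandas as gpd

    # Global fixed-line tiles for one quarter, downloaded from the Ookla
    # open-data bucket on AWS (file name is an assumption).
    tiles = gpd.read_file("gps_fixed_tiles_q2_2020.zip")

    # Rough Australian bounding box selected with the GeoPandas .cx indexer.
    aus = tiles.cx[112:154, -44:-10]

    print(len(aus), "tiles")
    print("mean download Mbps:", (aus["avg_d_kbps"] / 1000).mean())
    aus.to_file("speedtest_aus.geojson", driver="GeoJSON")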

  20. Performance results of different scheduling algorithms used in the...

    • data.niaid.nih.gov
    Updated May 9, 2022
    Cite
    Mustapha Regragui; Baptiste Coye; Laércio Lima Pilla; Raymond Namyst; Denis Barthou (2022). Performance results of different scheduling algorithms used in the simulation of a modern game engine [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6532251
    Explore at:
    Dataset updated
    May 9, 2022
    Dataset provided by
    CNRS
    Bordeaux INP
    Ubisoft
    University of Bordeaux
    Authors
    Mustapha Regragui; Baptiste Coye; Laércio Lima Pilla; Raymond Namyst; Denis Barthou
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance results of different scheduling algorithms used in the simulation of a modern game engine

    These results are a companion to the paper entitled "Exploring scheduling algorithms for parallel task graphs: a modern game engine case study" by M. Regragui et al.

    General information

    This dataset contains raw outputs and scripts to visualize and analyze the scheduling results from our game engine simulator. The result analysis can be directly reproduced using the script run_analysis.sh. A series of Jupyter Notebook files are also available to help visualize the results.

    File information

    • All Scenario*.ipynb files contain python scripts to visualize and analyze the simulation results.
    • The Scenario*.py files contain python scripts that can be run directly with Jupyter Notebook.
    • The requirements.txt file contains the names and versions of python packages necessary to reproduce the analysis.
    • The run_analysis.sh file contains a bash script to install the required python packages and run the Scenario*.py scripts.

    The results are organized in five folders:

    1. Result_1 contains the results for Scenario 1 generated using file input_scenario_1.txt.
    2. Result_2 contains the results for Scenario 2 generated using file input_scenario_2.txt.
    3. Result_3 contains the results for Scenario 3 generated using file input_scenario_3.txt.
    4. Result_CP_1 contains the results for the critical path of Scenarios 1 and 2 generated using file input_CP_scenario_1.txt.
    5. Result_CP_3 contains the results for the critical path of Scenario 3 generated using file input_CP_scenario_3.txt.

    Each result file (e.g., HLF_NonSorted_Random_1_200_10.txt) contains 200 lines, one per simulated frame. Each line contains four values: the frame number, the duration of the frame (in microseconds), a critical path estimation for the previous frame (in microseconds), and the load parameter (a value between 0 and 1).
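
    A minimal parsing sketch for one such file follows (whitespace separation and the absence of a header row are assumptions), computing the SF and DF metrics defined under Metrics below.

    import pandas as pd

    cols = ["frame", "duration_us", "critical_path_prev_us", "load"]
    df = pd.read_csv("HLF_NonSorted_Random_1_200_10.txt",
                     sep=r"\s+", header=None, names=cols)
    assert len(df) == 200                          # one line per simulated frame

    DUE_US = 16_667                                # 16.667 ms due date
    sf = df["duration_us"].max()                   # SF: slowest frame
    delayed = (df["duration_us"] > DUE_US).sum()   # DF: number of delayed frames
    print(f"SF = {sf} us, DF = {delayed} frames")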

    The outputs of this analysis include some PDF files representing the figures in the paper (in order) and some CSV files representing the values shown in tables. The standard output shows the p-values computed in parts of the statistical analysis.

    Software and hardware information

    The simulation results were generated on an Intel Core i7-1185G7 processor, with 32 GB of LPDDR4 RAM (3200 MHz). The machine ran on Ubuntu 20.04.3 LTS (5.14.0-1034-oem), and g++ 9.4.0 was used for the simulator's compilation (-O3 flag).

    The results were analyzed using Python 3.8.10, pip 20.0.2 and jupyter-notebook 6.0.3. The following packages and their respective versions were used:

    • pandas 1.3.2
    • numpy 1.21.2
    • matplotlib 3.4.3
    • seaborn 0.11.2
    • scipy 1.7.1
    • pytz 2019.3
    • python-dateutil 2.7.3
    • kiwisolver 1.3.2
    • pyparsing 2.4.7
    • cycler 0.10.0
    • Pillow 7.0.0
    • six 1.14.0

    Simulation information

    Simulation results were generated from 4 to 20 resources. Each configuration was run with 50 different RNG seeds (1 up to 50).

    Each simulation is composed of 200 frames. The load parameter (lag) starts at zero and increases by 0.01 with each frame up to a value equal to 100% in frame 101. After that, the load parameter starts to decrease in the same rhythm down to 0.01 in frame 200.
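
    For reference, the load schedule described above can be written as a one-liner (frames numbered from 1; values correspond to the fourth field of the result files).

    # Load parameter per frame: 0.00 at frame 1, +0.01 per frame up to 1.00 at
    # frame 101, then -0.01 per frame down to 0.01 at frame 200.
    load = [(f - 1) / 100 if f <= 101 else (201 - f) / 100 for f in range(1, 201)]
    assert load[0] == 0.0 and load[100] == 1.0 and load[-1] == 0.01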

    Algorithms abbreviation in presentation order

    FIFO serves as the baseline for comparisons.

    1. FIFO: First In First Out.
    2. LPT: Longest Processing Time First.
    3. SPT: Shortest Processing Time First.
    4. SLPT: LPT at a subtask level.
    5. SSPT: SPT at a subtask level.
    6. HRRN: Highest Response Ratio Next.
    7. WT: Longest Waiting Time First.
    8. HLF: Hu's Level First with unitary processing time of each task.
    9. HLFET: HLF with estimated times.
    10. CG: Coffman-Graham's Algorithm.
    11. DCP: Dynamic Critical Path Priority.

    Metrics

    • SF: slowest frame (maximum frame execution time)
    • DF: number of delayed frames (with 16.667 ms as the due date)
    • CS: cumulative slowdown (with 16.667 ms as the due date)