58 datasets found

Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus...
figshare.com
txt
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Richard Ferrers; Speedtest Global Index (2023). Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus ALC - 2020, 2022 [Dataset]. http://doi.org/10.6084/m9.figshare.13621169.v24
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.13621169.v24
Dataset updated
May 30, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Richard Ferrers; Speedtest Global Index
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset compares four cities FIXED-line broadband internet speeds: - Melbourne, AU - Bangkok, TH - Shanghai, CN - Los Angeles, US - Alice Springs, AU

ERRATA: 1.Data is for Q3 2020, but some files are labelled incorrectly as 02-20 of June 20. They all should read Sept 20, or 09-20 as Q3 20, rather than Q2. Will rename and reload. Amended in v7.

LAX file named 0320, when should be Q320. Amended in v8.

*lines of data for each geojson file; a line equates to a 600m^2 location, inc total tests, devices used, and average upload and download speed - MEL 16181 locations/lines => 0.85M speedtests (16.7 tests per 100people) - SHG 31745 lines => 0.65M speedtests (2.5/100pp) - BKK 29296 lines => 1.5M speedtests (14.3/100pp) - LAX 15899 lines => 1.3M speedtests (10.4/100pp) - ALC 76 lines => 500 speedtests (2/100pp)

Geojsons of these 2* by 2* extracts for MEL, BKK, SHG now added, and LAX added v6. Alice Springs added v15.

This dataset unpacks, geospatially, data summaries provided in Speedtest Global Index (linked below). See Jupyter Notebook (*.ipynb) to interrogate geo data. See link to install Jupyter.

** To Do Will add Google Map versions so everyone can see without installing Jupyter. - Link to Google Map (BKK) added below. Key:Green > 100Mbps(Superfast). Black > 500Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook. - Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melb has 20% at or above 100Mbps. Suggest plot Top 20% on map for community. Google Map link - now added (and tweet).

** Python melb = au_tiles.cx[144:146 , -39:-37] #Lat/Lon extract shg = tiles.cx[120:122 , 30:32] #Lat/Lon extract bkk = tiles.cx[100:102 , 13:15] #Lat/Lon extract lax = tiles.cx[-118:-120, 33:35] #lat/Lon extract ALC=tiles.cx[132:134, -22:-24] #Lat/Lon extract

Histograms (v9), and data visualisations (v3,5,9,11) will be provided. Data Sourced from - This is an extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).

**VERSIONS v.24 Add tweet and google map of Top 20% (over 100Mbps locations) in Mel Q322. Add v.1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below). v23. Add graph of 2022 Broadband distribution, and compare 2020 - 2022. Updated v1.4 Jupyter notebook. v22. Add Import ipynb; workflow-import-4cities. v21. Add Q3 2022 data; five cities inc ALC. Geojson files. (2020; 4.3M tests 2022; 2.9M tests)

Melb 14784 lines Avg download speed 69.4M Tests 0.39M

SHG 31207 lines Avg 233.7M Tests 0.56M

ALC 113 lines Avg 51.5M Test 1092

BKK 29684 lines Avg 215.9M Tests 1.2M

LAX 15505 lines Avg 218.5M Tests 0.74M

v20. Speedtest - Five Cities inc ALC. v19. Add ALC2.ipynb. v18. Add ALC line graph. v17. Added ipynb for ALC. Added ALC to title.v16. Load Alice Springs Data Q221 - csv. Added Google Map link of ALC. v15. Load Melb Q1 2021 data - csv. V14. Added Melb Q1 2021 data - geojson. v13. Added Twitter link to pics. v12 Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb). v11 Add Line-Compare pic, plotting Four Cities on a graph. v10 Add Four Histograms in one pic. v9 Add Histogram for Four Cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook). v8 Renamed LAX file to Q3, rather than 03. v7 Amended file names of BKK files to correctly label as Q3, not Q2 or 06. v6 Added LAX file. v5 Add screenshot of BKK Google Map. v4 Add BKK Google map(link below), and BKK csv mapping files. v3 replaced MEL map with big key version. Prev key was very tiny in top right corner. v2 Uploaded MEL, SHG, BKK data and Jupyter Notebook v1 Metadata record

** LICENCE AWS data licence on Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be: - non-commercial (NC) - reuse must be share-alike (SA)(add same licence). This restricts the standard CC-BY Figshare licence.

** Other uses of Speedtest Open Data; - see link at Speedtest below.
H
JavaScript code for retrieval of MODIS Collection 6 NDSI snow cover at...
beta.hydroshare.org
hydroshare.org
+1more
zip
Updated Feb 11, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Irene Garousi-Nejad; David Tarboton (2022). JavaScript code for retrieval of MODIS Collection 6 NDSI snow cover at SNOTEL sites and a Jupyter Notebook to merge/reprocess data [Dataset]. http://doi.org/10.4211/hs.d287f010b2dd48edb0573415a56d47f8
Explore at:
zip(52.2 KB)Available download formats
Unique identifier
https://doi.org/10.4211/hs.d287f010b2dd48edb0573415a56d47f8
Dataset updated
Feb 11, 2022
Dataset provided by
HydroShare
Authors
Irene Garousi-Nejad; David Tarboton
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered

Description
This JavaScript code has been developed to retrieve NDSI_Snow_Cover from MODIS version 6 for SNOTEL sites using the Google Earth Engine platform. To successfully run the code, you should have a Google Earth Engine account. An input file, called NWM_grid_Western_US_polygons_SNOTEL_ID.zip, is required to run the code. This input file includes 1 km grid cells of the NWM containing SNOTEL sites. You need to upload this input file to the Assets tap in the Google Earth Engine code editor. You also need to import the MOD10A1.006 Terra Snow Cover Daily Global 500m collection to the Google Earth Engine code editor. You may do this by searching for the product name in the search bar of the code editor.

The JavaScript works for s specified time range. We found that the best period is a month, which is the maximum allowable time range to do the computation for all SNOTEL sites on Google Earth Engine. The script consists of two main loops. The first loop retrieves data for the first day of a month up to day 28 through five periods. The second loop retrieves data from day 28 to the beginning of the next month. The results will be shown as graphs on the right-hand side of the Google Earth Engine code editor under the Console tap. To save results as CSV files, open each time-series by clicking on the button located at each graph's top right corner. From the new web page, you can click on the Download CSV button on top.

Here is the link to the script path: https://code.earthengine.google.com/?scriptPath=users%2Figarousi%2Fppr2-modis%3AMODIS-monthly

Then, run the Jupyter Notebook (merge_downloaded_csv_files.ipynb) to merge the downloaded CSV files that are stored for example in a folder called output/from_GEE into one single CSV file which is merged.csv. The Jupyter Notebook then applies some preprocessing steps and the final output is NDSI_FSCA_MODIS_C6.csv.
Z
Types, open citations, closed citations, publishers, and participation...
data.niaid.nih.gov
data-staging.niaid.nih.gov
+1more
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hiebi, Ivan; Peroni, Silvio; Shotton, David (2020). Types, open citations, closed citations, publishers, and participation reports of Crossref entities [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2558257
Explore at:
Dataset updated
Jan 24, 2020
Dataset provided by
Oxford e-Research Centre, University of Oxford, Oxford, United Kingdom
Digital Humanities Advanced Research Centre, Department of Computer Science and Engineering, University of Bologna, Bologna, Italy
Authors
Hiebi, Ivan; Peroni, Silvio; Shotton, David
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This publication contains several datasets that have been used in the paper "Crowdsourcing open citations with CROCI – An analysis of the current status of open citations, and a proposal" submitted to the 17th International Conference on Scientometrics and Bibliometrics (ISSI 2019), available at https://opencitations.wordpress.com/2019/02/07/crowdsourcing-open-citations-with-croci/.

Additional information about the analyses described in the paper, including the code and the data we have used to compute all the figures, is available as a Jupyter notebook at https://github.com/sosgang/pushing-open-citations-issi2019/blob/master/script/croci_nb.ipynb. The datasets contain the following information.

non_open.zip: it is a zipped (~5 GB unzipped) CSV file containing the numbers of open citations and closed citations received by the entities in the Crossref dump used in our computation, dated October 2018. All the entity types retrieved from Crossref were aligned to one of following five categories: journal, book, proceedings, dataset, other. The open CC0 citation data we used came from the CSV dump of most recent release of COCI dated 12 November 2018. The number of closed citations was calculated by subtracting the number of open citations to each entity available within COCI from the value “is-referenced-by-count” available in the Crossref metadata for that particular cited entity, which reports all the DOI-to-DOI citation links that point to the cited entity from within the whole Crossref database (including those present in the Crossref ‘closed’ dataset).

The columns of the CSV file are the following ones:

doi: the DOI of the publication in Crossref;

type: the type of the publication as indicated in Crossref;

cited_by: the number of open citations received by the publication according to COCI;

non_open: the number of closed citations received by the publication according to Crossref + COCI.

croci_types.csv: it is a CSV file that contains the numbers of open citations and closed citations received by the entities in the Crossref dump used in our computation, as collected in the previous CSV file, alligned in five classes depening on the entity types retrieved from Crossref: journal (Crossref types: journal-article, journal-issue, journal-volume, journal), book (Crossref types: book, book-chapter, book-section, monograph, book track, book-part, book-set, reference-book, dissertation, book series, edited book), proceedings (Crossref types: proceedings-article, proceedings, proceedings-series), dataset (Crossref types: dataset), other (Crossref types: other, report, peer review, reference-entry, component, report-series, standard, posted-content, standard-series).

The columns of the CSV file are the following ones:

type: the type publication between "journal", "book", "proceedings", "dataset", "other";

label: the label assigned to the type for visualisation purposes;

coci_open_cit: the number of open citations received by the publication type according to COCI;

crossref_close_cit: the number of closed citations received by the publication according to Crossref + COCI.

publishers_cits.csv: it is a CSV file that contains the top twenty publishers that received the greatest number of open citations. The columns of the CSV file are the following ones:

publisher: the name of the publisher;

doi_prefix: the list of DOI prefixes used assigned by the publisher;

coci_open_cit: the number of open citations received by the publications of the publisher according to COCI;

crossref_close_cit: the number of closed citations received by the publications of the publishers according to Crossref + COCI;

total_cit: the total number of citations received by the publications of the publisher (= coci_open_cit + crossref_close_cit).

20publishers_cr.csv: it is a CSV file that contains the numbers of the contributions to open citations made by the twenty publishers introduced in the previous CSV file as of 24 January 2018, according to the data available through the Crossref API. The counts listed in this file refers to the number of publications for which each publisher has submitted metadata to Crossref that include the publication’s reference list. The categories 'closed', 'limited' and 'open' refer to publications for which the reference lists are not visible to anyone outside the Crossref Cited-by membership, are visible only to them and to Crossref Metadata Plus members, or are visible to all, respectively. In addition, the file also record the total number of publications for which the publisher has submitted metadata to Crossref, whether or not those metadata include the reference lists of those publications.

The columns of the CSV file are the following ones:

publisher: the name of the publisher;

open: the number of publications in Crossref with an 'open' visibility for their reference lists;

limited: the number of publications in Crossref with an 'limited' visibility for their reference lists;

closed: the number of publications in Crossref with an 'closed' visibility for their reference lists;

overall_deposited: the overall number of publications for which the publisher has submitted metadata to Crossref.
d
Reporting behavior from WHO COVID-19 public data
search.dataone.org
data.niaid.nih.gov
+1more
Updated Jul 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Auss Abbood (2025). Reporting behavior from WHO COVID-19 public data [Dataset]. http://doi.org/10.5061/dryad.9s4mw6mmb
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.9s4mw6mmb
Dataset updated
Jul 14, 2025
Dataset provided by
Dryad Digital Repository
Authors
Auss Abbood
Time period covered
Dec 16, 2022
Description
Objective Daily COVID-19 data reported by the World Health Organization (WHO) may provide the basis for political ad hoc decisions including travel restrictions. Data reported by countries, however, is heterogeneous and metrics to evaluate its quality are scarce. In this work, we analyzed COVID-19 case counts provided by WHO and developed tools to evaluate country-specific reporting behaviors. Methods In this retrospective cross-sectional study, COVID-19 data reported daily to WHO from 3rd January 2020 until 14th June 2021 were analyzed. We proposed the concepts of binary reporting rate and relative reporting behavior and performed descriptive analyses for all countries with these metrics. We developed a score to evaluate the consistency of incidence and binary reporting rates. Further, we performed spectral clustering of the binary reporting rate and relative reporting behavior to identify salient patterns in these metrics. Results Our final analysis included 222 countries and regions...., Data collection COVID-19 data was downloaded from WHO. Using a public repository, we have added the countries' full names to the WHO data set using the two-letter abbreviations for each country to merge both data sets. The provided COVID-19 data covers January 2020 until June 2021. We uploaded the final data set used for the analyses of this paper. Data processing We processed data using a Jupyter Notebook with a Python kernel and publically available external libraries. This upload contains the required Jupyter Notebook (reporting_behavior.ipynb) with all analyses and some additional work, a README, and the conda environment yml (env.yml)., Any text editor including Microsoft Excel and their free alternatives can open the uploaded CSV file. Any web browser and some code editors (like the freely available Visual Studio Code) can show the uploaded Jupyter Notebook if the required Python environment is set up correctly.
Cognitive Fatigue
figshare.com
csv
Updated Nov 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rui Varandas; Inês Silveira; Hugo Gamboa (2025). Cognitive Fatigue [Dataset]. http://doi.org/10.6084/m9.figshare.28188143.v3
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.28188143.v3
Dataset updated
Nov 5, 2025
Dataset provided by
Figsharehttp://figshare.com/
Authors
Rui Varandas; Inês Silveira; Hugo Gamboa
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cognitive FatigueWhile executing the proposed tasks, the participants’ physiological signals were monitored using two biosignalsplux devices from PLUX Wireless Biosignals, Lisbon, Portugal, with a sampling frequency of 100 Hz a resolution of 16 bits (24 bits in the case of fNIRS). Six different sensors were used: EEG and fNIRS positioned around the F7 and F8 of the 10–20 system (dorsolateral prefrontal cortex is often used to assess CW and fatigue as well as cognitive states); ECG monitored an approximation of Lead I of the Einthoven system, EDA placed on the palm of the non-dominant hand; ACC was positioned on the right side of the head to measure head movement and overall posture changes, and the RIP sensor was attached to the upper-abdominal area to measure the respiration cycles—the combination of the three allows to infer about the response of the Autonomic Nervous System (ANS) of the human body, namely, the response of the sympathetic and parasympathetic nervous system.2.1. Experimental designCognitive fatigue (CF) is a phenomenon that arises following the prolonged engagement in mentally demanding cognitive tasks. Thus, we developed an experimental procedure that involved three demanding tasks: a digital lesson in Jupyter Notebook format, three repetitions of Corsi-Block task, and two repetitions of a concentration test.Before the Corsi-Block task and after the concentration task there were periods of baseline of two min. In our analysis, the first baseline period, although not explicitly present in the dataset, was designated as representing no CF, whereas the final baseline period was designated as representing the presence of CF. Between repetitions of the Corsi-Block task, there were periods of baseline of 15 s after the task and of 30 s before the beginning of each repetition of the task.2.2. Data recordingA data sample of 10 volunteer participants (4 females) aged between 22 and 48 years old (M = 28.2, SD = 7.6) took part in this study. All volunteers were recruited at NOVA School of Science and Technology, fluent in English, right-handed, none reported suffering from psychological disorders, and none reported taking regular medication. Written informed consent was obtained before participating and all Ethical Procedures approved by the Ethics Committee of NOVA University of Lisbon were thoroughly followed.In this study, we omitted the data from one participant due to the insufficient duration of data acquisition.2.3. Data labellingThe labels easy, difficult, very difficult and repeat found in the ECG_lesson_answers.txt files represent the subjects' opinion of the content read in the ECG lesson. The repeat label represents the most difficult level. It's called repeat because when you press it, the answer to the question is shown again. This system is based on the Anki system, which has been proposed and used to memorise information effectively. In addition, the PB description JSON files include timestamps indicating the start and end of cognitive tasks, baseline periods, and other events, which are useful for defining CF states as we defined in 2.1.2.4. Data descriptionBiosignals include EEG, fNIRS (not converted to oxi and deoxiHb), ECG, EDA, respiration (RIP), accelerometer (ACC), and push-button data (PB). All signals have already been converted to physical units. In each biosignal file, the first column corresponds to the timestamps.HCI features encompass keyboard, mouse, and screenshot data. Below is a Python code snippet for extracting screenshot files from the screenshots CSV file.import base64from os import mkdirfrom os.path import joinfile = '...'with open(file, 'r') as f: lines = f.readlines()for line in lines[1:]: timestamp = line.split(',')[0] code = line.split(',')[-1][:-2] imgdata = base64.b64decode(code) filename = str(timestamp) + '.jpeg' mkdir('screenshot') with open(join('screenshot', filename), 'wb') as f: f.write(imgdata)A characterization file containing age and gender information for all subjects in each dataset is provided within the respective dataset folder (e.g., D2_subject-info.csv). Other complementary files include (i) description of the pushbuttons to help segment the signals (e.g., D2_S2_PB_description.json) and (ii) labelling (e.g., D2_S2_ECG_lesson_results.txt). The files D2_Sx_results_corsi-block_board_1.json and D2_Sx_results_corsi-block_board_2.json show the results for the first and second iterations of the corsi-block task, where, for example, row_0_1 = 12 means that the subject got 12 pairs right in the first row of the first board, and row_0_2 = 12 means that the subject got 12 pairs right in the first row of the second board.
Z
The S&M-HSTPM2d5 dataset: High Spatial-Temporal Resolution PM 2.5 Measures...
data.niaid.nih.gov
Updated Sep 25, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chen, Xinlei; Liu, Xinyu; Eng, Kent X.; Liu, Jingxiao; Noh, Hae Young; Zhang, Lin; Zhang, Pei (2020). The S&M-HSTPM2d5 dataset: High Spatial-Temporal Resolution PM 2.5 Measures in Multiple Cities Sensed by Static & Mobile Devices [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4028129
Explore at:
Dataset updated
Sep 25, 2020
Dataset provided by
Tsinghua University
Carnegie Mellon University
Stanford University
Authors
Chen, Xinlei; Liu, Xinyu; Eng, Kent X.; Liu, Jingxiao; Noh, Hae Young; Zhang, Lin; Zhang, Pei
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This S&M-HSTPM2d5 dataset contains the high spatial and temporal resolution of the particulates (PM2.5) measures with the corresponding timestamp and GPS location of mobile and static devices in the three Chinese cities: Foshan, Cangzhou, and Tianjin. Different numbers of static and mobile devices were set up in each city. The sampling rate was set up as one minute in Cangzhou, and three seconds in Foshan and Tianjin. For the specific detail of the setup, please refer to the Device_Setup_Description.txt file in this repository and the data descriptor paper.

After the data collection process, the data cleaning process was performed to remove and adjust the abnormal and drifting data. The script of the data cleaning algorithm is provided in this repository. The data cleaning algorithm only adjusts or removes individual data points. The removal of the entire device's data was done after the data cleaning algorithm with empirical judgment and graphic visualization. For specific detail of the data cleaning process, please refer to the script (Data_cleaning_algorithm.ipynb) in this repository and the data descriptor paper.

The dataset in this repository is the processed version. The raw dataset and removed devices are not included in this repository.

The data is stored as a CSV file. Each CSV file which is named by the device ID represents the data that was collected by the corresponding device. Each CSV file has three types of data: timestamp as the China Standard Time (GMT+8), geographic location as latitude and longitude, and PM2.5 concentration with the unit of microgram per cubic meter. The CSV files are stored in either Static or Mobile folder which represents the devices' type. The Static and Mobile folder are stored in the corresponding city's folder.

To access the dataset, any programming language that can access CSV files is appropriate. Users can also open the CSV file directly. The get_dataset.ipynb file in this repository also provides an option of accessing the dataset. To successfully execute ipynb file, Jupyter Notebook with Python 3.0 is required. The following python library is also required:

get_dataset.ipynb: 1. os library 2. pandas library

Data_cleaning_algorithm.ipynb: 1. os library 2. pandas library 3. datetime library 4. math library

The instruction of installing the libraries above can be found online. After installing the Jupyter Notebook with Python 3.0 and the required libraries, users can try to open the ipynb file with Jupyter Notebook and follow the instruction inside the file.

For questions or suggestions please e-mail Xinlei Chen
O
Renewable power plants
data.open-power-system-data.org
csv, sqlite, xlsx
Updated Apr 5, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ingmar Schlecht; Milos Simic (2019). Renewable power plants [Dataset]. http://doi.org/10.25832/renewable_power_plants/2019-04-05
Explore at:
sqlite, csv, xlsxAvailable download formats
Unique identifier
https://doi.org/10.25832/renewable_power_plants/2019-04-05
Dataset updated
Apr 5, 2019
Dataset provided by
Open Power System Data
Authors
Ingmar Schlecht; Milos Simic
Time period covered
Jan 1, 1901 - Jan 21, 2019
Variables measured
day, CH_wind_capacity, DE_wind_capacity, DK_wind_capacity, CH_hydro_capacity, CH_solar_capacity, DE_solar_capacity, DK_solar_capacity, DE_storage_capacity, GB-GBN_wind_capacity, and 26 more
Description
List of renewable energy power stations. This Data Package contains a list of renewable energy power plants in lists of renewable energy-based power plants of Germany, Denmark, France, Switzerland, the United Kingdom and Poland. Germany: More than 1.7 million renewable power plant entries, eligible under the renewable support scheme (EEG). Denmark: Wind and phovoltaic power plants with a high level of detail. France: Aggregated capacity and number of installations per energy source per municipality (Commune). Poland: Summed capacity and number of installations per energy source per municipality (Powiat). Switzerland: Renewable power plants eligible under the Swiss feed in tariff KEV (Kostendeckende Einspeisevergütung). United Kingdom: Renewable power plants in the United Kingdom. Due to different data availability, the power plant lists are of different accurancy and partly provide different power plant parameter. Due to that, the lists are provided as seperate csv-files per country and as separate sheets in the excel file. Suspect data or entries with high probability of duplication are marked in the column 'comment'. Theses validation markers are explained in the file validation_marker.csv. Additionally, the Data Package includes daily time series of cumulated installed capacity per energy source type for Germany, Denmark, Switzerland and the United Kingdom. All data processing is conducted in Python and pandas and has been documented in the Jupyter Notebooks linked below.
Effects of community management on user activity in online communities
zenodo.org
data.niaid.nih.gov
zip
Updated Apr 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alberto Cottica; Alberto Cottica (2025). Effects of community management on user activity in online communities [Dataset]. http://doi.org/10.5281/zenodo.1320261
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.1320261
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Alberto Cottica; Alberto Cottica
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data and code needed to reproduce the results of the paper "Effects of community management on user activity in online communities", available in draft here.

Instructions:

Unzip the files.

Start with JSON files obtained from calling platform APIs: each dataset consists of one file for posts, one for comments, one for users. In the paper we use two datasets, one referring Edgeryders, the other to Matera 2019.

Run them through edgesense (https://github.com/edgeryders/edgesense). Edgesense allows to set the length of the observation period. We set it to 1 week and 1 day for Edgeryders data, and to 1 day for Matera 2019 data. Edgesense stores its results in a file called JSON network.min.json, which we then rename to keep track of the data source and observation length.

Launch Jupyter Notebook and run the notebook provided to convert the network.min.json files into CSV flat files, one for each netwrk file

Launch Stata and open each flat csv files with it, then save it in Stata format.

Use the provided Stata .do scripts to replicate results.

Please note: I use both Stata and Jupyter Notebook interactively, running a block with a few lines of code at a time. Expect to have to change directories, file names etc.
Data from: Data and code from: Cultivation and dynamic cropping processes...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Dec 2, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Data and code from: Cultivation and dynamic cropping processes impart land-cover heterogeneity within agroecosystems: a metrics-based case study in the Yazoo-Mississippi Delta (USA) [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-cultivation-and-dynamic-cropping-processes-impart-land-cover-heterogene-f5f78
Explore at:
Dataset updated
Dec 2, 2025
Dataset provided by
Agricultural Research Servicehttps://www.ars.usda.gov/
Area covered
Mississippi Delta, United States, Mississippi
Description
This dataset contains data and code from the manuscript:Heintzman, L.J., McIntyre, N.E., Langendoen, E.J., & Read, Q.D. (2024). Cultivation and dynamic cropping processes impart land-cover heterogeneity within agroecosystems: a metrics-based case study in the Yazoo-Mississippi Delta (USA). Landscape Ecology 39, 29 (2024). https://doi.org/10.1007/s10980-024-01797-0There are 14 rasters of land use and land cover data for the study region, in .tif format with associated auxiliary files, two shape files with county boundaries and study area extent, a CSV file with summary information derived from the rasters, and a Jupyter notebook containing Python code.The rasters included here represent an intermediate data product. Original unprocessed rasters from NASS CropScape are not included here, nor is the code to process them.List of filesMS_Delta_maps.zipMSDeltaCounties_UTMZone15N.shp: Depiction of the 19 counties (labeled) that intersect the Mississippi Alluvial Plain in western Mississippi.MS_Delta_MAP_UTMZone15N.shp: Depiction of the study area extent.mf8h_20082021.zipmf8h_XXXX.tif: Yearly, reclassified and majority filtered LULC data used to build comboall1.csv - derived from USDA NASS CropScape. There are 14 .tif files total for years 2008-2021. Each .tif file includes auxiliary files with the same file name and the following extensions: .tfw, .tif.aux.xml, .tif.ovr., .tif.vat.cpg., .tif.vat.dbf.comboall1.csv: Combined dataset of LULC information for all 14 years in study period.analysis.ipynb_.txt: Jupyter Notebook used to analyze comboall1.csv. Convert to .ipynb format to open with Jupyter.This research was conducted under USDA Agricultural Research Service, National Program 211 (Water Availability and Watershed Management).
Sample Park Analysis
figshare.com
zip
Updated Nov 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Eric Delmelle (2025). Sample Park Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.30509021.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.30509021.v1
Dataset updated
Nov 2, 2025
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Eric Delmelle
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
README – Sample Park Analysis## OverviewThis repository contains a Google Colab / Jupyter notebook and accompanying dataset used for analyzing park features and associated metrics. The notebook demonstrates data loading, cleaning, and exploratory analysis of the Hope_Park_original.csv file.## Contents- sample park analysis.ipynb — The main analysis notebook (Colab/Jupyter format)- Hope_Park_original.csv — Source dataset containing park information- README.md — Documentation for the contents and usage## Usage1. Open the notebook in Google Colab or Jupyter.2. Upload the Hope_Park_original.csv file to the working directory (or adjust the file path in the notebook).3. Run each cell sequentially to reproduce the analysis.## RequirementsThe notebook uses standard Python data science libraries:```pythonpandasnumpymatplotlibseaborn
u
Data from: T1DiabetesGranada: a longitudinal multi-modal dataset of type 1...
produccioncientifica.ugr.es
data.niaid.nih.gov
Updated 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rodriguez-Leon, Ciro; Aviles Perez, Maria Dolores; Banos, Oresti; Quesada-Charneco, Miguel; Lopez-Ibarra, Pablo J; Villalonga, Claudia; Munoz-Torres, Manuel; Rodriguez-Leon, Ciro; Aviles Perez, Maria Dolores; Banos, Oresti; Quesada-Charneco, Miguel; Lopez-Ibarra, Pablo J; Villalonga, Claudia; Munoz-Torres, Manuel (2023). T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus [Dataset]. https://produccioncientifica.ugr.es/documentos/668fc429b9e7c03b01bd53b7
Explore at:
Dataset updated
2023
Authors
Rodriguez-Leon, Ciro; Aviles Perez, Maria Dolores; Banos, Oresti; Quesada-Charneco, Miguel; Lopez-Ibarra, Pablo J; Villalonga, Claudia; Munoz-Torres, Manuel; Rodriguez-Leon, Ciro; Aviles Perez, Maria Dolores; Banos, Oresti; Quesada-Charneco, Miguel; Lopez-Ibarra, Pablo J; Villalonga, Claudia; Munoz-Torres, Manuel
Description
T1DiabetesGranada

A longitudinal multi-modal dataset of type 1 diabetes mellitus

Documented by:

Rodriguez-Leon, C., Aviles-Perez, M. D., Banos, O., Quesada-Charneco, M., Lopez-Ibarra, P. J., Villalonga, C., & Munoz-Torres, M. (2023). T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus. Scientific Data, 10(1), 916. https://doi.org/10.1038/s41597-023-02737-4

Background

Type 1 diabetes mellitus (T1D) patients face daily difficulties in keeping their blood glucose levels within appropriate ranges. Several techniques and devices, such as flash glucose meters, have been developed to help T1D patients improve their quality of life. Most recently, the data collected via these devices is being used to train advanced artificial intelligence models to characterize the evolution of the disease and support its management. The main problem for the generation of these models is the scarcity of data, as most published works use private or artificially generated datasets. For this reason, this work presents T1DiabetesGranada, a open under specific permission longitudinal dataset that not only provides continuous glucose levels, but also patient demographic and clinical information. The dataset includes 257780 days of measurements over four years from 736 T1D patients from the province of Granada, Spain. This dataset progresses significantly beyond the state of the art as one the longest and largest open datasets of continuous glucose measurements, thus boosting the development of new artificial intelligence models for glucose level characterization and prediction.

Data Records

The data are stored in four comma-separated values (CSV) files which are available in T1DiabetesGranada.zip. These files are described in detail below.

Patient_info.csv

Patient_info.csv is the file containing information about the patients, such as demographic data, start and end dates of blood glucose level measurements and biochemical parameters, number of biochemical parameters or number of diagnostics. This file is composed of 736 records, one for each patient in the dataset, and includes the following variables:

Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.

Sex – Sex of the patient. Values: F (for female), masculine (for male)

Birth_year – Year of birth of the patient. Format: YYYY.

Initial_measurement_date – Date of the first blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.

Final_measurement_date – Date of the last blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.

Number_of_days_with_measures – Number of days with blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 8 to 1463.

Number_of_measurements – Number of blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 400 to 137292.

Initial_biochemical_parameters_date – Date of the first biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.

Final_biochemical_parameters_date – Date of the last biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.

Number_of_biochemical_parameters – Number of biochemical parameters measured on the patient, extracted from the Biochemical_parameters.csv file. Values: ranging from 4 to 846.

Number_of_diagnostics – Number of diagnoses realized to the patient, extracted from the Diagnostics.csv file. Values: ranging from 1 to 24.

Glucose_measurements.csv

Glucose_measurements.csv is the file containing the continuous blood glucose level measurements of the patients. The file is composed of more than 22.6 million records that constitute the time series of continuous blood glucose level measurements. It includes the following variables:

Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.

Measurement_date – Date of the blood glucose level measurement. Format: YYYY-MM-DD.

Measurement_time – Time of the blood glucose level measurement. Format: HH:MM:SS.

Measurement – Value of the blood glucose level measurement in mg/dL. Values: ranging from 40 to 500.

Biochemical_parameters.csv

Biochemical_parameters.csv is the file containing data of the biochemical tests performed on patients to measure their biochemical parameters. This file is composed of 87482 records and includes the following variables:

Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.

Reception_date – Date of receipt in the laboratory of the sample to measure the biochemical parameter. Format: YYYY-MM-DD.

Name – Name of the measured biochemical parameter. Values: 'Potassium', 'HDL cholesterol', 'Gammaglutamyl Transferase (GGT)', 'Creatinine', 'Glucose', 'Uric acid', 'Triglycerides', 'Alanine transaminase (GPT)', 'Chlorine', 'Thyrotropin (TSH)', 'Sodium', 'Glycated hemoglobin (Ac)', 'Total cholesterol', 'Albumin (urine)', 'Creatinine (urine)', 'Insulin', 'IA ANTIBODIES'.

Value – Value of the biochemical parameter. Values: ranging from -4.0 to 6446.74.

Diagnostics.csv

Diagnostics.csv is the file containing diagnoses of diabetes mellitus complications or other diseases that patients have in addition to type 1 diabetes mellitus. This file is composed of 1757 records and includes the following variables:

Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.

Code – ICD-9-CM diagnosis code. Values: subset of 594 of the ICD-9-CM codes (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).

Description – ICD-9-CM long description. Values: subset of 594 of the ICD-9-CM long description (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).

Technical Validation

Blood glucose level measurements are collected using FreeStyle Libre devices, which are widely used for healthcare in patients with T1D. Abbott Diabetes Care, Inc., Alameda, CA, USA, the manufacturer company, has conducted validation studies of these devices concluding that the measurements made by their sensors compare to YSI analyzer devices (Xylem Inc.), the gold standard, yielding results of 99.9% of the time within zones A and B of the consensus error grid. In addition, other studies external to the company concluded that the accuracy of the measurements is adequate.

Moreover, it was also checked in most cases the blood glucose level measurements per patient were continuous (i.e. a sample at least every 15 minutes) in the Glucose_measurements.csv file as they should be.

Usage Notes

For data downloading, it is necessary to be authenticated on the Zenodo platform, accept the Data Usage Agreement and send a request specifying full name, email, and the justification of the data use. This request will be processed by the Secretary of the Department of Computer Engineering, Automatics, and Robotics of the University of Granada and access to the dataset will be granted.

The files that compose the dataset are CSV type files delimited by commas and are available in T1DiabetesGranada.zip. A Jupyter Notebook (Python v. 3.8) with code that may help to a better understanding of the dataset, with graphics and statistics, is available in UsageNotes.zip.

Graphs_and_stats.ipynb

The Jupyter Notebook generates tables, graphs and statistics for a better understanding of the dataset. It has four main sections, one dedicated to each file in the dataset. In addition, it has useful functions such as calculating the patient age, deleting a patient list from a dataset file and leaving only a patient list in a dataset file.

Code Availability

The dataset was generated using some custom code located in CodeAvailability.zip. The code is provided as Jupyter Notebooks created with Python v. 3.8. The code was used to conduct tasks such as data curation and transformation, and variables extraction.

Original_patient_info_curation.ipynb

In the Jupyter Notebook is preprocessed the original file with patient data. Mainly irrelevant rows and columns are removed, and the sex variable is recoded.

Glucose_measurements_curation.ipynb

In the Jupyter Notebook is preprocessed the original file with the continuous glucose level measurements of the patients. Principally rows without information or duplicated rows are removed and the variable with the timestamp is transformed into two new variables, measurement date and measurement time.

Biochemical_parameters_curation.ipynb

In the Jupyter Notebook is preprocessed the original file with patient data of the biochemical tests performed on patients to measure their biochemical parameters. Mainly irrelevant rows and columns are removed and the variable with the name of the measured biochemical parameter is translated.

Diagnostic_curation.ipynb

In the Jupyter Notebook is preprocessed the original file with patient data of the diagnoses of diabetes mellitus complications or other diseases that patients have in addition to T1D.

Get_patient_info_variables.ipynb

In the Jupyter Notebook it is coded the feature extraction process from the files Glucose_measurements.csv, Biochemical_parameters.csv and Diagnostics.csv to complete the file Patient_info.csv. It is divided into six sections, the first three to extract the features from each of the mentioned files and the next three to add the extracted features to the resulting new file.

Data Usage Agreement

The conditions for use are as follows:

You confirm that you will not attempt to re-identify research participants for any reason, including for re-identification theory research.

You commit to keeping the T1DiabetesGranada dataset confidential and secure and will not redistribute data or Zenodo account credentials.

You will require
Data set from: Rates of Compact Object Coalescences
zenodo.org
data.niaid.nih.gov
+1more
csv, pdf, zip
Updated Jul 18, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Floor Broekgaarden; Floor Broekgaarden; Ilya Mandel; Ilya Mandel (2024). Data set from: Rates of Compact Object Coalescences [Dataset]. http://doi.org/10.5281/zenodo.5232245
Explore at:
csv, zip, pdfAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.5232245
Dataset updated
Jul 18, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Floor Broekgaarden; Floor Broekgaarden; Ilya Mandel; Ilya Mandel
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data from: Rates of Compact Object Coalescence

Brief overview:
This Zenodo entry contains the data that has been used to make the figures for the living review "Rates of Compact Object Coalescence" by Ilya Mandel & Floor Broekgaarden (2021). To reproduce the figures, download all the *.csv files and run the jupyter notebook created to reproduce the results in the publicly available Github directory https://github.com/FloorBroekgaarden/Rates_of_Compact_Object_Coalescence (the exact jupyter notebook can be found here)

For any suggestions, questions or inquiry, please email one, or both, of the authors:

Ilya Mandel: ilya.mandel@monash.edu

Floor Broekgaarden: floor.broekgaarden@cfa.harvard.edu

We very much welcome suggestions for additional/missing literature with rate predictions or measurements.

Extra figures:
Extra figures that can be used can be found here:

Vertical figures: https://docs.google.com/presentation/d/1GqJ0k2zpnxBGwIYNeQ0BfsLSU7H2942gspL-PN_iaJY/edit?usp=sharing

The authors are currently working on making an interactive tool for plotting the rates that will be available soon. In the mean time, feel free to send requests for plots/figures to the authors.

Reference
If you use this data/code for publication, please cite both the paper: Mandel & Broekgaarden (2021) (https://ui.adsabs.harvard.edu/abs/2021arXiv210714239M/abstract) and the dataset on Zenodo through it's doi (see tabs on the right of this zenodo entry)

Details datafiles:

The PDF COC_rates_supplementary_material.pdf attached (and in the Github repository) describes how each of the rates in the data files of this Zenodo entry are retrieved. The other 26 files are .csv files, where each csv file contains the rates from one specific double compact object type: NS-NS, NS-BH or BH-BH, and specific rate group (isolated binary evolution, gravitational wave observations etc.). The files in this entry are:

Data_Mandel_and_Broekgaarden_2021.zip all the files below conveniently in one zip file so that you only have to do 1 download.

COC_rates_supplementary_material.pdf # PDF document describing how the rates are retrieved and quoted rom each study

BH-BH_rates_CHE.csv # BH-BH rates for chemically homogeneous evolution

BH-BH_rates_globular-clusters.csv # BH-BH rates for dynamical formation in globular clusters

BH-BH_rates_isolated-binary-evolution.csv # BH-BH rates for isolated binary evolution

BH-BH_rates_nuclear-clusters.csv # BH-BH rates for (dynamical )formation in (active) nuclear star clusters

BH-BH_rates_observations-GWs.csv # BH-BH rates for observations from gravitational waves

BH-BH_rates_population-III.csv # BH-BH rates for population-III stars

BH-BH_rates_primordial.csv # BH-BH rates for primordial formation

BH-BH_rates_triples.csv. # BH-BH rates for formation in (hierarchical) triples

BH-BH_rates_young-stellar-clusters.csv # BH-BH rates for dynamical formation in young/open star clusters

NS-BH_rates_CHE.csv # NS-BH rates for chemically homogeneous evolution

NS-BH_rates_globular-clusters.csv # NS-BH rates for dynamical formation in globular clusters

NS-BH_rates_isolated-binary-evolution.csv. # NS-BH rates for isolated binary evolution

NS-BH_rates_nuclear-clusters.csv # NS-BH rates for (dynamical )formation in (active) nuclear star clusters

NS-BH_rates_observations-GWs.csv # NS-BH rates for observations from gravitational waves

NS-BH_rates_population-III.csv # NS-BH rates for population-III stars

NS-BH_rates_triples.csv # NS-BH rates for formation in (hierarchical) triples

NS-BH_rates_young-stellar-clusters.csv # BH-BH rates for dynamical formation in young/open star clusters

NS-NS_rates_globular-clusters.csv # NS-NS rates for dynamical formation in globular clusters

NS-NS_rates_isolated-binary-evolution.csv # NS-NS rates for isolated binary evolution

NS-NS_rates_nuclear-clusters.csv # NS-NS rates for (dynamical )formation in (active) nuclear star clusters

NS-NS_rates_observations-GWs.csv # NS-NS rates for observations from gravitational waves

NS-NS_rates_observations-kilonovae.csv # NS-NS rates for observations from kilonovae

NS-NS_rates_observations-pulsars.csv # NS-NS rates for observations from Galactic pulsars

NS-NS_rates_observations-sGRBs.csv # NS-NS rates for observations short gamma-ray bursts

NS-NS_rates_triples.csv # NS-NS rates for formation in (hierarchical) triples

NS-NS_rates_young-stellar-clusters.csv # NS-NS rates for dynamical formation in young/open star clusters

Each csv file contains the following header:
ADS year # year of the paper in the ADS entry
ADS month # month of the paper in the ADS entry
ADS abstract link # link to the ADS abstract
ArXiv link # link to the ArXiv version of the paper
First Author # name of the first author
label string # label of the study, that corresponds to the label in the figure
code (optional) # name of the code used in this study
type of limit (for plotting, see jupyter notebook for a dictionary) # integer, that is used to map to a certain limit visualization in the plot (e.g. scatter points vs upper limit).

Each entry takes two columns in the csv files. One for the rates (quoted under the header 'rate [Gpc^-3 yr^-1]') and one for "notes" where we sometimes added notes about the rates (such as whether it is an upper or lower limit).
H
National Water Model HydroLearn Python Notebooks
hydroshare.org
zip
Updated Nov 14, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dan Ames; Justin Hunter (2023). National Water Model HydroLearn Python Notebooks [Dataset]. http://doi.org/10.4211/hs.5949aec47b484e689573beeb004a2917
Explore at:
zip(1.8 MB)Available download formats
Unique identifier
https://doi.org/10.4211/hs.5949aec47b484e689573beeb004a2917
Dataset updated
Nov 14, 2023
Dataset provided by
HydroShare
Authors
Dan Ames; Justin Hunter
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered

Description
This resource contains Jupyter Python notebooks which are intended to be used to learn about the U.S. National Water Model (NWM). These notebooks explore NWM forecasts in various ways. NWM Notebooks 1, 2, and 3, access NWM forecasts directly from the NOAA NOMADS file sharing system. Notebook 4 accesses NWM forecasts from Google Cloud Platform (GCP) storage in addition to NOMADS. A brief summary of what each notebook does is included below:

Notebook 1 (NWM1_Visualization) focuses on visualization. It includes functions for downloading and extracting time series forecasts for any of the 2.7 million stream reaches of the U.S. NWM. It also demonstrates ways to visualize forecasts using Python packages like matplotlib.

Notebook 2 (NWM2_Xarray) explores methods for slicing and dicing NWM NetCDF files using the python library, XArray.

Notebook 3 (NWM3_Subsetting) is focused on subsetting NWM forecasts and NetCDF files for specified reaches and exporting NWM forecast data to CSV files.

Notebook 4 (NWM4_Hydrotools) uses Hydrotools, a new suite of tools for evaluating NWM data, to retrieve NWM forecasts both from NOMADS and from Google Cloud Platform storage where older NWM forecasts are cached. This notebook also briefly covers visualizing, subsetting, and exporting forecasts retrieved with Hydrotools.

NOTE: Notebook 4 Requires a newer version of NumPy that is not available on the default CUAHSI JupyterHub instance. Please use the instance "HydroLearn - Intelligent Earth" and ensure to run !pip install hydrotools.nwm_client[gcp].

The notebooks are part of a NWM learning module on HydroLearn.org. When the associated learning module is complete, the link to it will be added here. It is recommended that these notebooks be opened through the CUAHSI JupyterHub App on Hydroshare. This can be done via the 'Open With' button at the top of this resource page.
Replication Package: Unboxing Default Argument Breaking Changes in 1 + 2...
zenodo.org
application/gzip
Updated Jul 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Daniel Prates; Arthur Bonifácio; Ghizlane El Boussaidi; João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Daniel Prates; Arthur Bonifácio; Ghizlane El Boussaidi (2024). Replication Package: Unboxing Default Argument Breaking Changes in 1 + 2 Data Science Libraries in Python [Dataset]. http://doi.org/10.5281/zenodo.11584961
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.11584961
Dataset updated
Jul 15, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Daniel Prates; Arthur Bonifácio; Ghizlane El Boussaidi; João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Daniel Prates; Arthur Bonifácio; Ghizlane El Boussaidi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Replication Package

This repository contains data and source files needed to replicate our work described in the paper "Unboxing Default Argument Breaking Changes in Scikit Learn".

Requirements

We recommend the following requirements to replicate our study:

Internet access

At least 100GB of space

Docker installed

Git installed

Package Structure

We relied on Docker containers to provide a working environment that is easier to replicate. Specifically, we configure the following containers:

data-analysis, an R-based Container we used to run our data analysis.

data-collection, a Python Container we used to collect Scikit's default arguments and detect them in client applications.

database, a Postgres Container we used to store clients' data, obtainer from Grotov et al.

storage, a directory used to store the data processed in data-analysis and data-collection. This directory is shared in both containers.

docker-compose.yml, the Docker file that configures all containers used in the package.

In the remainder of this document, we describe how to set up each container properly.

Using VSCode to Setup the Package

We selected VSCode as the IDE of choice because its extensions allow us to implement our scripts directly inside the containers. In this package, we provide configuration parameters for both data-analysis and data-collection containers. This way you can directly access and run each container inside it without any specific configuration.

You first need to set up the containers

$ cd /replication/package/folder $ docker-compose build $ docker-compose up # Wait docker creating and running all containers

Then, you can open them in Visual Studio Code:

Open VSCode in project root folder

Access the command palette and select "Dev Container: Reopen in Container"

Select either Data Collection or Data Analysis.

Start working

If you want/need a more customized organization, the remainder of this file describes it in detail.

Longest Road: Manual Package Setup

Database Setup

The database container will automatically restore the dump in dump_matroskin.tar in its first launch. To set up and run the container, you should:

Build an image:

$ cd ./database $ docker build --tag 'dabc-database' . $ docker image ls REPOSITORY TAG IMAGE ID CREATED SIZE dabc-database latest b6f8af99c90d 50 minutes ago 18.5GB

Create and enter inside the container:

$ docker run -it --name dabc-database-1 dabc-database $ docker exec -it dabc-database-1 /bin/bash root# psql -U postgres -h localhost -d jupyter-notebooks jupyter-notebooks=# \dt List of relations Schema | Name | Type | Owner --------+-------------------+-------+------- public | Cell | table | root public | Code_cell | table | root public | Md_cell | table | root public | Notebook | table | root public | Notebook_features | table | root public | Notebook_metadata | table | root public | repository | table | root

If you got the tables list as above, your database is properly setup.

It is important to mention that this database is extended from the one provided by Grotov et al.. Basically, we added three columns in the table Notebook_features (API_functions_calls, defined_functions_calls, andother_functions_calls) containing the function calls performed by each client in the database.

Data Collection Setup

This container is responsible for collecting the data to answer our research questions. It has the following structure:

dabcs.py, extract DABCs from Scikit Learn source code, and export them to a CSV file.

dabcs-clients.py, extract function calls from clients and export them to a CSV file. We rely on a modified version of Matroskin to leverage the function calls. You can find the tool's source code in the `matroskin`` directory.

Makefile, commands to set up and run both dabcs.py and dabcs-clients.py

matroskin, the directory containing the modified version of matroskin tool. We extended the library to collect the function calls performed on the client notebooks of Grotov's dataset.

storage, a docker volume where the data-collection should save the exported data. This data will be used later in Data Analysis.

requirements.txt, Python dependencies adopted in this module.

Note that the container will automatically configure this module for you, e.g., install dependencies, configure matroskin, download scikit learn source code, etc. For this, you must run the following commands:

$ cd ./data-collection $ docker build --tag "data-collection" . $ docker run -it -d --name data-collection-1 -v $(pwd)/:/data-collection -v $(pwd)/../storage/:/data-collection/storage/ data-collection $ docker exec -it data-collection-1 /bin/bash $ ls Dockerfile Makefile config.yml dabcs-clients.py dabcs.py matroskin storage requirements.txt utils.py

If you see project files, it means the container is configured accordingly.

Data Analysis Setup

We use this container to conduct the analysis over the data produced by the Data Collection container. It has the following structure:

dependencies.R, an R script containing the dependencies used in our data analysis.

data-analysis.Rmd, the R notebook we used to perform our data analysis

datasets, a docker volume pointing to the storage directory.

Execute the following commands to run this container:

$ cd ./data-analysis $ docker build --tag "data-analysis" . $ docker run -it -d --name data-analysis-1 -v $(pwd)/:/data-analysis -v $(pwd)/../storage/:/data-collection/datasets/ data-analysis $ docker exec -it data-analysis-1 /bin/bash $ ls data-analysis.Rmd datasets dependencies.R Dockerfile figures Makefile

If you see project files, it means the container is configured accordingly.

A note on storage shared folder

As mentioned, the storage folder is mounted as a volume and shared between data-collection and data-analysis containers. We compressed the content of this folder due to space constraints. Therefore, before starting working on Data Collection or Data Analysis, make sure you extracted the compressed files. You can do this by running the Makefile inside storage folder.

$ make unzip # extract files $ ls clients-dabcs.csv clients-validation.csv dabcs.csv Makefile scikit-learn-versions.csv versions.csv $ make zip # compress files $ ls csv-files.tar.gz Makefile
H
Replication Data for: Formal Models of Opinion Formation and their...
dataverse.harvard.edu
Updated Oct 30, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ivan V. Kozitsin (2020). Replication Data for: Formal Models of Opinion Formation and their Application to Real Data: Evidence from Online Social Networks [Dataset]. http://doi.org/10.7910/DVN/CLI1PC
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/CLI1PC
Dataset updated
Oct 30, 2020
Dataset provided by
Harvard Dataverse
Authors
Ivan V. Kozitsin
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Replication Data for: “Formal Models of Opinion Formation and their Application to Real Data: Evidence from Online Social Networks” The repository includes three files: Code.ipynb, X_friends.npz, and X_opinions.csv. The first one is a notebook document used by Jupyter Notebook, which stores the Python code to replicate the results from the Manuscript. To open this file one need install Python and Jupyter. The second file contains the matrix A of users’ friendship connections. The matrix representation is the sparse csr-format. The file X_opinions.csv consists of the matrices X ̂ and X ̂_- stacked together along the horizontal axis. The last two files keep the data, which was originally introduced in (Kozitsin et al., 2019) and located on the repository https://doi.org/10.7910/DVN/OUZY74. For convenience of usage, we copy it here. References Kozitsin, I. V., Marchenko, A. M., Goiko, V. L., & Palkin, R. V. (2019). Symmetric Convex Mechanism of Opinion Formation Predicts Directions of Users’ Opinions Trajectories. 1–5.
O
Time series
data.open-power-system-data.org
csv, sqlite
Updated Oct 27, 2016
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jonathan Muehlenpfordt (2016). Time series [Dataset]. https://data.open-power-system-data.org/time_series/2016-10-27
Explore at:
csv, sqliteAvailable download formats
Dataset updated
Oct 27, 2016
Dataset provided by
Open Power System Data
Authors
Jonathan Muehlenpfordt
Time period covered
Dec 31, 1999 - Sep 29, 2016
Variables measured
comment, timestamp, ce(s)t-timestamp, solar_DE_profile, solar_DE_capacity, solar_CZ_generation, solar_DE_generation, wind-onshore_DE_profile, wind_DE-tennet_forecast, solar_DE-tennet_forecast, and 28 more
Description
Load, wind and solar, prices in hourly resolution. This data package contains different kinds of timeseries data relevant for power system modelling, namely electricity consumption (load) for 36 European countries as well as wind and solar power generation and capacities and prices for a growing subset of countries. The timeseries become available at different points in time depending on the sources. The data has been downloaded from the sources, resampled and merged in a large CSV file with hourly resolution. Additionally, the data available at a higher resolution (Some renewables in-feed, 15 minutes) is provided in a separate file. All data processing is conducted in python and pandas and has been documented in the Jupyter notebooks linked below.
H
Notebook to get the indices of National Water Model grid cells containing...
beta.hydroshare.org
hydroshare.org
+1more
zip
Updated Jan 27, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Irene Garousi-Nejad; David Tarboton (2022). Notebook to get the indices of National Water Model grid cells containing SNOTEL sites [Dataset]. http://doi.org/10.4211/hs.7839e3f3b4f54940bd3591b24803cacf
Explore at:
zip(160.7 MB)Available download formats
Unique identifier
https://doi.org/10.4211/hs.7839e3f3b4f54940bd3591b24803cacf
Dataset updated
Jan 27, 2022
Dataset provided by
HydroShare
Authors
Irene Garousi-Nejad; David Tarboton
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Jupyter Notebook shared here determines X and Y indices of the National Water Model grid cells that contain snow telemetry (SNOTEL) sites. It uses two inputs: one CSV file that includes SNOTEL site information and one NetCDF file that is a land surface model output of the NWM reanalysis results. You can open this resource with CUAHSI JupyterHub and run the notebook within the code folder. The output is a CSV file that gives X and Y indices of the National Water Model grid cells associated with each SNOTEL site.
OpenOrca
kaggle.com
opendatalab.com
+1more
zip
Updated Nov 22, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). OpenOrca [Dataset]. https://www.kaggle.com/datasets/thedevastator/open-orca-augmented-flan-dataset/versions/2
Explore at:
zip(2548102631 bytes)Available download formats
Dataset updated
Nov 22, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Open-Orca Augmented FLAN Dataset

Unlocking Advanced Language Understanding and ML Model Performance

By Huggingface Hub [source]

About this dataset

The Open-Orca Augmented FLAN Collection is a revolutionary dataset that unlocks new levels of language understanding and machine learning model performance. This dataset was created to support research on natural language processing, machine learning models, and language understanding through leveraging the power of reasoning trace-enhancement techniques. By enabling models to understand complex relationships between words, phrases, and even entire sentences in a more robust way than ever before, this dataset provides researchers expanded opportunities for furthering the progress of linguistics research. With its unique combination of features including system prompts, questions from users and responses from systems, this dataset opens up exciting possibilities for deeper exploration into the cutting edge concepts underlying advanced linguistics applications. Experience a new level of accuracy and performance - explore Open-Orca Augmented FLAN Collection today!

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

This guide provides an introduction to the Open-Orca Augmented FLAN Collection dataset and outlines how researchers can utilize it for their language understanding and natural language processing (NLP) work. The Open-Orca dataset includes system prompts, questions posed by users, and responses from the system.

Getting Started The first step is to download the data set from Kaggle at https://www.kaggle.com/openai/open-orca-augmented-flan and save it in a project directory of your choice on your computer or cloud storage space. Once you have downloaded the data set, launch your ‘Jupyter Notebook’ or ‘Google Colab’ program with which you want to work with this data set.

Exploring & Preprocessing Data: To get a better understanding of the features in this dataset, import them into Pandas DataFrame as shown below. You can use other libraries as per your need:

import pandas as pd # Library used for importing datasets into Python df = pd.read_csv('train.csv') #Imports train csv file into Pandas};#DataFrame df[['system_prompt','question','response']].head() #Views top 5 rows with columns 'system_prompt','question','response'

After importing check each feature using basic descriptive statistics such Pandas groupby statement: We can use groupby statements to have greater clarity over the variables present in each feature(elements). The below command will show counts of each element in System Prompt column present under train CVS file :

df['system prompt'].value_counts().head()#shows count of each element present under 'System Prompt'column Output: User says hello guys 587 <br>System asks How are you?: 555 times<br>User says I am doing good: 487 times <br>..and so on

Data Transformation: After inspecting & exploring different features one may want/need certain changes that best suits their needs from this dataset before training modeling algorithms on it.
Common transformation steps include : Removing punctuation marks : Since punctuation marks may not add any value to computation operations , we can remove them using regex functions write .replace('[^A-Za -z]+','' ) as

Research Ideas

Automated Question Answering: Leverage the dataset to train and develop question answering models that can provide tailored answers to specific user queries while retaining language understanding abilities.

Natural Language Understanding: Use the dataset as an exploratory tool for fine-tuning natural language processing applications, such as sentiment analysis, document categorization, parts-of-speech tagging and more.

Machine Learning Optimizations: The dataset can be used to build highly customized machine learning pipelines that allow users to harness the power of conditioning data with pre-existing rules or models for improved accuracy and performance in automated tasks

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. [See Other Information](ht...
r
Data from: Coherent X-ray Scattering Reveals Nanoscale Fluctuations in...
demo.researchdata.se
researchdata.se
+2more
Updated May 23, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maddalena Bin (2023). Coherent X-ray Scattering Reveals Nanoscale Fluctuations in Hydrated Proteins [Dataset]. http://doi.org/10.17045/STHLMUNI.22756400
Explore at:
Unique identifier
https://doi.org/10.17045/STHLMUNI.22756400
Dataset updated
May 23, 2023
Dataset provided by
Stockholm University
Authors
Maddalena Bin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets:

"Figure1a.csv": scattering intensity of hydrated proteins in Wide-Angle X-ray Scattering for different fluences (in units of photons/second/area).

"Figure1a_inset.csv": scattering intensity of hydrated proteins in Small-Angle X-ray Scattering for different fluences (in units of photons/second/area).

"Figure1b.csv": Intensity autocorrelation functions g2 at momentum transfer Q = 0.08 1/nm for different fluences (in units of photons/second/area).

"Figure1b_inset.csv": decay rate (in second) as a function of the momentum transfer Q (in 1/nm) for different fluences (in units of photons/second/area).

"Figure1c.csv": decay rate (in second) for variable fluence (in photons/second/um^2) at the momentum transfer Q = 0.08 1/nm.

"Figure1d.csv": renormalised intensity autocorrelation functions g2 at momentum transfer Q = 0.08 1/nm for variable fluence (in photons/second/um^2), where the time axis is normalised to the corresponding fluence F by calculating t/(1 + a · F·τ0), where τ0 is the equilibrium time constant extracted by extrapolation to F=0 (from data in "Figure1c.csv)"

"Figure2a.csv": The Wide-Angle X-ray Scattering scattering intensity at different temperatures T=180-290 K

"Figure2b.csv": The Small-Angle X-ray Scattering scattering intensity at different temperatures T=180-290 K

"Figure2c.csv": Intensity autocorrelation functions g2 for different temperatures (T=180-290 K) at momentum transfer Q = 0.1 1/nm.

"Figure2d-2e.csv": time constants (in second) and the Kohlrausch-Williams-Watts (KWW) exponent extracted from the fits of data in "Figure2c.csv" as a function of temperature (in K)

"Figure3b.csv": The normalised variance Chi_T at different temperatures (T=180-290 K) extracted from the two-time correlation functions.

"Figure3c.csv": The maximum of the normalised variance Chi_0 as a function of temperature (in K).

Additionally, a Jupyter notebook "open-data.ipynb" which shows how to load and plot the data from the csv files in Python.
Youtube Quality Videos Classification
kaggle.com
zip
Updated Oct 21, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2022). Youtube Quality Videos Classification [Dataset]. https://www.kaggle.com/datasets/thedevastator/youtube-quality-videos-classification/code
Explore at:
zip(712986 bytes)Available download formats
Dataset updated
Oct 21, 2022
Authors
The Devastator
Area covered
YouTube
Description
Youtube Quality Videos Classification

How to Tell If a Video is Good or Bad

About this dataset

This dataset is important as it can help users find good quality videos more easily. The data was collected using the Youtube API and includes a total of _ videos

Columns: Channel title, view count, like count, comment count, definition, caption, subscribers, total views, average polarity score, label

How to use the dataset

In order to use this dataset, you will need to have the following: -A YouTube API key -A text editor (e.g. Notepad++, Sublime Text, etc.)

Once you have collected these items, you can begin using the dataset. Here is a step-by-step guide: 1) Navigate to the folder where you saved the dataset. 2) Right-click on the file and select Open with > Your text editor. 3) copy your YouTube API key and paste it in place of Your_API_Key in line 4 of the code. 4) Save the file and close your text editor. 5) Navigate to the folder in your terminal/command prompt and type jupyter notebook. This will open a Jupyter Notebook in your browser window.

Research Ideas

This dataset can be used for a number of different things including: 1. Finding good quality videos on youtube 2. Determining which videos are more likely to be reputable 3. Helping people find videos they will enjoy

Acknowledgements

The data for this dataset was collected using the Youtube API and includes a total of _ videos

License

See the dataset description for more information.

Columns

File: dataframeclean.csv | Column name | Description | |:-----------------------|:--------------| | **** | | | channelTitle | | | viewCount | | | likeCount | | | commentCount | | | definition | | | caption | | | subscribers | | | totalViews | | | avg polarity score | | | Label | | | pushblishYear | | | durationSecs | | | tagCount | | | title length | | | description length | |

File: ytdataframe.csv | Column name | Description | |:-----------------------|:------------------------------------------------------------| | **** | | | channelTitle | | | viewCount | | | likeCount | | | commentCount | | | definition | | | caption | | | subscribers | | | totalViews | | | avg polarity score | | | Label | | | title | The title of the video. (String) | | description | A description of the video. (String) | | tags | The tags associated with the video. (String) | | publishedAt | The date and time the video was published. (String) | | favouriteCount | The number of times the video has been favorited. (Integer) | | duration | The length of the video in seconds. (Integer) |

File: ytdataframe2.csv | Column name | Description | |:-----------------------|:------------------------------------------------------------| | **** | | | channelTitle | | | title | The title of the video. (String) | | description | A description of the video. (String) | | tags | The tags associated with the video. (String) | | publishedAt | The date and time the video was published. (String) | | viewCount | | | **...

Facebook

Twitter

Click to copy link

Link copied

Cite

Richard Ferrers; Speedtest Global Index (2023). Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus ALC - 2020, 2022 [Dataset]. http://doi.org/10.6084/m9.figshare.13621169.v24

Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus ALC - 2020, 2022

Explore at:

txtAvailable download formats

Unique identifier

https://doi.org/10.6084/m9.figshare.13621169.v24

Dataset updated

May 30, 2023

Dataset provided by

Figsharehttp://figshare.com/

Authors

Richard Ferrers; Speedtest Global Index

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset compares four cities FIXED-line broadband internet speeds: - Melbourne, AU - Bangkok, TH - Shanghai, CN - Los Angeles, US - Alice Springs, AU

ERRATA: 1.Data is for Q3 2020, but some files are labelled incorrectly as 02-20 of June 20. They all should read Sept 20, or 09-20 as Q3 20, rather than Q2. Will rename and reload. Amended in v7.

LAX file named 0320, when should be Q320. Amended in v8.

*lines of data for each geojson file; a line equates to a 600m^2 location, inc total tests, devices used, and average upload and download speed - MEL 16181 locations/lines => 0.85M speedtests (16.7 tests per 100people) - SHG 31745 lines => 0.65M speedtests (2.5/100pp) - BKK 29296 lines => 1.5M speedtests (14.3/100pp) - LAX 15899 lines => 1.3M speedtests (10.4/100pp) - ALC 76 lines => 500 speedtests (2/100pp)

Geojsons of these 2* by 2* extracts for MEL, BKK, SHG now added, and LAX added v6. Alice Springs added v15.

This dataset unpacks, geospatially, data summaries provided in Speedtest Global Index (linked below). See Jupyter Notebook (*.ipynb) to interrogate geo data. See link to install Jupyter.

** To Do Will add Google Map versions so everyone can see without installing Jupyter. - Link to Google Map (BKK) added below. Key:Green > 100Mbps(Superfast). Black > 500Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook. - Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melb has 20% at or above 100Mbps. Suggest plot Top 20% on map for community. Google Map link - now added (and tweet).

** Python melb = au_tiles.cx[144:146 , -39:-37] #Lat/Lon extract shg = tiles.cx[120:122 , 30:32] #Lat/Lon extract bkk = tiles.cx[100:102 , 13:15] #Lat/Lon extract lax = tiles.cx[-118:-120, 33:35] #lat/Lon extract ALC=tiles.cx[132:134, -22:-24] #Lat/Lon extract

Histograms (v9), and data visualisations (v3,5,9,11) will be provided. Data Sourced from - This is an extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).

**VERSIONS v.24 Add tweet and google map of Top 20% (over 100Mbps locations) in Mel Q322. Add v.1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below). v23. Add graph of 2022 Broadband distribution, and compare 2020 - 2022. Updated v1.4 Jupyter notebook. v22. Add Import ipynb; workflow-import-4cities. v21. Add Q3 2022 data; five cities inc ALC. Geojson files. (2020; 4.3M tests 2022; 2.9M tests)

Melb 14784 lines Avg download speed 69.4M Tests 0.39M

SHG 31207 lines Avg 233.7M Tests 0.56M

ALC 113 lines Avg 51.5M Test 1092

BKK 29684 lines Avg 215.9M Tests 1.2M

LAX 15505 lines Avg 218.5M Tests 0.74M

v20. Speedtest - Five Cities inc ALC. v19. Add ALC2.ipynb. v18. Add ALC line graph. v17. Added ipynb for ALC. Added ALC to title.v16. Load Alice Springs Data Q221 - csv. Added Google Map link of ALC. v15. Load Melb Q1 2021 data - csv. V14. Added Melb Q1 2021 data - geojson. v13. Added Twitter link to pics. v12 Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb). v11 Add Line-Compare pic, plotting Four Cities on a graph. v10 Add Four Histograms in one pic. v9 Add Histogram for Four Cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook). v8 Renamed LAX file to Q3, rather than 03. v7 Amended file names of BKK files to correctly label as Q3, not Q2 or 06. v6 Added LAX file. v5 Add screenshot of BKK Google Map. v4 Add BKK Google map(link below), and BKK csv mapping files. v3 replaced MEL map with big key version. Prev key was very tiny in top right corner. v2 Uploaded MEL, SHG, BKK data and Jupyter Notebook v1 Metadata record

** LICENCE AWS data licence on Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be: - non-commercial (NC) - reuse must be share-alike (SA)(add same licence). This restricts the standard CC-BY Figshare licence.

** Other uses of Speedtest Open Data; - see link at Speedtest below.

Clear search

Close search

Google apps

Main menu

Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus...

Melb 14784 lines Avg download speed 69.4M Tests 0.39M

SHG 31207 lines Avg 233.7M Tests 0.56M

ALC 113 lines Avg 51.5M Test 1092

BKK 29684 lines Avg 215.9M Tests 1.2M

LAX 15505 lines Avg 218.5M Tests 0.74M

JavaScript code for retrieval of MODIS Collection 6 NDSI snow cover at...

Types, open citations, closed citations, publishers, and participation...

Reporting behavior from WHO COVID-19 public data

Cognitive Fatigue

The S&M-HSTPM2d5 dataset: High Spatial-Temporal Resolution PM 2.5 Measures...

Renewable power plants

Effects of community management on user activity in online communities

Data from: Data and code from: Cultivation and dynamic cropping processes...

Sample Park Analysis

Data from: T1DiabetesGranada: a longitudinal multi-modal dataset of type 1...

Data set from: Rates of Compact Object Coalescences

National Water Model HydroLearn Python Notebooks

Replication Package: Unboxing Default Argument Breaking Changes in 1 + 2...

Replication Data for: Formal Models of Opinion Formation and their...

Time series

Notebook to get the indices of National Water Model grid cells containing...

OpenOrca

Open-Orca Augmented FLAN Dataset

Unlocking Advanced Language Understanding and ML Model Performance

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Acknowledgements

License

Data from: Coherent X-ray Scattering Reveals Nanoscale Fluctuations in...

Youtube Quality Videos Classification

Youtube Quality Videos Classification

How to Tell If a Video is Good or Bad

About this dataset

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus ALC - 2020, 2022

Melb 14784 lines Avg download speed 69.4M Tests 0.39M

SHG 31207 lines Avg 233.7M Tests 0.56M

ALC 113 lines Avg 51.5M Test 1092

BKK 29684 lines Avg 215.9M Tests 1.2M

LAX 15505 lines Avg 218.5M Tests 0.74M