25 datasets found

csv file for jupyter notebook
figshare.com
txt
Updated Nov 21, 2022
Cite
Johanna Schultz (2022). csv file for jupyter notebook [Dataset]. http://doi.org/10.6084/m9.figshare.21590175.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.21590175.v1
Dataset updated
Nov 21, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Johanna Schultz
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
df_force_kin_filtered.csv is the data sheet used by the DATA3 Python notebook to analyse kinematics and dynamics together. It contains the footfalls that have data for both kinematics and dynamics. To see how this file is generated, read the first half of the Jupyter notebook.
Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus...
figshare.com
txt
Updated May 30, 2023
Cite
Richard Ferrers; Speedtest Global Index (2023). Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus ALC - 2020, 2022 [Dataset]. http://doi.org/10.6084/m9.figshare.13621169.v24
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.13621169.v24
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Richard Ferrers; Speedtest Global Index
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset compares FIXED-line broadband internet speeds in four international cities plus Alice Springs:
- Melbourne, AU
- Bangkok, TH
- Shanghai, CN
- Los Angeles, US
- Alice Springs, AU

ERRATA: 1. Data is for Q3 2020, but some files are incorrectly labelled 02-20 or June 20. They should all read Sept 20 or 09-20, i.e. Q3 20 rather than Q2. Will rename and reload. Amended in v7.

LAX file named 0320 when it should be Q320. Amended in v8.

*Lines of data for each geojson file; a line equates to a 600m^2 location, including total tests, devices used, and average upload and download speed:
- MEL 16181 locations/lines => 0.85M speedtests (16.7 tests per 100 people)
- SHG 31745 lines => 0.65M speedtests (2.5/100pp)
- BKK 29296 lines => 1.5M speedtests (14.3/100pp)
- LAX 15899 lines => 1.3M speedtests (10.4/100pp)
- ALC 76 lines => 500 speedtests (2/100pp)
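The per-100-people rates above are a test count divided by a population. The description does not state the population figures it used, so the sketch below only back-calculates the population implied by each pair of quoted numbers. Both inputs are rounded in the source, so the results are rough estimates; the function name is illustrative.

```python
# Figures quoted in the description: (speedtests, tests per 100 people).
# Both are rounded in the source, so the results are rough estimates only.
quoted = {
    "MEL": (850_000, 16.7),
    "SHG": (650_000, 2.5),
    "BKK": (1_500_000, 14.3),
    "LAX": (1_300_000, 10.4),
    "ALC": (500, 2.0),
}


def implied_population(tests, tests_per_100):
    """Invert rate = tests / population * 100 to estimate the population."""
    return tests / tests_per_100 * 100


implied = {city: implied_population(t, r) for city, (t, r) in quoted.items()}

for city, people in implied.items():
    print(f"{city}: about {people:,.0f} people implied")
```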

Geojsons of these 2° by 2° extracts for MEL, BKK, SHG now added, and LAX added v6. Alice Springs added v15.

This dataset unpacks, geospatially, data summaries provided in Speedtest Global Index (linked below). See Jupyter Notebook (*.ipynb) to interrogate geo data. See link to install Jupyter.

** To Do
Will add Google Map versions so everyone can see without installing Jupyter.
- Link to Google Map (BKK) added below. Key: Green > 100Mbps (Superfast). Black > 500Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook.
- Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melb has 20% at or above 100Mbps. Suggest plot Top 20% on map for community. Google Map link - now added (and tweet).

** Python
melb = au_tiles.cx[144:146 , -39:-37] #Lat/Lon extract
shg = tiles.cx[120:122 , 30:32] #Lat/Lon extract
bkk = tiles.cx[100:102 , 13:15] #Lat/Lon extract
lax = tiles.cx[-118:-120, 33:35] #lat/Lon extract
ALC=tiles.cx[132:134, -22:-24] #Lat/Lon extract

Histograms (v9), and data visualisations (v3,5,9,11) will be provided. Data Sourced from - This is an extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).

**VERSIONS
v24. Add tweet and google map of Top 20% (over 100Mbps locations) in Mel Q322. Add v.1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below).
v23. Add graph of 2022 Broadband distribution, and compare 2020 - 2022. Updated v1.4 Jupyter notebook.
v22. Add Import ipynb; workflow-import-4cities.
v21. Add Q3 2022 data; five cities inc ALC. Geojson files. (2020: 4.3M tests; 2022: 2.9M tests)

Melb 14784 lines Avg download speed 69.4M Tests 0.39M

SHG 31207 lines Avg 233.7M Tests 0.56M

ALC 113 lines Avg 51.5M Test 1092

BKK 29684 lines Avg 215.9M Tests 1.2M

LAX 15505 lines Avg 218.5M Tests 0.74M

v20. Speedtest - Five Cities inc ALC.
v19. Add ALC2.ipynb.
v18. Add ALC line graph.
v17. Added ipynb for ALC. Added ALC to title.
v16. Load Alice Springs Data Q221 - csv. Added Google Map link of ALC.
v15. Load Melb Q1 2021 data - csv.
v14. Added Melb Q1 2021 data - geojson.
v13. Added Twitter link to pics.
v12. Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb).
v11. Add Line-Compare pic, plotting Four Cities on a graph.
v10. Add Four Histograms in one pic.
v9. Add Histogram for Four Cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook).
v8. Renamed LAX file to Q3, rather than 03.
v7. Amended file names of BKK files to correctly label as Q3, not Q2 or 06.
v6. Added LAX file.
v5. Add screenshot of BKK Google Map.
v4. Add BKK Google map (link below), and BKK csv mapping files.
v3. Replaced MEL map with big key version. Prev key was very tiny in top right corner.
v2. Uploaded MEL, SHG, BKK data and Jupyter Notebook.
v1. Metadata record.

** LICENCE AWS data licence on Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be: - non-commercial (NC) - reuse must be share-alike (SA)(add same licence). This restricts the standard CC-BY Figshare licence.

** Other uses of Speedtest Open Data; - see link at Speedtest below.
Update CSV item in ArcGIS
anrgeodata.vermont.gov
Updated Mar 18, 2022
Cite
ArcGIS Survey123 (2022). Update CSV item in ArcGIS [Dataset]. https://anrgeodata.vermont.gov/documents/dc69467c3e7243719c9125679bbcee9b
Explore at:
Dataset updated
Mar 18, 2022
Dataset authored and provided by
ArcGIS Survey123
Description
ArcGIS Survey123 utilizes CSV data in several workflows, including external choice lists, the search() appearance, and pulldata() calculations. When you need to periodically update the CSV content used in a survey, a useful method is to upload the CSV files to your ArcGIS organization and link the CSV items to your survey. Once linked, any updates to the CSV items will automatically pull through to your survey without the need to republish the survey. To learn more about linking items to a survey, see Linked content.

This notebook demonstrates how to automate updating a CSV item in your ArcGIS organization.

Note: It is recommended to run this notebook on your computer in Jupyter Notebook or ArcGIS Pro, as that will provide the best experience when reading locally stored CSV files. If you intend to schedule this notebook in ArcGIS Online or ArcGIS Notebook Server, additional configuration may be required to read CSV files from online file storage, such as Microsoft OneDrive or Google Drive.
Amazon Web Scrapping Dataset
kaggle.com
zip
Updated Jun 17, 2023
Cite
Mohammad Hurairah (2023). Amazon Web Scrapping Dataset [Dataset]. https://www.kaggle.com/datasets/mohammadhurairah/amazon-web-scrapper-dataset
Explore at:
Available download formats: zip (2220 bytes)
Dataset updated
Jun 17, 2023
Authors
Mohammad Hurairah
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Amazon Scrapping Dataset:
1. Import libraries
2. Connect to the website
3. Import CSV and datetime
4. Import pandas
5. Appending dataset to CSV
6. Automation Dataset updated
7. Timers setup
8. Email notification
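Steps 3 and 5 above (import CSV and datetime, then append to the CSV) can be sketched with the standard library alone. This is a minimal illustration and not the dataset author's code: the function name, the column names and the product values are all invented here, and the scraping step itself is left out.

```python
import csv
import datetime
import os
import tempfile


def append_price_row(csv_path, title, price):
    """Append one scraped record, writing the header only for a new file."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["title", "price", "date"])
        writer.writerow([title, price, datetime.date.today().isoformat()])


# Demo on a throwaway file; the product values are made up.
demo_path = os.path.join(tempfile.mkdtemp(), "amazon_demo.csv")
append_price_row(demo_path, "Example product", "19.99")
append_price_row(demo_path, "Example product", "18.49")

with open(demo_path, newline="", encoding="utf-8") as handle:
    rows = list(csv.reader(handle))
print(rows)
```

Opening the file in append mode is what lets a timer-driven run (step 7) add one row per check without rewriting earlier records.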
JavaScript code for retrieval of MODIS Collection 6 NDSI snow cover at...
beta.hydroshare.org
hydroshare.org
+1more
zip
Updated Feb 11, 2022
Cite
Irene Garousi-Nejad; David Tarboton (2022). JavaScript code for retrieval of MODIS Collection 6 NDSI snow cover at SNOTEL sites and a Jupyter Notebook to merge/reprocess data [Dataset]. http://doi.org/10.4211/hs.d287f010b2dd48edb0573415a56d47f8
Explore at:
Available download formats: zip (52.2 KB)
Unique identifier
https://doi.org/10.4211/hs.d287f010b2dd48edb0573415a56d47f8
Dataset updated
Feb 11, 2022
Dataset provided by
HydroShare
Authors
Irene Garousi-Nejad; David Tarboton
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This JavaScript code has been developed to retrieve NDSI_Snow_Cover from MODIS version 6 for SNOTEL sites using the Google Earth Engine platform. To successfully run the code, you should have a Google Earth Engine account. An input file, called NWM_grid_Western_US_polygons_SNOTEL_ID.zip, is required to run the code. This input file includes 1 km grid cells of the NWM containing SNOTEL sites. You need to upload this input file to the Assets tab in the Google Earth Engine code editor. You also need to import the MOD10A1.006 Terra Snow Cover Daily Global 500m collection to the Google Earth Engine code editor. You may do this by searching for the product name in the search bar of the code editor.

The JavaScript works for a specified time range. We found that the best period is a month, which is the maximum allowable time range to do the computation for all SNOTEL sites on Google Earth Engine. The script consists of two main loops. The first loop retrieves data for the first day of a month up to day 28 through five periods. The second loop retrieves data from day 28 to the beginning of the next month. The results will be shown as graphs on the right-hand side of the Google Earth Engine code editor under the Console tab. To save results as CSV files, open each time series by clicking on the button located at each graph's top right corner. From the new web page, you can click on the Download CSV button on top.

Here is the link to the script path: https://code.earthengine.google.com/?scriptPath=users%2Figarousi%2Fppr2-modis%3AMODIS-monthly

Then, run the Jupyter Notebook (merge_downloaded_csv_files.ipynb) to merge the downloaded CSV files, which are stored, for example, in a folder called output/from_GEE, into one single CSV file, merged.csv. The Jupyter Notebook then applies some preprocessing steps, and the final output is NDSI_FSCA_MODIS_C6.csv.
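The merge step described above can be sketched as follows. This is not the code from merge_downloaded_csv_files.ipynb; it is a standard-library illustration that assumes every downloaded file shares the same header row. The function name, the demo file names and the column names (`date`, `ndsi`) are hypothetical.

```python
import csv
import glob
import os
import tempfile


def merge_csv_files(input_dir, output_path):
    """Concatenate every CSV in input_dir, keeping a single header row."""
    paths = sorted(glob.glob(os.path.join(input_dir, "*.csv")))
    header_written = False
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Demo with two tiny made-up files in a temporary output/from_GEE folder.
work_dir = tempfile.mkdtemp()
gee_dir = os.path.join(work_dir, "output", "from_GEE")
os.makedirs(gee_dir)
for name, value in [("site_a.csv", "45"), ("site_b.csv", "60")]:
    with open(os.path.join(gee_dir, name), "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([["date", "ndsi"], ["2020-01-01", value]])

merged_path = os.path.join(work_dir, "merged.csv")
merged_rows = merge_csv_files(gee_dir, merged_path)
print(merged_rows)
```

Writing merged.csv outside the input folder keeps a second run from picking up its own output.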
Galaxy Training Material for the 'Use Jupyter notebooks in Galaxy' tutorial
zenodo.org
csv
Updated Apr 22, 2025
Cite
Delphine Lariviere; Teresa Müller (2025). Galaxy Training Material for the 'Use Jupyter notebooks in Galaxy' tutorial [Dataset]. http://doi.org/10.5281/zenodo.15263830
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.15263830
Dataset updated
Apr 22, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Delphine Lariviere; Teresa Müller
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was originally curated by Software Carpentry, a branch of The Carpentries non-profit organization, and is based on data from the Gapminder Foundation. It consists of six tabular CSV files containing GDP data for various countries across different years. The dataset was initially prepared for the Software Carpentry tutorial "Plotting and Programming in Python" and is also reused in the Galaxy Training Network (GTN) tutorial "Use Jupyter Notebooks in Galaxy."

This GTN tutorial provides an introduction to launching a Jupyter Notebook in Galaxy, installing dependencies, and importing and exporting data. It serves as a setup guide for a Jupyter Notebook environment that can be used to follow the Software Carpentry tutorial "Plotting and Programming in Python."
Chicago Data Portal
kaggle.com
zip
Updated Dec 8, 2020
Cite
David (2020). Chicago Data Portal [Dataset]. https://www.kaggle.com/zhaodianwen/chicago-data-portal
Explore at:
Available download formats: zip (125083 bytes)
Dataset updated
Dec 8, 2020
Authors
David
Description
Assignment Topic: In this assignment, you will download the datasets provided, load them into a database, write and execute SQL queries to answer the problems provided, and upload a screenshot showing the correct SQL query and result for review by your peers. A Jupyter notebook is provided in the preceding lesson to help you with the process.

This assignment involves 3 datasets for the city of Chicago obtained from the Chicago Data Portal:

Chicago Socioeconomic Indicators

This dataset contains a selection of six socioeconomic indicators of public health significance and a hardship index, by Chicago community area, for the years 2008 – 2012.

Chicago Public Schools

This dataset shows all school level performance data used to create CPS School Report Cards for the 2011-2012 school year.

Chicago Crime Data

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days.

Instructions:

Review the datasets

Before you begin, you will need to become familiar with the datasets. Snapshots for the three datasets in .CSV format can be downloaded from the following links:

Chicago Socioeconomic Indicators: Click here

Chicago Public Schools: Click here

Chicago Crime Data: Click here

NOTE: Ensure you have downloaded the datasets using the links above instead of directly from the Chicago Data Portal. The versions linked here are subsets of the original datasets and have some of the column names modified to be more database friendly which will make it easier to complete this assignment. The CSV file provided above for the Chicago Crime Data is a very small subset of the full dataset available from the Chicago Data Portal. The original dataset is over 1.55GB in size and contains over 6.5 million rows. For the purposes of this assignment you will use a much smaller sample with only about 500 rows.

Load the datasets into a database

Perform this step using the LOAD tool in the Db2 console. You will need to create 3 tables in the database, one for each dataset, named as follows, and then load the respective .CSV file into the table:

CENSUS_DATA

CHICAGO_PUBLIC_SCHOOLS

CHICAGO_CRIME_DATA
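The load-then-query workflow above can be illustrated end to end in Python. The assignment itself uses the LOAD tool in the Db2 console; the sketch below swaps in the standard-library `sqlite3` module purely so that it runs anywhere. The column names (`COMMUNITY_AREA_NAME`, `HARDSHIP_INDEX`) and the three rows are invented for illustration and are not taken from the real snapshot.

```python
import csv
import io
import sqlite3

# Stand-in for one downloaded snapshot; columns and values are invented.
sample_csv = io.StringIO(
    "COMMUNITY_AREA_NAME,HARDSHIP_INDEX\n"
    "Area A,39\n"
    "Area B,6\n"
    "Area C,87\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CENSUS_DATA (COMMUNITY_AREA_NAME TEXT, HARDSHIP_INDEX INTEGER)"
)

# Each CSV row is a dict, so named placeholders map columns to values.
conn.executemany(
    "INSERT INTO CENSUS_DATA VALUES (:COMMUNITY_AREA_NAME, :HARDSHIP_INDEX)",
    csv.DictReader(sample_csv),
)

# Example query: areas whose hardship index exceeds 30, highest first.
high_hardship = conn.execute(
    "SELECT COMMUNITY_AREA_NAME FROM CENSUS_DATA "
    "WHERE HARDSHIP_INDEX > 30 ORDER BY HARDSHIP_INDEX DESC"
).fetchall()
print(high_hardship)
```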
Using HydroShare Buckets to Access Resource Files
search.dataone.org
Updated Aug 9, 2025
+ more versions
Cite
Pabitra Dash (2025). Using HydroShare Buckets to Access Resource Files [Dataset]. https://search.dataone.org/view/sha256%3Ab25a0f5e5d62530d70ecd6a86f1bd3fa2ab804a8350dc7ba087327839fcb1fb1
Explore at:
Dataset updated
Aug 9, 2025
Dataset provided by
Hydroshare
Authors
Pabitra Dash
Description
This resource contains a draft Jupyter Notebook that has example code snippets showing how to access HydroShare resource files using HydroShare S3 buckets. The user_account.py is a utility to read user hydroshare cached account information in any of the JupyterHub instances that HydroShare has access to. The example notebook uses this utility so that you don't have to enter your hydroshare account information in order to access hydroshare buckets.

Here are the 3 notebooks in this resource:

hydroshare_s3_bucket_access_examples.ipynb:

The above notebook has examples showing how to upload/download resource files from the resource bucket. It also contains examples of how to list files and folders of a resource in a bucket.

python-modules-direct-read-from-bucket/hs_bucket_access_gdal_example.ipynb:

The above notebook has examples of reading raster and shapefile data from a bucket using gdal, without needing to download the file from the bucket to local disk.

python-modules-direct-read-from-bucket/hs_bucket_access_non_gdal_example.ipynb

The above notebook has examples of using h5netcdf and xarray to read a netcdf file directly from a bucket. It also contains examples of using rioxarray to read a raster file, and pandas to read a CSV file, from hydroshare buckets.
OpenOrca
kaggle.com
opendatalab.com
+1more
zip
Updated Nov 22, 2023
Cite
The Devastator (2023). OpenOrca [Dataset]. https://www.kaggle.com/datasets/thedevastator/open-orca-augmented-flan-dataset/versions/2
Explore at:
Available download formats: zip (2548102631 bytes)
Dataset updated
Nov 22, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Open-Orca Augmented FLAN Dataset

Unlocking Advanced Language Understanding and ML Model Performance

By Huggingface Hub [source]

About this dataset

The Open-Orca Augmented FLAN Collection is a revolutionary dataset that unlocks new levels of language understanding and machine learning model performance. This dataset was created to support research on natural language processing, machine learning models, and language understanding through leveraging the power of reasoning trace-enhancement techniques. By enabling models to understand complex relationships between words, phrases, and even entire sentences in a more robust way than ever before, this dataset provides researchers expanded opportunities for furthering the progress of linguistics research. With its unique combination of features including system prompts, questions from users and responses from systems, this dataset opens up exciting possibilities for deeper exploration into the cutting edge concepts underlying advanced linguistics applications. Experience a new level of accuracy and performance - explore Open-Orca Augmented FLAN Collection today!

How to use the dataset

This guide provides an introduction to the Open-Orca Augmented FLAN Collection dataset and outlines how researchers can utilize it for their language understanding and natural language processing (NLP) work. The Open-Orca dataset includes system prompts, questions posed by users, and responses from the system.

Getting Started The first step is to download the data set from Kaggle at https://www.kaggle.com/openai/open-orca-augmented-flan and save it in a project directory of your choice on your computer or cloud storage space. Once you have downloaded the data set, launch your ‘Jupyter Notebook’ or ‘Google Colab’ program with which you want to work with this data set.

Exploring & Preprocessing Data: To get a better understanding of the features in this dataset, import them into Pandas DataFrame as shown below. You can use other libraries as per your need:

import pandas as pd  # Library used for importing datasets into Python
df = pd.read_csv('train.csv')  # Imports the train CSV file into a pandas DataFrame
df[['system_prompt', 'question', 'response']].head()  # Views the top 5 rows of the columns 'system_prompt', 'question', 'response'

After importing, check each feature using basic descriptive statistics, such as pandas counts (for example a groupby or value_counts), to get greater clarity over the elements present in each feature. The command below shows the counts of each element in the system_prompt column of the train CSV file:

df['system_prompt'].value_counts().head()  # shows the count of each element in the 'system_prompt' column
Output:
User says hello guys: 587 times
System asks How are you?: 555 times
User says I am doing good: 487 times
...and so on

Data Transformation: After inspecting and exploring the features, you may want to change the dataset to suit your needs before training models on it.
Common transformation steps include: Removing punctuation marks: since punctuation marks may not add any value to the computation, we can remove them with a regex replacement such as .replace('[^A-Za-z]+', '')
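The punctuation-removal step can be sketched with Python's built-in `re` module. The guide's own snippet is cut off, so this is an illustration and not a reconstruction of it. One assumption differs from the pattern quoted in the text: the character class here also keeps spaces, so that words stay separated.

```python
import re


def strip_punctuation(text):
    """Drop every character that is not an ASCII letter or a space."""
    return re.sub(r"[^A-Za-z ]+", "", text)


cleaned = strip_punctuation("Hello, world! How are you?")
print(cleaned)  # Hello world How are you
```

This pattern also removes digits and non-ASCII letters, which may or may not be wanted. On a DataFrame column, the same pattern could be passed to pandas' `Series.str.replace` with `regex=True`.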

Research Ideas

Automated Question Answering: Leverage the dataset to train and develop question answering models that can provide tailored answers to specific user queries while retaining language understanding abilities.

Natural Language Understanding: Use the dataset as an exploratory tool for fine-tuning natural language processing applications, such as sentiment analysis, document categorization, parts-of-speech tagging and more.

Machine Learning Optimizations: The dataset can be used to build highly customized machine learning pipelines that allow users to harness the power of conditioning data with pre-existing rules or models for improved accuracy and performance in automated tasks

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. [See Other Information](ht...
Gaussian Process Model and Sensor Placement for Detroit Green...
data.niaid.nih.gov
Updated Feb 23, 2023
Cite
Mason, Brooke; Kerkez, Branko (2023). Gaussian Process Model and Sensor Placement for Detroit Green Infrastructure: Datasets and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7667043
Explore at:
Dataset updated
Feb 23, 2023
Dataset provided by
University of Michigan
Authors
Mason, Brooke; Kerkez, Branko
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Detroit
Description
code.zip: Zip folder containing a folder titled "code" which holds:

csv file titled "MonitoredRainGardens.csv" containing the 14 monitored green infrastructure (GI) sites with their design and physiographic features;

csv file titled "storm_constants.csv" which contains the computed decay constants for every storm in every GI during the measurement period;

csv file titled "newGIsites_AllData.csv" which contains the other 130 GI sites in Detroit and their design and physiographic features;

csv file titled "Detroit_Data_MeanDesignFeatures.csv" which contains the design and physiographic features for all of Detroit;

Jupyter notebook titled "GI_GP_SensorPlacement.ipynb" which provides the code for training the GP models and displaying the sensor placement results;

a folder titled "MATLAB" which contains the following:

folder titled "SFO" which contains the SFO toolbox for the sensor placement work

file titled "sensor_placement.mlx" that contains the code for the sensor placement work

several .mat files created in Python for importing into MATLAB for the sensor placement work: "constants_sigma.mat", "constants_coords.mat", "GInew_sigma.mat", "GInew_coords.mat", and "R1_sensor.mat" through "R6_sensor.mat"

several .mat files created in MATLAB for importing into Python for visualizing the results: "MI_DETselectedGI.mat" and "DETselectedGI.mat"
FiN-2: Larg-Scale Powerline Communication Dataset (Pt.1)
zenodo.org
bin, png, zip
Updated Jul 11, 2024
Cite
Christoph Balada; Max Bondorf; Sheraz Ahmed; Andreas Dengel; Markus Zdrallek (2024). FiN-2: Larg-Scale Powerline Communication Dataset (Pt.1) [Dataset]. http://doi.org/10.5281/zenodo.8328113
Explore at:
Available download formats: bin, zip, png
Unique identifier
https://doi.org/10.5281/zenodo.8328113
Dataset updated
Jul 11, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Christoph Balada; Max Bondorf; Sheraz Ahmed; Andreas Dengel; Markus Zdrallek
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
# FiN-2 Large-Scale Real-World PLC-Dataset

## About
#### FiN-2 dataset in a nutshell:
FiN-2 is the first large-scale real-world dataset on data collected in a powerline communication infrastructure. Since the electricity grid is inherently a graph, our dataset could be interpreted as a graph dataset. Therefore, we use the word node to describe points (cable distribution cabinets) of measurement within the low-voltage electricity grid and the word edge to describe connections (cables) in between them. However, since these are PLC connections, an edge does not necessarily have to correspond to a real cable; more on this in our paper.
FiN-2 shows measurements that relate to the nodes (voltage, total harmonic distortion) as well as to the edges (signal-to-noise ratio spectrum, tonemap). In total, FiN-2 is distributed across three different sites with a total of 1,930,762,116 node measurements each for the individual features and 638,394,025 edge measurements each for all 917 PLC channels. All data was collected over a 25-month period from mid-2020 to the end of 2022.
We propose this dataset to foster research in the domain of grid automation and smart grid. Therefore, we provide different example use cases in asset management, grid state visualization, forecasting, predictive maintenance, and novelty detection. For more detailed information on this dataset, please see our [paper](https://arxiv.org/abs/2209.12693).

* * *
## Content
FiN-2 dataset splits up into two compressed `csv-Files`: *nodes.csv* and *edges.csv*.

All files are provided as a compressed ZIP file and are divided into four parts. The first part can be found in this repo, while the remaining parts can be found in the following:
- https://zenodo.org/record/8328105
- https://zenodo.org/record/8328108
- https://zenodo.org/record/8328111

### Node data

| id | ts | v1 | v2 | v3 | thd1 | thd2 | thd3 | phase_angle1 | phase_angle2 | phase_angle3 | temp |
|----|----|----|----|----|----|----|----|----|----|----|----|
|112|1605530460|236.5|236.4|236.0|2.9|2.5|2.4|120.0|119.8|120.0|35.3|
|112|1605530520|236.9|236.6|236.6|3.1|2.7|2.5|120.1|119.8|120.0|35.3|
|112|1605530580|236.2|236.4|236.0|3.1|2.7|2.5|120.0|120.0|119.9|35.5|

- id / ts: Unique identifier of the node that is measured and timestamp of the measurement
- v1/v2/v3: Voltage measurements of all three phases
- thd1/thd2/thd3: Total harmonic distortion of all three phases
- phase_angle1/2/3: Phase angle of all three phases
- temp: Temperature in-circuit of the sensor inside a cable distribution unit (in °C)
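The `ts` values in the sample rows look like Unix epoch seconds, although the description does not say so explicitly. Under that assumption, the sketch below converts them to UTC datetimes and checks the spacing between consecutive readings.

```python
from datetime import datetime, timezone

# First three 'ts' values from the sample node rows above.
sample_ts = [1605530460, 1605530520, 1605530580]

# Assumption: ts counts seconds since 1970-01-01 UTC.
as_utc = [datetime.fromtimestamp(ts, tz=timezone.utc) for ts in sample_ts]
intervals = [b - a for a, b in zip(sample_ts, sample_ts[1:])]

print(as_utc[0].isoformat())  # 2020-11-16T12:41:00+00:00
print(intervals)              # [60, 60]
```

Under this reading the three sample rows are one minute apart and fall in November 2020, which is consistent with the stated collection period.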

### Edge data
| src | dst | ts | snr0 | snr1 | snr2 | ... | snr916 |
|----|----|----|----|----|----|----|----|
|62|94|1605528900|70|72|45|...|-53|
|62|32|1605529800|16|24|13|...|-51|
|17|94|1605530700|37|25|24|...|-55|

- src & dst & ts: Unique identifier of the source and target nodes where the spectrum is measured and time of measurement
- snr0/snr1/.../snr916: 917 SNR measurements in tenths of a decibel (e.g. 50 --> 5dB).
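Because the SNR columns are stored in tenths of a decibel, they need dividing by 10 before use. A minimal sketch using the first sample edge row (only the channels shown in the table; the helper name is illustrative):

```python
# Sample edge row from the table above (only the listed channels).
raw_snr_tenths = {"snr0": 70, "snr1": 72, "snr2": 45, "snr916": -53}


def tenths_to_db(value):
    """Convert an SNR stored in tenths of a decibel to decibels."""
    return value / 10


snr_db = {channel: tenths_to_db(v) for channel, v in raw_snr_tenths.items()}
print(snr_db)
```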

### Metadata
Metadata that is provided along with the data covers:

- Number of cable joints
- Cable properties (length, type, number of sections)
- Relative position of the nodes (location, zero-centered gps)
- Adjacent PV or wallbox installations
- Year of installation w.r.t. the nodes and cables

Since the electricity grid is part of the critical infrastructure, it is not possible to provide exact GPS locations.

* * *
## Usage
Simple data access using pandas:

```
import pandas as pd

nodes_file = "nodes.csv.gz" # /path/to/nodes.csv.gz
edges_file = "edges.csv.gz" # /path/to/edges.csv.gz

# read the first 10 rows
data = pd.read_csv(nodes_file, nrows=10, compression='gzip')

# skip the first 5 data rows (the header on line 0 is kept), then read the next 10
data = pd.read_csv(nodes_file, nrows=10, skiprows=range(1, 6), compression='gzip')

# ... same for the edges
```

A compressed CSV format was used to make sharing as easy as possible; however, it comes with significant drawbacks for machine learning. Due to the inherent graph structure, a single snapshot of the whole graph consists of a set of node and edge measurements. But due to timeouts, noise and other disturbances, nodes sometimes fail to collect data, so the number of measurements for a specific timestamp differs. This, plus the high sparsity of the graph, makes the CSV format highly inefficient for ML training.
To utilize the data in an ML pipeline, we recommend other data formats like [datadings](https://datadings.readthedocs.io/en/latest/) or specialized database solutions like [VictoriaMetrics](https://victoriametrics.com/).
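The point about uneven snapshots can be made concrete by counting how many node readings exist per timestamp. The rows below are invented, in the `nodes.csv` layout and reduced to three columns; node 113 is a hypothetical node that misses one reading.

```python
import csv
import io
from collections import Counter

# Invented rows in the nodes.csv layout (id, ts, v1); node 113 misses a reading.
sample = io.StringIO(
    "id,ts,v1\n"
    "112,1605530460,236.5\n"
    "113,1605530460,235.9\n"
    "112,1605530520,236.9\n"
)

# Count readings per timestamp; uneven counts mean incomplete graph snapshots.
readings_per_ts = Counter(row["ts"] for row in csv.DictReader(sample))
print(readings_per_ts)
```

A training pipeline would have to pad, drop or impute the incomplete snapshots, which is part of why a row-oriented CSV is awkward here.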

### Example use case (voltage forecasting)

Forecasting of the voltage is one potential use case. The Jupyter notebook provided in the repository gives an overview of how the dataset can be loaded, preprocessed and used for ML training. Thereby, a MinMax scaling was used as simple preprocessing and a PyTorch dataset class was created to handle the data. Furthermore, a vanilla autoencoder is utilized to process and forecast the voltage into the future.
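MinMax scaling maps each value to (x - min) / (max - min), so the series lands in [0, 1]. The sketch below applies it to the three `v1` values from the sample node rows. It is a plain-Python illustration of the formula, not the notebook's implementation, and the constant-series fallback is an assumption made here.

```python
def min_max_scale(values):
    """Linearly rescale values to [0, 1]; a constant series maps to 0.0."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]


# v1 voltages from the three sample node rows.
v1 = [236.5, 236.9, 236.2]
scaled_v1 = min_max_scale(v1)
print(scaled_v1)
```

In practice the minimum and maximum would be computed on the training split only and reused for validation and test data.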
Uniquely Popular Businesses
kaggle.com
zip
Updated Jan 22, 2023
Cite
The Devastator (2023). Uniquely Popular Businesses [Dataset]. https://www.kaggle.com/datasets/thedevastator/uniquely-popular-businesses
Explore at:
Available download formats: zip (48480 bytes)
Dataset updated
Jan 22, 2023
Authors
The Devastator
Description
Uniquely Popular Businesses

Rankings of Business Categories in Seattle & NYC Neighborhoods

By data.world's Admin [source]

About this dataset

This dataset contains data used to analyze the uniquely popular business types in the neighborhoods of Seattle and New York City. We used publicly available neighborhood-level shapefiles to identify neighborhoods, and then crossed that information against Yelp's Business Category API to find businesses operating within each neighborhood. The ratio of businesses from each category was studied in comparison to their ratios in the entire city to determine any significant differences between each borough.

Any single business with more than one category was repeated for each one, however none of them were ever recorded twice for any single category. Moreover, if a certain business type didn't make up at least 1% of a particular neighborhood's businesses overall it was removed from the analysis altogether.
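The comparison and the 1% cut-off described above can be sketched with the column names from the dataset (`nCount`, `neighborhoodTotal`, `cCount`, `cityTotal`). Dividing the local share by the city-wide share is an assumption about how the two ratios were compared; the description does not give the exact formula, and the counts below are invented.

```python
def neighborhood_vs_city(n_count, neighborhood_total, c_count, city_total,
                         min_share=0.01):
    """Return how over-represented a category is locally, or None if the
    category makes up less than min_share of the neighborhood."""
    local_share = n_count / neighborhood_total
    if local_share < min_share:
        return None  # dropped from the analysis, per the 1% rule
    city_share = c_count / city_total
    return local_share / city_share


# Invented counts for illustration.
coffee = neighborhood_vs_city(n_count=30, neighborhood_total=500,
                              c_count=900, city_total=30_000)
rare = neighborhood_vs_city(n_count=2, neighborhood_total=500,
                            c_count=900, city_total=30_000)
print(coffee, rare)
```

A value above 1 means the category is more common in the neighborhood than in the city as a whole.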

The data available here is free to use under MIT license, with appropriate attribution given back to Yelp for providing this information. It is an invaluable resource for researchers across different disciplines looking into consumer behavior or clustering within urban areas!

How to use the dataset

To get started using this dataset:
- Download the appropriate file for the area you’re researching, either top5_Seattle.csv or top5_NewYorkCity.csv, from the Kaggle site which hosts this dataset (https://www.kaggle.com/puddingmagazine/uniquely-popular-businesses).
- Read through each column's information, available under the Columns section of this Kaggle description.
- Take note of columns that are relevant to your analysis, such as nCount, which indicates the number of businesses of a type in a neighborhood; rank, which shows how popular that business type is overall; and neighborhoodTotal, which specifies the total number of businesses in a particular neighborhood.
- Load your selected file into an application designed for data analysis, such as Jupyter Notebook, Microsoft Excel or Power BI.
- Begin analysing where certain types of unique business are most common by subsetting rows for specific neighborhoods; alternatively, run regression-based analyses of how a business type's rank varies across neighborhoods.

If you have any questions about interpreting data from this source, please reach out.

Research Ideas

Analyzing the unique business trends in Seattle and New York City to identify potential investment opportunities.

Creating a tool that helps businesses understand what local competitions they face by neighborhood.

Exploring the distinctions between neighborhoods by plotting out the different businesses they have in comparison with each other and other cities

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

See the dataset description for more information.

Columns

File: top5_Seattle.csv

| Column name | Description |
|:--|:--|
| neighborhood | Name of the neighborhood. (String) |
| yelpAlias | The Yelp-specified Alias for the business type. (String) |
| yelpTitle | The Title given to this business type by Yelp. (String) |
| nCount | Number of businesses with this type within a particular neighborhood. (Integer) |
| neighborhoodTotal | Total number of businesses located within that particular region. (Integer) |
| cCount | Number of businesses with this storefront within an entire city. (Integer) |
| cityTotal | Total number of all types of storefronts within an entire city. (Integer) |

...
Linking soil structure, SHPs, and soc dynamics to study the impact of...
beta.hydroshare.org
hydroshare.org
+1more
zip
Updated Jan 12, 2023
Cite
Achla Jha (2023). Linking soil structure, SHPs, and soc dynamics to study the impact of climate change and land management [Dataset]. http://doi.org/10.4211/hs.6e4f08d8380a49f99314bae8a7ac41e2
Explore at:
zip(1.1 MB)Available download formats
Unique identifier
https://doi.org/10.4211/hs.6e4f08d8380a49f99314bae8a7ac41e2
Dataset updated
Jan 12, 2023
Dataset provided by
HydroShare
Authors
Achla Jha
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This resource includes the Python scripts of the modeling framework, simulation result CSV files, and the figures for the manuscript entitled 'Linking Soil Structure, Hydraulic Properties, and Organic Carbon Dynamics: A Holistic Framework to Study the Impact of Climate Change and Land Management.'

results_simulations_constant_porosity.csv: Simulation results of the modeling framework at constant porosity.

results_simulations_dynamic_porosity.csv: Simulation results of the modeling framework at dynamic porosity.

model_constant_phi: Python script describing the modeling framework at constant porosity. It needs to be imported and run along with the simulation constant porosity Jupyter notebook to obtain the simulation results at constant porosity.

model_dynamic_phi: Python script describing the modeling framework at dynamic porosity. It needs to be imported and run along with the simulation dynamic porosity notebook to obtain the simulation results at dynamic porosity.

The Figure 3 and Figure 4 Jupyter notebooks contain the code for the corresponding plots in the manuscript.

For the latest version of the code, visit the GitHub repository: https://github.com/Achla-Jha/Soil-Structure.git.
Cognitive Fatigue
figshare.com
csv
Updated Nov 5, 2025
Cite
Rui Varandas; Inês Silveira; Hugo Gamboa (2025). Cognitive Fatigue [Dataset]. http://doi.org/10.6084/m9.figshare.28188143.v3
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.28188143.v3
Dataset updated
Nov 5, 2025
Dataset provided by
Figsharehttp://figshare.com/
Authors
Rui Varandas; Inês Silveira; Hugo Gamboa
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cognitive Fatigue

While executing the proposed tasks, the participants’ physiological signals were monitored using two biosignalsplux devices from PLUX Wireless Biosignals, Lisbon, Portugal, with a sampling frequency of 100 Hz and a resolution of 16 bits (24 bits in the case of fNIRS). Six different sensors were used: EEG and fNIRS positioned around the F7 and F8 of the 10–20 system (dorsolateral prefrontal cortex is often used to assess CW and fatigue as well as cognitive states); ECG monitored an approximation of Lead I of the Einthoven system; EDA was placed on the palm of the non-dominant hand; ACC was positioned on the right side of the head to measure head movement and overall posture changes; and the RIP sensor was attached to the upper-abdominal area to measure the respiration cycles. The combination of the three allows inferences about the response of the Autonomic Nervous System (ANS) of the human body, namely, the response of the sympathetic and parasympathetic nervous system.

2.1. Experimental design

Cognitive fatigue (CF) is a phenomenon that arises following the prolonged engagement in mentally demanding cognitive tasks. Thus, we developed an experimental procedure that involved three demanding tasks: a digital lesson in Jupyter Notebook format, three repetitions of Corsi-Block task, and two repetitions of a concentration test.

Before the Corsi-Block task and after the concentration task there were periods of baseline of two min. In our analysis, the first baseline period, although not explicitly present in the dataset, was designated as representing no CF, whereas the final baseline period was designated as representing the presence of CF. Between repetitions of the Corsi-Block task, there were periods of baseline of 15 s after the task and of 30 s before the beginning of each repetition of the task.

2.2. Data recording

A data sample of 10 volunteer participants (4 females) aged between 22 and 48 years old (M = 28.2, SD = 7.6) took part in this study. All volunteers were recruited at NOVA School of Science and Technology, fluent in English, right-handed, none reported suffering from psychological disorders, and none reported taking regular medication. Written informed consent was obtained before participating and all Ethical Procedures approved by the Ethics Committee of NOVA University of Lisbon were thoroughly followed.

In this study, we omitted the data from one participant due to the insufficient duration of data acquisition.

2.3. Data labelling

The labels easy, difficult, very difficult and repeat found in the ECG_lesson_answers.txt files represent the subjects' opinion of the content read in the ECG lesson. The repeat label represents the most difficult level. It's called repeat because when you press it, the answer to the question is shown again. This system is based on the Anki system, which has been proposed and used to memorise information effectively. In addition, the PB description JSON files include timestamps indicating the start and end of cognitive tasks, baseline periods, and other events, which are useful for defining CF states as we defined in 2.1.

2.4. Data description

Biosignals include EEG, fNIRS (not converted to oxi and deoxiHb), ECG, EDA, respiration (RIP), accelerometer (ACC), and push-button data (PB). All signals have already been converted to physical units. In each biosignal file, the first column corresponds to the timestamps.

HCI features encompass keyboard, mouse, and screenshot data. Below is a Python code snippet for extracting screenshot files from the screenshots CSV file.

    import base64
    from os import makedirs
    from os.path import join

    file = '...'  # path to the screenshots CSV file

    with open(file, 'r') as f:
        lines = f.readlines()

    # Create the output folder once; exist_ok avoids an error if it already exists.
    makedirs('screenshot', exist_ok=True)

    # Skip the header row; each remaining row holds a timestamp and a base64 image.
    for line in lines[1:]:
        timestamp = line.split(',')[0]
        # Drop the last two characters of the final field, as in the original snippet.
        code = line.split(',')[-1][:-2]
        imgdata = base64.b64decode(code)
        filename = str(timestamp) + '.jpeg'
        with open(join('screenshot', filename), 'wb') as f:
            f.write(imgdata)

A characterization file containing age and gender information for all subjects in each dataset is provided within the respective dataset folder (e.g., D2_subject-info.csv). Other complementary files include (i) description of the pushbuttons to help segment the signals (e.g., D2_S2_PB_description.json) and (ii) labelling (e.g., D2_S2_ECG_lesson_results.txt). The files D2_Sx_results_corsi-block_board_1.json and D2_Sx_results_corsi-block_board_2.json show the results for the first and second iterations of the corsi-block task, where, for example, row_0_1 = 12 means that the subject got 12 pairs right in the first row of the first board, and row_0_2 = 12 means that the subject got 12 pairs right in the first row of the second board.
Data from: Multi-task Deep Learning for Water Temperature and Streamflow...
catalog.data.gov
Updated Nov 11, 2025
Cite
U.S. Geological Survey (2025). Multi-task Deep Learning for Water Temperature and Streamflow Prediction (ver. 1.1, June 2022) [Dataset]. https://catalog.data.gov/dataset/multi-task-deep-learning-for-water-temperature-and-streamflow-prediction-ver-1-1-june-2022
Explore at:
Dataset updated
Nov 11, 2025
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Description
This item contains data and code used in experiments that produced the results for Sadler et al. (2022) (see below for full reference). We ran five experiments for the analysis, Experiment A, Experiment B, Experiment C, Experiment D, and Experiment AuxIn. Experiment A tested multi-task learning for predicting streamflow with 25 years of training data and using a different model for each of 101 sites. Experiment B tested multi-task learning for predicting streamflow with 25 years of training data and using a single model for all 101 sites. Experiment C tested multi-task learning for predicting streamflow with just 2 years of training data. Experiment D tested multi-task learning for predicting water temperature with over 25 years of training data. Experiment AuxIn used water temperature as an input variable for predicting streamflow. These experiments and their results are described in detail in the WRR paper. Data from a total of 101 sites across the US was used for the experiments. The model input data and streamflow data were from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset (Newman et al. 2014, Addor et al. 2017). The water temperature data were gathered from the National Water Information System (NWIS) (U.S. Geological Survey, 2016). The contents of this item are broken into 13 files or groups of files aggregated into zip files:

input_data_processing.zip: A zip file containing the scripts used to collate the observations, input weather drivers, and catchment attributes for the multi-task modeling experiments

flow_observations.zip: A zip file containing collated daily streamflow data for the sites used in multi-task modeling experiments. The streamflow data were originally accessed from the CAMELS dataset. The data are stored in csv and Zarr formats.

temperature_observations.zip: A zip file containing collated daily water temperature data for the sites used in multi-task modeling experiments. The data were originally accessed via NWIS. The data are stored in csv and Zarr formats.

temperature_sites.geojson: Geojson file of the locations of the water temperature and streamflow sites used in the analysis.

model_drivers.zip: A zip file containing the daily input weather driver data for the multi-task deep learning models. These data are from the Daymet drivers and were collated from the CAMELS dataset. The data are stored in csv and Zarr formats.

catchment_attrs.csv: Catchment attributes collated from the CAMELS dataset. These data are used for the Random Forest modeling. For full metadata regarding these data see CAMELS dataset.

experiment_workflow_files.zip: A zip file containing workflow definitions used to run multi-task deep learning experiments. These are Snakemake workflows. To run a given experiment, one would run (for experiment A) 'snakemake -s expA_Snakefile --configfile expA_config.yml'

river-dl-paper_v0.zip: A zip file containing python code used to run multi-task deep learning experiments. This code was called by the Snakemake workflows contained in 'experiment_workflow_files.zip'.

random_forest_scripts.zip: A zip file containing Python code and a Python Jupyter Notebook used to prepare data for, train, and visualize feature importance of a Random Forest model.

plotting_code.zip: A zip file containing python code and Snakemake workflow used to produce figures showing the results of multi-task deep learning experiments.

results.zip: A zip file containing results of multi-task deep learning experiments. The results are stored in csv and netcdf formats. The netcdf files were used by the plotting libraries in 'plotting_code.zip'. These files are for five experiments, 'A', 'B', 'C', 'D', and 'AuxIn'. These experiment names are shown in the file name.

sample_scripts.zip: A zip file containing scripts for creating sample output to demonstrate how the modeling workflow was executed.

sample_output.zip: A zip file containing sample output data. Similar files are created by running the sample scripts provided.

A. Newman; K. Sampson; M. P. Clark; A. Bock; R. J. Viger; D. Blodgett, 2014. A large-sample watershed-scale hydrometeorological dataset for the contiguous USA. Boulder, CO: UCAR/NCAR. https://dx.doi.org/10.5065/D6MW2F4D

N. Addor, A. Newman, M. Mizukami, and M. P. Clark, 2017. Catchment attributes for large-sample studies. Boulder, CO: UCAR/NCAR. https://doi.org/10.5065/D6G73C3Q

Sadler, J. M., Appling, A. P., Read, J. S., Oliver, S. K., Jia, X., Zwart, J. A., & Kumar, V. (2022). Multi-Task Deep Learning of Daily Streamflow and Water Temperature. Water Resources Research, 58(4), e2021WR030138. https://doi.org/10.1029/2021WR030138

U.S. Geological Survey, 2016, National Water Information System data available on the World Wide Web (USGS Water Data for the Nation), accessed Dec. 2020.
Speedtest Open Data - Australia(NZ) 2020-2025; Q220 - Q325 extract by Qtr
figshare.com
txt
Updated Oct 24, 2025
Cite
Richard Ferrers; Speedtest Global Index (2025). Speedtest Open Data - Australia(NZ) 2020-2025; Q220 - Q325 extract by Qtr [Dataset]. http://doi.org/10.6084/m9.figshare.13370504.v43
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.13370504.v43
Dataset updated
Oct 24, 2025
Dataset provided by
Figsharehttp://figshare.com/
Authors
Richard Ferrers; Speedtest Global Index
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
New Zealand, Australia
Description
This is an Australian extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).

AWS data licence is "CC BY-NC-SA 4.0", so use of this data must be:
- non-commercial (NC)
- reuse must be share-alike (SA) (add same licence).
This restricts the standard CC-BY Figshare licence.

A world speedtest open data was downloaded (>400Mb, 7M lines of data). An extract of Australia's location (lat, long) revealed 88,000 lines of data (attached as csv).

A Jupyter notebook of extract process is attached.

See Binder version at Github - https://github.com/areff2000/speedtestAU.
+> Install: 173 packages | Downgrade: 1 packages | Total download: 432MB
Build container time: approx - load time 25secs.
=> Error: Timesout - BUT UNABLE TO LOAD GLOBAL DATA FILE (6.6M lines).
=> Error: Overflows 8GB RAM container provided with global data file (3GB)
=> On local JupyterLab M2 MBP; loads in 6 mins.

Added Binder from ARDC service: https://binderhub.rc.nectar.org.au
Docs: https://ardc.edu.au/resource/fair-for-jupyter-notebooks-a-practical-guide/

A link to Twitter thread of outputs provided.

A link to Data tutorial provided (GitHub), including Jupyter Notebook to analyse World Speedtest data, selecting one US State.

Data Shows: (Q220)
- 3.1M speedtests | 762,000 devices |
- 88,000 grid locations (600m * 600m), summarised as a point
- average speed 33.7Mbps (down), 12.4M (up) | Max speed 724Mbps
- data is for 600m * 600m grids, showing average speed up/down, number of tests, and number of users (IP). Added centroid, and now lat/long.
See tweet of image of centroids also attached.

NB: Discrepancy Q2-21, Speedtest Global shows June AU average speedtest at 80Mbps, whereas Q2 mean is 52Mbps (v17; Q1 45Mbps; v14). Dec 20 Speedtest Global has AU at 59Mbps. Could be possible timing difference. Or spatial anonymising masking shaping highest speeds. Else potentially data inconsistent between national average and geospatial detail. Check in upcoming quarters.

NextSteps:
Histogram - compare Q220, Q121, Q122. per v1.4.ipynb.

Versions:
v43. Added revised NZ vs AUS graph for Q325 (NZ; Q2 25) since had NZ available from Github (link below). Calc using PlayNZ.ipynb notebook. See images in Twitter - https://x.com/ValueMgmt/status/1981607615496122814
v42: Added AUS Q325 (97.6k lines avg d/l 165.5 Mbps (median d/l 150.8 Mbps) u/l 28.08 Mbps). Imported using v2 Jupyter notebook (MBP 16Gb). Mean tests: 24.5. Mean devices: 6.02. Download, extract and publish: UNK - not measured mins. Download avg is double Q423. Noting, NBN increased D/L speeds from Sept '25; 100 -> 500, 250 -> 750. For 1Gbps, upload speed only increased from 50Mbps to 100Mbps. New 2Gbps services introduced on FTTP and HFC networks.
v41: Added AUS Q225 (96k lines avg d/l 130.5 Mbps (median d/l 108.4 Mbps) u/l 22.45 Mbps). Imported using v2 Jupyter notebook (MBP 16Gb). Mean tests: 17.2. Mean devices: 5.11. Download, extract and publish: 20 mins. Download avg is double Q422.
v40: Added AUS Q125 (93k lines avg d/l 116.6 Mbps u/l 21.35 Mbps). Imported using v2 Jupyter notebook (MBP 16Gb). Mean tests: 16.9. Mean devices: 5.13. Download, extract and publish: 14 mins.
v39: Added AUS Q424 (95k lines avg d/l 110.9 Mbps u/l 21.02 Mbps). Imported using v2 Jupyter notebook (MBP 16Gb). Mean tests: 17.2. Mean devices: 5.24. Download, extract and publish: 14 mins.
v38: Added AUS Q324 (92k lines avg d/l 107.0 Mbps u/l 20.79 Mbps). Imported using v2 Jupyter notebook (iMac 32Gb). Mean tests: 17.7. Mean devices: 5.33. Added github speedtest-workflow-importv2vis.ipynb Jupyter added datavis code to colour code national map. (per Binder on Github; link below).
v37: Added AUS Q224 (91k lines avg d/l 97.40 Mbps u/l 19.88 Mbps). Imported using speedtest-workflow-importv2 jupyter notebook. Mean tests:18.1. Mean devices: 5.4.
v36 Load UK data, Q1-23 and compare to AUS and NZ Q123 data. Add compare image (au-nz-ukQ123.png), calc PlayNZUK.ipynb, data load import-UK.ipynb. UK data bit rough and ready as uses rectangle to mark out UK, but includes some EIRE and FR. Indicative only and to be definitive needs geo-clean to exclude neighbouring countries.
v35 Load Melb geo-maps of speed quartiles (0-25, 25-50, 50-75, 75-100, 100-). Avg in 2020; 41Mbps. Avg in 2023; 86Mbps. MelbQ323.png, MelbQ320.png. Calc with Speedtest-incHist.ipynb code. Needed to install conda mapclassify. ax=melb.plot(column=...dict(bins[25,50,75,100]))
v34 Added AUS Q124 (93k lines avg d/l 87.00 Mbps u/l 18.86 Mbps). Imported using speedtest-workflow-importv2 jupyter notebook. Mean tests:18.3. Mean devices: 5.5.
v33 Added AUS Q423 (92k lines avg d/l 82.62 Mbps). Imported using speedtest-workflow-importv2 jupyter notebook. Mean tests:18.0. Mean devices: 5.6. Added link to Github.
v32 Recalc Au vs NZ for upload performance; added image. using PlayNZ Jupyter. NZ approx 40% locations at or above 100Mbps. Aus
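The description above mentions extracting a country's rows by location (lat, long) and notes that a plain rectangle can pull in neighbouring countries. A minimal sketch of such a bounding-box filter follows. The column names, the sample rows, and the rectangle bounds are all assumptions made for illustration; they are not taken from the attached notebook.

```python
import pandas as pd

# Hypothetical tile rows; column names and values are assumptions for illustration.
tiles = pd.DataFrame(
    {
        "lat": [-37.81, -33.87, 13.75, 51.51],
        "lon": [144.96, 151.21, 100.50, -0.13],
        "avg_d_mbps": [69.4, 80.0, 215.9, 90.0],
        "tests": [120, 200, 300, 150],
    }
)

# Rough rectangle around Australia (approximate, assumed bounds).
LAT_MIN, LAT_MAX = -44.0, -10.0
LON_MIN, LON_MAX = 112.0, 154.0

# Keep only the tiles whose centroid falls inside the rectangle.
in_box = tiles["lat"].between(LAT_MIN, LAT_MAX) & tiles["lon"].between(LON_MIN, LON_MAX)
au = tiles[in_box]

print(len(au), "tiles kept")
print("mean download (Mbps):", au["avg_d_mbps"].mean())
```

A rectangle is only a coarse selector: any neighbouring territory inside the box is kept too, so a polygon-based clean-up would be needed for a definitive extract.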
Cleaned Contoso Dataset
kaggle.com
zip
Updated Aug 27, 2023
Cite
Bhanu (2023). Cleaned Contoso Dataset [Dataset]. https://www.kaggle.com/datasets/bhanuthakurr/cleaned-contoso-dataset
Explore at:
zip(487695063 bytes)Available download formats
Dataset updated
Aug 27, 2023
Authors
Bhanu
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Data was imported from the BAK file found here into SQL Server, and then individual tables were exported as CSV. Jupyter Notebook containing the code used to clean the data can be found here

Version 6 includes some additional cleaning and structuring, prompted by issues noticed after importing the data into Power BI. The changes were made by adding code to the Python notebook that exports the newly cleaned dataset, such as adding MonthNumber for sorting by month number and, similarly, WeekDayNumber.

Cleaning was done in Python, with SQL Server used to quickly find things. Headers were added separately, ensuring no data loss. Data was cleaned for NaN, garbage values and other columns.
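The sort-helper columns mentioned above could be derived along the following lines. This is a sketch only: the "Date" column name and the sample dates are assumptions, and the Monday=1 numbering is one possible convention, not necessarily the one used in the author's notebook.

```python
import pandas as pd

# Hypothetical date table; the "Date" column name is an assumption.
dates = pd.DataFrame(
    {"Date": pd.to_datetime(["2023-01-02", "2023-08-27", "2023-03-15"])}
)

# Text labels sort alphabetically, so numeric companions are added for ordering.
dates["Month"] = dates["Date"].dt.month_name()
dates["MonthNumber"] = dates["Date"].dt.month
dates["WeekDay"] = dates["Date"].dt.day_name()
# Monday=1 ... Sunday=7 (pandas dayofweek is 0-based, so add 1).
dates["WeekDayNumber"] = dates["Date"].dt.dayofweek + 1

# Sorting by the numeric column gives calendar order rather than alphabetical order.
ordered = dates.sort_values("MonthNumber")
print(ordered[["Month", "MonthNumber", "WeekDay", "WeekDayNumber"]])
```

In Power BI, a text column such as Month can then be sorted by its numeric companion.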
NeoModeling Framework: Leveraging Graph-Based Persistence for Large-Scale...
zenodo.org
zip
Updated Sep 30, 2025
Cite
Luciano Marchezan; Luciano Marchezan; Nikitchyn Vitalii; Eugene Syriani; Eugene Syriani; Nikitchyn Vitalii (2025). NeoModeling Framework: Leveraging Graph-Based Persistence for Large-Scale Model-Driven Engineering (replication package) [Dataset]. http://doi.org/10.5281/zenodo.17238878
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.17238878
Dataset updated
Sep 30, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Luciano Marchezan; Luciano Marchezan; Nikitchyn Vitalii; Eugene Syriani; Eugene Syriani; Nikitchyn Vitalii
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the replication package for the paper "NeoModeling Framework: Leveraging Graph-Based Persistence for Large-Scale Model-Driven Engineering" where we present Neo Modeling Framework (NMF), an open-source set of tools primarily designed to manipulate ultra-large datasets in the Neo4j database.

Repository structure

NeoModelingFramework.zip - contains the replication package, including the source code for NMF, test files to run the evaluation, used artifacts, and instructions to run the framework. The most important folders are listed below:

codeGenerator - NMF generator module

modelLoader - NMF loader module

modelEditor - NMF editor module

Evaluation - contains the evaluation artifacts and results (a copy

metamodels - Ecore files used for RQ1 and RQ2

results - CSV files with the results from RQ1, RQ2 and RQ3

analysis - Jupyter notebooks used to analyze and plot the results

Running NMF

The best way to run NMF is following the instructions at our GitHub repository. A copy of the Readme file is also present inside the zip file available here.

Empirical Evaluation

Make sure that you follow the instructions to run NMF.

The quantitative evaluation can be re-run by running RQ1Eval.kt, RQ2Eval.kt inside modelLoader/src/test/kotlin/evaluation and RQ2Eval.kt inside modelEditor/src/test/kotlin/evaluation.

Make sure that you have an empty instance of Neo4j running.

Results will be generated as CSV files, under Evaluation/results and the results can be plotted by running the Jupyter Notebooks at Evaluation/analysis.

Please note that due to differences in hardware, re-running the experiments will probably generate slightly different results than those reported in the paper.
Performance results of different scheduling algorithms used in the...
data.niaid.nih.gov
Updated May 9, 2022
Cite
Mustapha Regragui; Baptiste Coye; Laércio Lima Pilla; Raymond Namyst; Denis Barthou (2022). Performance results of different scheduling algorithms used in the simulation of a modern game engine [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6532251
Explore at:
Dataset updated
May 9, 2022
Dataset provided by
CNRS
Bordeaux INP
Ubisoft
University of Bordeaux
Authors
Mustapha Regragui; Baptiste Coye; Laércio Lima Pilla; Raymond Namyst; Denis Barthou
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance results of different scheduling algorithms used in the simulation of a modern game engine

These results are a companion to the paper entitled "Exploring scheduling algorithms for parallel task graphs: a modern game engine case study" by M. Regragui et al.

General information

This dataset contains raw outputs and scripts to visualize and analyze the scheduling results from our game engine simulator. The result analysis can be directly reproduced using the script run_analysis.sh. A series of Jupyter Notebook files are also available to help visualize the results.

File information

All Scenario*.ipynb files contain python scripts to visualize and analyze the simulation results.

The Scenario*.py files contain python scripts that can be run directly with Jupyter Notebook.

The requirements.txt file contains the names and versions of python packages necessary to reproduce the analysis.

The run_analysis.sh file contains a bash script to install the required python packages and run the Scenario*.py scripts.

The results are organized in five folders:

Result_1 contains the results for Scenario 1 generated using file input_scenario_1.txt.

Result_2 contains the results for Scenario 2 generated using file input_scenario_2.txt.

Result_3 contains the results for Scenario 3 generated using file input_scenario_3.txt.

Result_CP_1 contains the results for the critical path of Scenarios 1 and 2 generated using file input_CP_scenario_1.txt.

Result_CP_3 contains the results for the critical path of Scenario 3 generated using file input_CP_scenario_3.txt.

Each result file (e.g., HLF_NonSorted_Random_1_200_10.txt) contains 200 lines representing information of the 200 frames that were simulated. Each line contains four values: the frame number, the duration of the frame (in microseconds), a critical path estimation for the previous frame (in microseconds), and the load parameter (value between 0 and 1).

The outputs of this analysis include some PDF files representing the figures in the paper (in order) and some CSV files representing the values shown in tables. The standard output shows the p-values computed in parts of the statistical analysis.

Software and hardware information

The simulation results were generated on an Intel Core i7-1185G7 processor, with 32 GB of LPDDR4 RAM (3200 MHz). The machine ran on Ubuntu 20.04.3 LTS (5.14.0-1034-oem), and g++ 9.4.0 was used for the simulator's compilation (-O3 flag).

The results were analyzed using Python 3.8.10, pip 20.0.2 and jupyter-notebook 6.0.3. The following packages and their respective versions were used:

pandas 1.3.2

numpy 1.21.2

matplotlib 3.4.3

seaborn 0.11.2

scipy 1.7.1

pytz 2019.3

python-dateutil 2.7.3

kiwisolver 1.3.2

pyparsing 2.4.7

cycler 0.10.0

Pillow 7.0.0

six 1.14.0

Simulation information

Simulation results were generated from 4 to 20 resources. Each configuration was run with 50 different RNG seeds (1 up to 50).

Each simulation is composed of 200 frames. The load parameter (lag) starts at zero and increases by 0.01 with each frame up to a value equal to 100% in frame 101. After that, the load parameter starts to decrease in the same rhythm down to 0.01 in frame 200.
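The load schedule described above can be written as a small function. This is a sketch derived only from the sentence above (0 at frame 1, 1.00 at frame 101, 0.01 at frame 200); the function name is hypothetical and does not come from the simulator.

```python
def load_parameter(frame: int) -> float:
    """Return the load for a 1-based frame number in a 200-frame simulation."""
    if not 1 <= frame <= 200:
        raise ValueError("frame must be between 1 and 200")
    if frame <= 101:
        steps = frame - 1    # rising phase: 0.00 at frame 1, 1.00 at frame 101
    else:
        steps = 201 - frame  # falling phase: 0.99 at frame 102, 0.01 at frame 200
    return steps / 100


print(load_parameter(1), load_parameter(101), load_parameter(200))
```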

Algorithms abbreviation in presentation order

FIFO serves as the baseline for comparisons.

FIFO: First In First Out.

LPT: Longest Processing Time First.

SPT: Shortest Processing Time First.

SLPT: LPT at a subtask level.

SSPT: SPT at a subtask level.

HRRN: Highest Response Ratio Next.

WT: Longest Waiting Time First.

HLF: Hu's Level First with unitary processing time of each task.

HLFET: HLF with estimated times.

CG: Coffman-Graham's Algorithm.

DCP: Dynamic Critical Path Priority.

Metrics

SF: slowest frame (maximum frame execution time)

DF: number of delayed frames (with 16.667 ms as the due date)

CS: cumulative slowdown (with 16.667 ms as the due date)
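As an illustration of how the SF and DF metrics relate to the result files, the sketch below parses lines with the four values described under File information and computes both. The whitespace-separated layout, the sample values, and the function names are assumptions; the dataset's own analysis scripts may differ. CS is left out because its exact formula is not given here.

```python
DUE_DATE_US = 16_667  # 16.667 ms expressed in microseconds


def parse_result_lines(lines):
    """Parse 'frame duration critical_path load' lines (whitespace-separated, assumed)."""
    frames = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        frames.append(
            {
                "frame": int(parts[0]),
                "duration_us": float(parts[1]),
                "critical_path_us": float(parts[2]),
                "load": float(parts[3]),
            }
        )
    return frames


def slowest_frame(frames):
    """SF: maximum frame execution time, in microseconds."""
    return max(f["duration_us"] for f in frames)


def delayed_frames(frames, due_us=DUE_DATE_US):
    """DF: number of frames whose duration exceeds the due date."""
    return sum(1 for f in frames if f["duration_us"] > due_us)


# Invented sample lines standing in for a real result file.
sample = ["1 12000 9000 0.00", "2 17000 9500 0.01", "3 21000 11000 0.02"]
frames = parse_result_lines(sample)
print("SF (us):", slowest_frame(frames))
print("DF:", delayed_frames(frames))
```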
Household Data
data.open-power-system-data.org
csv, sqlite, xlsx
Updated Nov 10, 2017
Cite
Adrian Minde (2017). Household Data [Dataset]. http://doi.org/10.25832/household_data/2017-11-10
Explore at:
csv, sqlite, xlsxAvailable download formats
Unique identifier
https://doi.org/10.25832/household_data/2017-11-10
Dataset updated
Nov 10, 2017
Dataset provided by
Open Power System Data
Authors
Adrian Minde
Time period covered
Oct 26, 2015 - Feb 9, 2017
Variables measured
utc_timestamp, cet_cest_timestamp, interpolated_values, DE_KN_industrial3_ev, DE_KN_industrial3_pv, DE_KN_residential1_pv, DE_KN_residential3_pv, DE_KN_residential4_ev, DE_KN_residential4_pv, DE_KN_residential6_pv, and 63 more
Description
Detailed household load and solar generation in minutely to hourly resolution. This data package contains measured time series data for several small businesses and private households relevant for household- or low-voltage-level power system modeling. The data includes solar power generation as well as electricity consumption (load) in a resolution up to single device consumption. The starting point for the time series, as well as data quality, varies between households, with gaps spanning from a few minutes to entire hours. In general, data is adjusted to fit uniform, regular time intervals without changing its validity, except for small gaps, which are filled using linear interpolation. The numbers are cumulative power consumption/generation over time. Hence overall energy consumption/generation is retained in case of data gaps. Measurements were initially conducted in 3-minute intervals, later in 1-minute intervals. Data for both measurement resolutions are published separately in large CSV files. Additionally, data in 15 and 60-minute resolution is provided for compatibility with other time series data. Data processing is conducted in Jupyter Notebooks/Python/pandas.
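Because the values are cumulative, per-interval consumption comes from differencing, and a small gap can be filled by linear interpolation without changing the overall total. The sketch below illustrates this with pandas. The readings, the kWh unit, and the timestamps are invented for illustration and are not taken from the data package.

```python
import pandas as pd

# Hypothetical cumulative meter readings (assumed kWh) with one missing minute.
index = pd.to_datetime(
    ["2016-01-01 00:00", "2016-01-01 00:01", "2016-01-01 00:03", "2016-01-01 00:04"]
)
cumulative = pd.Series([100.0, 100.2, 100.8, 101.0], index=index)

# Put the readings on a regular 1-minute grid; the missing minute becomes NaN.
regular = cumulative.resample("1min").asfreq()

# Fill the small gap by linear interpolation between the neighbouring readings.
filled = regular.interpolate(method="linear")

# Differences of the cumulative counter give the energy used in each interval.
per_interval = filled.diff().dropna()

print(per_interval)
print("total:", per_interval.sum())
```

The interval values sum to the difference between the last and first reading, which is why the overall energy is retained across the gap.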


Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus...

Melb 14784 lines Avg download speed 69.4M Tests 0.39M

SHG 31207 lines Avg 233.7M Tests 0.56M

ALC 113 lines Avg 51.5M Test 1092

BKK 29684 lines Avg 215.9M Tests 1.2M

LAX 15505 lines Avg 218.5M Tests 0.74M
