Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
df_force_kin_filtered.csv is the data sheet used for the DATA3 python notebook to analyse kinematics and dynamics combined. It contains the footfalls that hava data for both: kinematics and dynamics. To see how this file is generated, read the first half of the jupyter notebook
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset compares four cities FIXED-line broadband internet speeds: - Melbourne, AU - Bangkok, TH - Shanghai, CN - Los Angeles, US - Alice Springs, AU
ERRATA: 1.Data is for Q3 2020, but some files are labelled incorrectly as 02-20 of June 20. They all should read Sept 20, or 09-20 as Q3 20, rather than Q2. Will rename and reload. Amended in v7.
*lines of data for each geojson file; a line equates to a 600m^2 location, inc total tests, devices used, and average upload and download speed - MEL 16181 locations/lines => 0.85M speedtests (16.7 tests per 100people) - SHG 31745 lines => 0.65M speedtests (2.5/100pp) - BKK 29296 lines => 1.5M speedtests (14.3/100pp) - LAX 15899 lines => 1.3M speedtests (10.4/100pp) - ALC 76 lines => 500 speedtests (2/100pp)
Geojsons of these 2* by 2* extracts for MEL, BKK, SHG now added, and LAX added v6. Alice Springs added v15.
This dataset unpacks, geospatially, data summaries provided in Speedtest Global Index (linked below). See Jupyter Notebook (*.ipynb) to interrogate geo data. See link to install Jupyter.
** To Do Will add Google Map versions so everyone can see without installing Jupyter. - Link to Google Map (BKK) added below. Key:Green > 100Mbps(Superfast). Black > 500Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook. - Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melb has 20% at or above 100Mbps. Suggest plot Top 20% on map for community. Google Map link - now added (and tweet).
** Python melb = au_tiles.cx[144:146 , -39:-37] #Lat/Lon extract shg = tiles.cx[120:122 , 30:32] #Lat/Lon extract bkk = tiles.cx[100:102 , 13:15] #Lat/Lon extract lax = tiles.cx[-118:-120, 33:35] #lat/Lon extract ALC=tiles.cx[132:134, -22:-24] #Lat/Lon extract
Histograms (v9), and data visualisations (v3,5,9,11) will be provided. Data Sourced from - This is an extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).
**VERSIONS v.24 Add tweet and google map of Top 20% (over 100Mbps locations) in Mel Q322. Add v.1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below). v23. Add graph of 2022 Broadband distribution, and compare 2020 - 2022. Updated v1.4 Jupyter notebook. v22. Add Import ipynb; workflow-import-4cities. v21. Add Q3 2022 data; five cities inc ALC. Geojson files. (2020; 4.3M tests 2022; 2.9M tests)
v20. Speedtest - Five Cities inc ALC. v19. Add ALC2.ipynb. v18. Add ALC line graph. v17. Added ipynb for ALC. Added ALC to title.v16. Load Alice Springs Data Q221 - csv. Added Google Map link of ALC. v15. Load Melb Q1 2021 data - csv. V14. Added Melb Q1 2021 data - geojson. v13. Added Twitter link to pics. v12 Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb). v11 Add Line-Compare pic, plotting Four Cities on a graph. v10 Add Four Histograms in one pic. v9 Add Histogram for Four Cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook). v8 Renamed LAX file to Q3, rather than 03. v7 Amended file names of BKK files to correctly label as Q3, not Q2 or 06. v6 Added LAX file. v5 Add screenshot of BKK Google Map. v4 Add BKK Google map(link below), and BKK csv mapping files. v3 replaced MEL map with big key version. Prev key was very tiny in top right corner. v2 Uploaded MEL, SHG, BKK data and Jupyter Notebook v1 Metadata record
** LICENCE AWS data licence on Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be: - non-commercial (NC) - reuse must be share-alike (SA)(add same licence). This restricts the standard CC-BY Figshare licence.
** Other uses of Speedtest Open Data; - see link at Speedtest below.
Facebook
TwitterArcGIS Survey123 utilizes CSV data in several workflows, including external choice lists, the search() appearance, and pulldata() calculations. When you need to periodically update the CSV content used in a survey, a useful method is to upload the CSV files to your ArcGIS organization and link the CSV items to your survey. Once linked, any updates to the CSV items will automatically pull through to your survey without the need to republish the survey. To learn more about linking items to a survey, see Linked content.This notebook demonstrates how to automate updating a CSV item in your ArcGIS organization.Note: It is recommended to run this notebook on your computer in Jupyter Notebook or ArcGIS Pro, as that will provide the best experience when reading locally stored CSV files. If you intend to schedule this notebook in ArcGIS Online or ArcGIS Notebook Server, additional configuration may be required to read CSV files from online file storage, such as Microsoft OneDrive or Google Drive.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Amazon Scrapping Dataset; 1. Import libraries 2. Connect to the website 3. Import CSV and datetime 4. Import pandas 5. Appending dataset to CSV 6. Automation Dataset updated 7. Timers setup 8. Email notification
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This JavaScript code has been developed to retrieve NDSI_Snow_Cover from MODIS version 6 for SNOTEL sites using the Google Earth Engine platform. To successfully run the code, you should have a Google Earth Engine account. An input file, called NWM_grid_Western_US_polygons_SNOTEL_ID.zip, is required to run the code. This input file includes 1 km grid cells of the NWM containing SNOTEL sites. You need to upload this input file to the Assets tap in the Google Earth Engine code editor. You also need to import the MOD10A1.006 Terra Snow Cover Daily Global 500m collection to the Google Earth Engine code editor. You may do this by searching for the product name in the search bar of the code editor.
The JavaScript works for s specified time range. We found that the best period is a month, which is the maximum allowable time range to do the computation for all SNOTEL sites on Google Earth Engine. The script consists of two main loops. The first loop retrieves data for the first day of a month up to day 28 through five periods. The second loop retrieves data from day 28 to the beginning of the next month. The results will be shown as graphs on the right-hand side of the Google Earth Engine code editor under the Console tap. To save results as CSV files, open each time-series by clicking on the button located at each graph's top right corner. From the new web page, you can click on the Download CSV button on top.
Here is the link to the script path: https://code.earthengine.google.com/?scriptPath=users%2Figarousi%2Fppr2-modis%3AMODIS-monthly
Then, run the Jupyter Notebook (merge_downloaded_csv_files.ipynb) to merge the downloaded CSV files that are stored for example in a folder called output/from_GEE into one single CSV file which is merged.csv. The Jupyter Notebook then applies some preprocessing steps and the final output is NDSI_FSCA_MODIS_C6.csv.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was originally curated by Software Carpentry, a branch of The Carpentries non-profit organization, and is based on data from the Gapminder Foundation. It consists of six tabular CSV files containing GDP data for various countries across different years. The dataset was initially prepared for the Software Carpentry tutorial "Plotting and Programming in Python" and is also reused in the Galaxy Training Network (GTN) tutorial "Use Jupyter Notebooks in Galaxy."
This GTN tutorial provides an introduction to launching a Jupyter Notebook in Galaxy, installing dependencies, and importing and exporting data. It serves as a setup guide for a Jupyter Notebook environment that can be used to follow the Software Carpentry tutorial "Plotting and Programming in Python."
Facebook
TwitterAssignment Topic: In this assignment, you will download the datasets provided, load them into a database, write and execute SQL queries to answer the problems provided, and upload a screenshot showing the correct SQL query and result for review by your peers. A Jupyter notebook is provided in the preceding lesson to help you with the process.
This assignment involves 3 datasets for the city of Chicago obtained from the Chicago Data Portal:
This dataset contains a selection of six socioeconomic indicators of public health significance and a hardship index, by Chicago community area, for the years 2008 – 2012.
This dataset shows all school level performance data used to create CPS School Report Cards for the 2011-2012 school year.
This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days.
Instructions:
Before you begin, you will need to become familiar with the datasets. Snapshots for the three datasets in .CSV format can be downloaded from the following links:
Chicago Socioeconomic Indicators: Click here
Chicago Public Schools: Click here
Chicago Crime Data: Click here
NOTE: Ensure you have downloaded the datasets using the links above instead of directly from the Chicago Data Portal. The versions linked here are subsets of the original datasets and have some of the column names modified to be more database friendly which will make it easier to complete this assignment. The CSV file provided above for the Chicago Crime Data is a very small subset of the full dataset available from the Chicago Data Portal. The original dataset is over 1.55GB in size and contains over 6.5 million rows. For the purposes of this assignment you will use a much smaller sample with only about 500 rows.
Perform this step using the LOAD tool in the Db2 console. You will need to create 3 tables in the database, one for each dataset, named as follows, and then load the respective .CSV file into the table:
CENSUS_DATA
CHICAGO_PUBLIC_SCHOOLS
CHICAGO_CRIME_DATA
Facebook
TwitterThis resource contains a draft Jupyter Notebook that has example code snippets showing how to access HydroShare resource files using HydroShare S3 buckets. The user_account.py is a utility to read user hydroshare cached account information in any of the JupyterHub instances that HydroShare has access to. The example notebook uses this utility so that you don't have to enter your hydroshare account information in order to access hydroshare buckets.
Here are the 3 notebooks in this resource:
The above notebook has examples showing how to upload/download resource files from the resource bucket. It also contains examples how to list files and folders of a resource in a bucket.
The above notebook has examples for reading raster and shapefile from bucket using gdal without the need of downloading the file from the bucket to local disk.
The above notebook has examples of using h5netcdf and xarray for reading netcdf file directly from bucket. It also contains examples of using rioxarray to read raster file, and pandas to read CSV file from hydroshare buckets.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
The Open-Orca Augmented FLAN Collection is a revolutionary dataset that unlocks new levels of language understanding and machine learning model performance. This dataset was created to support research on natural language processing, machine learning models, and language understanding through leveraging the power of reasoning trace-enhancement techniques. By enabling models to understand complex relationships between words, phrases, and even entire sentences in a more robust way than ever before, this dataset provides researchers expanded opportunities for furthering the progress of linguistics research. With its unique combination of features including system prompts, questions from users and responses from systems, this dataset opens up exciting possibilities for deeper exploration into the cutting edge concepts underlying advanced linguistics applications. Experience a new level of accuracy and performance - explore Open-Orca Augmented FLAN Collection today!
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
This guide provides an introduction to the Open-Orca Augmented FLAN Collection dataset and outlines how researchers can utilize it for their language understanding and natural language processing (NLP) work. The Open-Orca dataset includes system prompts, questions posed by users, and responses from the system.
Getting Started The first step is to download the data set from Kaggle at https://www.kaggle.com/openai/open-orca-augmented-flan and save it in a project directory of your choice on your computer or cloud storage space. Once you have downloaded the data set, launch your ‘Jupyter Notebook’ or ‘Google Colab’ program with which you want to work with this data set.
Exploring & Preprocessing Data: To get a better understanding of the features in this dataset, import them into Pandas DataFrame as shown below. You can use other libraries as per your need:
import pandas as pd # Library used for importing datasets into Python df = pd.read_csv('train.csv') #Imports train csv file into Pandas};#DataFrame df[['system_prompt','question','response']].head() #Views top 5 rows with columns 'system_prompt','question','response'After importing check each feature using basic descriptive statistics such Pandas groupby statement: We can use groupby statements to have greater clarity over the variables present in each feature(elements). The below command will show counts of each element in System Prompt column present under train CVS file :
df['system prompt'].value_counts().head()#shows count of each element present under 'System Prompt'column Output: User says hello guys 587 <br>System asks How are you?: 555 times<br>User says I am doing good: 487 times <br>..and so onData Transformation: After inspecting & exploring different features one may want/need certain changes that best suits their needs from this dataset before training modeling algorithms on it.
Common transformation steps include : Removing punctuation marks : Since punctuation marks may not add any value to computation operations , we can remove them using regex functions write .replace('[^A-Za -z]+','' ) as
- Automated Question Answering: Leverage the dataset to train and develop question answering models that can provide tailored answers to specific user queries while retaining language understanding abilities.
- Natural Language Understanding: Use the dataset as an exploratory tool for fine-tuning natural language processing applications, such as sentiment analysis, document categorization, parts-of-speech tagging and more.
- Machine Learning Optimizations: The dataset can be used to build highly customized machine learning pipelines that allow users to harness the power of conditioning data with pre-existing rules or models for improved accuracy and performance in automated tasks
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. [See Other Information](ht...
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
code.zip: Zip folder containing a folder titled "code" which holds:
csv file titled "MonitoredRainGardens.csv" containing the 14 monitored green infrastructure (GI) sites with their design and physiographic features;
csv file titled "storm_constants.csv" which contain the computed decay constants for every storm in every GI during the measurement period;
csv file titled "newGIsites_AllData.csv" which contain the other 130 GI sites in Detroit and their design and physiographic features;
csv file titled "Detroit_Data_MeanDesignFeatures.csv" which contain the design and physiographic features for all of Detroit;
Jupyter notebook titled "GI_GP_SensorPlacement.ipynb" which provides the code for training the GP models and displaying the sensor placement results;
a folder titled "MATLAB" which contains the following:
folder titled "SFO" which contains the SFO toolbox for the sensor placement work
file titled "sensor_placement.mlx" that contains the code for the sensor placement work
several .mat files created in Python for importing into Matlab for the sensor placement work: "constants_sigma.mat", "constants_coords.mat", "GInew_sigma.mat", "GInew_coords.mat", and "R1_sensor.mat" through "R6_sensor.mat"
several .mat files created in Matalb for importing into Python for visualizing the results: "MI_DETselectedGI.mat" and "DETselectedGI.mat"
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
# FiN-2 Large-Scale Real-World PLC-Dataset
## About
#### FiN-2 dataset in a nutshell:
FiN-2 is the first large-scale real-world dataset on data collected in a powerline communication infrastructure. Since the electricity grid is inherently a graph, our dataset could be interpreted as a graph dataset. Therefore, we use the word node to describe points (cable distribution cabinets) of measurement within the low-voltage electricity grid and the word edge to describe connections (cables) in between them. However, since these are PLC connections, an edge does not necessarily have to correspond to a real cable; more on this in our paper.
FiN-2 shows measurements that relate to the nodes (voltage, total harmonic distortion) as well as to the edges (signal-to-noise ratio spectrum, tonemap). In total, FiN-2 is distributed across three different sites with a total of 1,930,762,116 node measurements each for the individual features and 638,394,025 edge measurements each for all 917 PLC channels. All data was collected over a 25-month period from mid-2020 to the end of 2022.
We propose this dataset to foster research in the domain of grid automation and smart grid. Therefore, we provide different example use cases in asset management, grid state visualization, forecasting, predictive maintenance, and novelty detection. For more decent information on this dataset, please see our [paper](https://arxiv.org/abs/2209.12693).
* * *
## Content
FiN-2 dataset splits up into two compressed `csv-Files`: *nodes.csv* and *edges.csv*.
All files are provided as a compressed ZIP file and are divided into four parts. The first part can be found in this repo, while the remaining parts can be found in the following:
- https://zenodo.org/record/8328105
- https://zenodo.org/record/8328108
- https://zenodo.org/record/8328111
### Node data
| id | ts | v1 | v2 | v3 | thd1 | thd2 | thd3 | phase_angle1 | phase_angle2 | phase_angle3 | temp |
|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
|112|1605530460|236.5|236.4|236.0|2.9|2.5|2.4|120.0|119.8|120.0|35.3|
|112|1605530520|236.9|236.6|236.6|3.1|2.7|2.5|120.1|119.8|120.0|35.3|
|112|1605530580|236.2|236.4|236.0|3.1|2.7|2.5|120.0|120.0|119.9|35.5|
- id / ts: Unique identifier of the node that is measured and timestemp of the measurement
- v1/v2/v3: Voltage measurements of all three phases
- thd1/thd2/thd3: Total harmonic distortion of all three phases
- phase_angle1/2/3: Phase angle of all three phases
- temp: Temperature in-circuit of the sensor inside a cable distribution unit (in °C)
### Edge data
| src | dst | ts | snr0 | snr1 | snr2 | ... | snr916 |
|----|----|----|----|----|----|----|----|
|62|94|1605528900|70|72|45|...|-53|
|62|32|1605529800|16|24|13|...|-51|
|17|94|1605530700|37|25|24|...|-55|
- src & dst & ts: Unique identifier of the source and target nodes where the spectrum is measured and time of measurement
- snr0/snr1/.../snr916: 917 SNR measurements in tenths of a decibel (e.g. 50 --> 5dB).
### Metadata
Metadata that is provided along with the data covers:
- Number of cable joints
- Cable properties (length, type, number of sections)
- Relative position of the nodes (location, zero-centered gps)
- Adjacent PV or wallbox installations
- Year of installation w.r.t. the nodes and cables
Since the electricity grid is part of the critical infrastructure, it is not possible to provide exact GPS locations.
* * *
## Usage
Simple data access using pandas:
```
import pandas as pd
nodes_file = "nodes.csv.gz" # /path/to/nodes.csv.gz
edges_file = "edges.csv.gz" # /path/to/edges.csv.gz
# read the first 10 rows
data = pd.read_csv(nodes_file, nrows=10, compression='gzip')
# read the row number 5 to 15
data = pd.read_csv(nodes_file, nrows=10, skiprows=[i for i in range(1,6)], compression='gzip')
# ... same for the edges
```
Compressed csv-data format was used to make sharing as easy as possible, however it comes with significant drawbacks for machine learning. Due to the inherent graph structure, a single snapshot of the whole graph consists of a set of node and edge measurements. But due to timeouts, noise and other disturbances, nodes sometimes fail in collecting the data, wherefore the number of measurements for a specific timestamp differs. This, plus the high sparsity of the graph, leads to a high inefficiency when using the csv-format for an ML training.
To utilize the data in an ML pipeline, we recommend other data formats like [datadings](https://datadings.readthedocs.io/en/latest/) or specialized database solutions like [VictoriaMetrics](https://victoriametrics.com/).
### Example use case (voltage forecasting)
Forecasting of the voltage is one potential use cases. The Jupyter notebook provided in the repository gives an overview of how the dataset can be loaded, preprocessed and used for ML training. Thereby, a MinMax scaling was used as simple preprocessing and a PyTorch dataset class was created to handle the data. Furthermore, a vanilla autoencoder is utilized to process and forecast the voltage into the future.
Facebook
TwitterBy data.world's Admin [source]
This dataset contains data used to analyze the uniquely popular business types in the neighborhoods of Seattle and New York City. We used publically available neighborhood-level shapefiles to identify neighborhoods, and then crossed that information against Yelp's Business Category API to find businesses operating within each neighborhood. The ratio of businesses from each category was studied in comparison to their ratios in the entire city to determine any significant differences between each borough.
Any single business with more than one category was repeated for each one, however none of them were ever recorded twice for any single category. Moreover, if a certain business type didn't make up at least 1% of a particular neighborhood's businesses overall it was removed from the analysis altogether.
The data available here is free to use under MIT license, with appropriate attribution given back to Yelp for providing this information. It is an invaluable resource for researchers across different disciplines looking into consumer behavior or clustering within urban areas!
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
How to Use This Dataset
To get started using this dataset: - Download the appropriate file for the area you’re researching - either salt5_Seattle.csv or top5_NewYorkCity.csv - from the Kaggle site which hosts this dataset (https://www.kaggle.com/puddingmagazine/uniquely-popular-businesses). - Read through each columns information available under Columns section associated with this kaggle description (above).
- Take note of columns that are relevant to your analysis such as nCount which indicates the number of businesses in a neighborhood, rank which shows how popular that business type is overall and neighborhoodTotal which specifies total number of businesses in a particular neighborhood etc.,
- ) Load your selected file into an application designed for data analysis such as Jupyter Notebook, Microsoft Excel, Power BI etc.,
- ) Begin performing various analyses related to understanding where certain types of unique business are most common by subsetting rows based on specific neighborhoods; alternatively perform regressions-based analyses related to trends similar unique type's ranks over multiple neighborhoods etc.,If you have any questions about interpreting data from this source please reach out if needed!
- Analyzing the unique business trends in Seattle and New York City to identify potential investment opportunities.
- Creating a tool that helps businesses understand what local competitions they face by neighborhood.
- Exploring the distinctions between neighborhoods by plotting out the different businesses they have in comparison with each other and other cities
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: top5_Seattle.csv | Column name | Description | |:----------------------|:----------------------------------------------------------------------------------------------------------------------------------| | neighborhood | Name of the neighborhood. (String) | | yelpAlias | The Yelp-specified Alias for the business type. (String) | | yelpTitle | The Title given to this business type by Yelp. (String) | | nCount | Number of businesses with this type within a particular neighborhood. (Integer) | | neighborhoodTotal | Total number of businesses located within that particular region. (Integer) | | cCount | Number of businesses with this storefront within an entire city. (Integer) | | cityTotal | Total number of all types of storefronts within an entire city. (Integer) ...
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource includes the Python scripts of the modeling framework, simulation result CSV files, and the figures for the manuscript entitled 'Linking Soil Structure, Hydraulic Properties, and Organic Carbon Dynamics: A Holistic Framework to Study the Impact of Climate Change and Land Management.' results_simulations_constant_porosity.csv file contains the simulation results of the modeling framework at constant porosity. results_simulations_dynamic_porosity.csv file contains the simulation results of the modeling framework at dynamic porosity. model_constant_phi is the python script with the description of the modeling framework at a constant porosity that needs to be imported and run along with the simulation constant porosity Jupyter notebook to obtain the simulation results at constant porosity. model_dynamic_phi is the python script describing the modeling framework at a dynamic porosity that needs to be imported and run along with the simulation dynamic porosity notebook to obtain the simulation results at dynamic porosity.
Figure 3 and Figure 4 Jupyter notebooks have the codes for those plots in the manuscript.
To find the updated version of the codes, visit the GitHub link: https://github.com/Achla-Jha/Soil-Structure.git.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Facebook
TwitterThis item contains data and code used in experiments that produced the results for Sadler et. al (2022) (see below for full reference). We ran five experiments for the analysis, Experiment A, Experiment B, Experiment C, Experiment D, and Experiment AuxIn. Experiment A tested multi-task learning for predicting streamflow with 25 years of training data and using a different model for each of 101 sites. Experiment B tested multi-task learning for predicting streamflow with 25 years of training data and using a single model for all 101 sites. Experiment C tested multi-task learning for predicting streamflow with just 2 years of training data. Experiment D tested multi-task learning for predicting water temperature with over 25 years of training data. Experiment AuxIn used water temperature as an input variable for predicting streamflow. These experiments and their results are described in detail in the WRR paper. Data from a total of 101 sites across the US was used for the experiments. The model input data and streamflow data were from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset (Newman et. al 2014, Addor et. al 2017). The water temperature data were gathered from the National Water Information System (NWIS) (U.S. Geological Survey, 2016). The contents of this item are broken into 13 files or groups of files aggregated into zip files:
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an Australian extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).AWS data licence is "CC BY-NC-SA 4.0", so use of this data must be:- non-commercial (NC)- reuse must be share-alike (SA)(add same licence).This restricts the standard CC-BY Figshare licence.A world speedtest open data was dowloaded (>400Mb, 7M lines of data). An extract of Australia's location (lat, long) revealed 88,000 lines of data (attached as csv).A Jupyter notebook of extract process is attached.See Binder version at Github - https://github.com/areff2000/speedtestAU.+> Install: 173 packages | Downgrade: 1 packages | Total download: 432MBBuild container time: approx - load time 25secs.=> Error: Timesout - BUT UNABLE TO LOAD GLOBAL DATA FILE (6.6M lines).=> Error: Overflows 8GB RAM container provided with global data file (3GB)=> On local JupyterLab M2 MBP; loads in 6 mins.Added Binder from ARDC service: https://binderhub.rc.nectar.org.auDocs: https://ardc.edu.au/resource/fair-for-jupyter-notebooks-a-practical-guide/A link to Twitter thread of outputs provided.A link to Data tutorial provided (GitHub), including Jupyter Notebook to analyse World Speedtest data, selecting one US State.Data Shows: (Q220)- 3.1M speedtests | 762,000 devices |- 88,000 grid locations (600m * 600m), summarised as a point- average speed 33.7Mbps (down), 12.4M (up) | Max speed 724Mbps- data is for 600m * 600m grids, showing average speed up/down, number of tests, and number of users (IP). Added centroid, and now lat/long.See tweet of image of centroids also attached.NB: Discrepancy Q2-21, Speedtest Global shows June AU average speedtest at 80Mbps, whereas Q2 mean is 52Mbps (v17; Q1 45Mbps; v14). Dec 20 Speedtest Global has AU at 59Mbps. Could be possible timing difference. Or spatial anonymising masking shaping highest speeds. Else potentially data inconsistent between national average and geospatial detail. Check in upcoming quarters.NextSteps:Histogram - compare Q220, Q121, Q122. per v1.4.ipynb.Versions:v43. Added revised NZ vs AUS graph for Q325 (NZ; Q2 25) since had NZ available from Github (link below). Calc using PlayNZ.ipynb notebook. See images in Twitter - https://x.com/ValueMgmt/status/1981607615496122814v42: Added AUS Q325 (97.6k lines avg d/l 165.5 Mbps (median d/l 150.8 Mbps) u/l 28.08 Mbps). Imported using v2 Jupyter notebook (MBP 16Gb). Mean tests: 24.5. Mean devices: 6.02. Download, extract and publish: UNK - not measured mins. Download avg is double Q423. Noting, NBN increased D/L speeds from Sept '25; 100 -> 500, 250 -> 750. For 1Gbps, upload speed only increased from 50Mbps to 100Mbps. New 2Gbps services introduced on FTTP and HFC networks.v41: Added AUS Q225 (96k lines avg d/l 130.5 Mbps (median d/l 108.4 Mbps) u/l 22.45 Mbps). Imported using v2 Jupyter notebook (MBP 16Gb). Mean tests: 17.2. Mean devices: 5.11. Download, extract and publish: 20 mins. Download avg is double Q422.v40: Added AUS Q125 (93k lines avg d/l 116.6 Mbps u/l 21.35 Mbps). Imported using v2 Jupyter notebook (MBP 16Gb). Mean tests: 16.9. Mean devices: 5.13. Download, extract and publish: 14 mins.v39: Added AUS Q424 (95k lines avg d/l 110.9 Mbps u/l 21.02 Mbps). Imported using v2 Jupyter notebook (MBP 16Gb). Mean tests: 17.2. Mean devices: 5.24. Download, extract and publish: 14 mins.v38: Added AUS Q324 (92k lines avg d/l 107.0 Mbps u/l 20.79 Mbps). Imported using v2 Jupyter notebook (iMac 32Gb). Mean tests: 17.7. Mean devices: 5.33.Added github speedtest-workflow-importv2vis.ipynb Jupyter added datavis code to colour code national map. (per Binder on Github; link below).v37: Added AUS Q224 (91k lines avg d/l 97.40 Mbps u/l 19.88 Mbps). Imported using speedtest-workflow-importv2 jupyter notebook. Mean tests:18.1. Mean devices: 5.4.v36 Load UK data, Q1-23 and compare to AUS and NZ Q123 data. Add compare image (au-nz-ukQ123.png), calc PlayNZUK.ipynb, data load import-UK.ipynb. UK data bit rough and ready as uses rectangle to mark out UK, but includes some EIRE and FR. Indicative only and to be definitively needs geo-clean to exclude neighbouring countries.v35 Load Melb geo-maps of speed quartiles (0-25, 25-50, 50-75, 75-100, 100-). Avg in 2020; 41Mbps. Avg in 2023; 86Mbps. MelbQ323.png, MelbQ320.png. Calc with Speedtest-incHist.ipynb code. Needed to install conda mapclassify. ax=melb.plot(column=...dict(bins[25,50,75,100]))v34 Added AUS Q124 (93k lines avg d/l 87.00 Mbps u/l 18.86 Mbps). Imported using speedtest-workflow-importv2 jupyter notebook. Mean tests:18.3. Mean devices: 5.5.v33 Added AUS Q423 (92k lines avg d/l 82.62 Mbps). Imported using speedtest-workflow-importv2 jupyter notebook. Mean tests:18.0. Mean devices: 5.6. Added link to Github.v32 Recalc Au vs NZ for upload performance; added image. using PlayNZ Jupyter. NZ approx 40% locations at or above 100Mbps. Aus
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Data was imported from the BAK file found here into SQL Server, and then individual tables were exported as CSV. Jupyter Notebook containing the code used to clean the data can be found here
Version 6 has a some more cleaning and structuring that was noticed after importing in Power BI. Changes were made by adding code in python notebook to export new cleaned dataset, such as adding MonthNumber for sorting by month number, similar for WeekDayNumber.
Cleaning was done in python while also using SQL Server to quickly find things. Headers were added separately, ensuring no data loss.Data was cleaned for NaN, garbage values and other columns.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the replication package for the paper "NeoModeling Framework: Leveraging Graph-Based Persistence for Large-Scale Model-Driven Engineering" where we present Neo Modeling Framework (NMF), an open-source set of tools primarily designed to manipulate ultra-large datasets in the Neo4j database.
The best way to run NMF is following the instructions at our GitHub repository. A copy of the Readme file is also present inside the zip file available here.
Make sure that you follow the instructions to run NMF.
The quantitative evaluation can be re-run by running RQ1Eval.kt, RQ2Eval.kt inside modelLoader/src/test/kotlin/evaluation and RQ2Eval.kt inside modelEditor/src/test/kotlin/evaluation.
Make sure that you have an empty instance of Neo4j running.
Results will be generated as CSV files, under Evaluation/results and the results can be plotted by running the Jupyter Notebooks at Evaluation/analysis.
Please note that due to differences in hardware, re-running the experiments will probably generate slightly different results than those reported in the paper.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance results of different scheduling algorithms used in the simulation of a modern game engine
These results are a companion to the paper entitled "Exploring scheduling algorithms for parallel task graphs: a modern game engine case study" by M. Regragui et al.
General information
This dataset contains raw outputs and scripts to visualize and analyze the scheduling results from our game engine simulator. The result analysis can be directly reproduced using the script run_analysis.sh. A series of Jupyter Notebook files are also available to help visualize the results.
File information
The results are organized in five folders:
Each result file (e.g., HLF_NonSorted_Random_1_200_10.txt) contains 200 lines representing information of the 200 frames that were simulated. Each line contains four values: the frame number, the duration of the frame (in microseconds), a critical path estimation for the previous frame (in microseconds), and the load parameter (value between 0 and 1).
The outputs of this analysis include some PDF files representing the figures in the paper (in order) and some CSV files representing the values shown in tables. The standard output shows the p-values computed in parts of the statistical analysis.
Software and hardware information
The simulation results were generated on an Intel Core i7-1185G7 processor, with 32 GB of LPDDR4 RAM (3200 MHz). The machine ran on Ubuntu 20.04.3 LTS (5.14.0-1034-oem), and g++ 9.4.0 was used for the simulator's compilation (-O3 flag).
The results were analyzed using Python 3.8.10, pip 20.0.2 and jupyter-notebook 6.0.3. The following packages and their respective versions were used:
Simulation information
Simulation results were generated from 4 to 20 resources. Each configuration was run with 50 different RNG seeds (1 up to 50).
Each simulation is composed of 200 frames. The load parameter (lag) starts at zero and increases by 0.01 with each frame up to a value equal to 100% in frame 101. After that, the load parameter starts to decrease in the same rhythm down to 0.01 in frame 200.
Algorithms abbreviation in presentation order
FIFO serves as the baseline for comparisons.
Metrics
Facebook
TwitterDetailed household load and solar generation in minutely to hourly resolution. This data package contains measured time series data for several small businesses and private households relevant for household- or low-voltage-level power system modeling. The data includes solar power generation as well as electricity consumption (load) in a resolution up to single device consumption. The starting point for the time series, as well as data quality, varies between households, with gaps spanning from a few minutes to entire hours. In general, data is adjusted to fit uniform, regular time intervals without changing its validity. Except for small gaps, filled using linear interpolation. The numbers are cumulative power consumption/generation over time. Hence overall energy consumption/generation is retained in case of data gaps. Measurements were initially conducted in 3-minute intervals, later in 1-minute intervals. Data for both measurement resolutions are published separately in large CSV files. Additionally, data in 15 and 60-minute resolution is provided for compatibility with other time series data. Data processing is conducted in Jupyter Notebooks/Python/pandas.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
df_force_kin_filtered.csv is the data sheet used for the DATA3 python notebook to analyse kinematics and dynamics combined. It contains the footfalls that hava data for both: kinematics and dynamics. To see how this file is generated, read the first half of the jupyter notebook