The BUTTER Empirical Deep Learning Dataset represents an empirical study of deep learning phenomena on dense fully connected networks, scanning across thirteen datasets, eight network shapes, fourteen depths, twenty-three network sizes (number of trainable parameters), four learning rates, six minibatch sizes, four levels of label noise, and fourteen levels each of L1 and L2 regularization. Multiple repetitions (typically 30, sometimes 10) of each combination of hyperparameters were performed, and statistics including training and test loss (using an 80% / 20% shuffled train-test split) are recorded at the end of each training epoch. In total, this dataset covers 178 thousand distinct hyperparameter settings ("experiments"), 3.55 million individual training runs (an average of 20 repetitions of each experiment), and a total of 13.3 billion training epochs (most runs covered three thousand epochs). Accumulating this dataset consumed 5,448.4 CPU core-years, 17.8 GPU-years, and 111.2 node-years.
This graffiti-centred change detection dataset was developed in the context of INDIGO, a research project focusing on the documentation, analysis and dissemination of graffiti along Vienna's Donaukanal. The dataset aims to support the development and assessment of change detection algorithms.
The dataset was collected from a test site approximately 50 meters in length along Vienna's Donaukanal during 11 days between 2022/10/21 and 2022/12/01. Various cameras with different settings were used, resulting in a total of 29 data collection sessions or "epochs" (see "EpochIDs.jpg" for details). Each epoch contains 17 images generated from 29 distinct 3D models with different textures. In total, the dataset comprises 6,902 unique image pairs, along with corresponding reference change maps. Additionally, exclusion masks are provided to ignore parts of the scene that might be irrelevant, such as the background.
To summarise, the dataset, labelled as "Data.zip," includes the following:
- Synthetic Images: colour images created within Agisoft Metashape Professional 1.8.4, generated by rendering views from 17 artificial cameras observing 29 differently textured versions of the same 3D surface model.
- Change Maps: binary images that were manually and programmatically generated, using a Python script, from two synthetic graffiti images. These maps highlight the areas where changes have occurred.
- Exclusion Masks: binary images manually created from synthetic graffiti images to identify "no data" areas or irrelevant ground pixels.
Image acquisition involved the use of two different camera setups. The first two datasets (ID 1 and 2; cf. "EpochIDs.jpg") were obtained using a Nikon Z 7II camera with a pixel count of 45.4 MP, paired with a Nikon NIKKOR Z 20 mm lens. For the remaining image datasets (ID 3-29), a triple GoPro setup was employed. This triple setup featured three GoPro cameras, comprising two GoPro HERO 10 cameras and one GoPro HERO 11, all securely mounted within a frame. This triple-camera setup was utilised on nine different days with varying camera settings, resulting in the acquisition of 27 image datasets in total (nine days with three datasets each).
The "Data.zip" file contains two subfolders:
A detailed dataset description (including detailed explanations of the data creation) is part of a journal paper currently in preparation. The paper will be linked here for further clarification as soon as it is available.
Due to the nature of the three image types, this dataset comes with two licenses: the synthetic images carry an In Copyright license (https://rightsstatements.org/page/InC/1.0/?language=en), while the change maps and masks are openly licensed via CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0).
Every synthetic image, change map and mask has this licensing information embedded as IPTC photo metadata. In addition, the images' IPTC metadata also provide a short image description, the image creator and the creator's identity (in the form of an ORCiD).
If there are any questions, problems or suggestions for the dataset or the description, please do not hesitate to contact the corresponding author, Benjamin Wild.
The SAGE-SMC project is a Cycle 4 legacy program on the Spitzer Space Telescope, entitled SAGE-SMC: Surveying the Agents of Galaxy Evolution in the Tidally-Disrupted, Low-Metallicity Small Magellanic Cloud, with Karl Gordon (STScI) as the PI. The project overview and initial results are described in a paper by Gordon et al. (2010, in prep). The SMC was mapped at two different epochs, dubbed Epochs 1 and 2, separated by 3 (IRAC) and 9 (MIPS) months, as this provides a 90-degree roll angle in the orientation of the detectors, which optimally removes the striping artifacts in MIPS and artifacts along columns and rows in the IRAC image data. In addition, these two epochs provide useful constraints on the source variability expected for evolved stars and some young stellar objects (YSOs). The IRAC and MIPS observations from the S3MC pathfinder survey of the inner 3 sq. deg. of the SMC (PI: Bolatto, referred to as Epoch 0) have been reduced using the same software. To be included in each single-epoch catalog, each 24 um source has to meet a number of criteria. The source had to be nearly point-like, with a correlation value >0.89; this removed approximately 2/3 of the entries in the single-epoch source lists. In regions with significant structure in the surroundings (identified as having a sigma > 0.25 in a 120"-wide square box), the source had to have a correlation value >0.91; this requirement removed a small number of sources (70). Finally, all sources had to have signal-to-noise (S/N) values >5. The S/N used was that estimated by the StarFinder code using the mosaic uncertainty image, added in quadrature with a 0.6% error due to the background subtraction; this removed 700 sources. The final Epoch 1 catalog likely has a few remaining unreliable sources, estimated to be at less than the 1% level. The Full List contains ALL sources extracted from the MIPS 24 um mosaics, so users should be aware that it contains spurious sources. For more details, see Section 4.1 of the SAGE-SMC Data Delivery Document.
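The selection cuts above are simple enough to express in a few lines. The following is a hedged Python sketch of those cuts, assuming a source table with hypothetical column names (correlation, local_sigma, flux, flux_unc); it is an illustration, not the actual SAGE-SMC pipeline code.

```python
# Illustrative sketch of the single-epoch 24 um selection criteria described above.
# Column names (correlation, local_sigma, flux, flux_unc) are hypothetical; the
# delivered catalogs may use different labels.
import numpy as np
import pandas as pd

def select_epoch_sources(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the point-likeness, local-structure and S/N cuts to a source list."""
    # Point-like sources: correlation value > 0.89 everywhere, tightened to > 0.91
    # where the local background is structured (sigma > 0.25 in a 120" box).
    corr_ok = np.where(df["local_sigma"] > 0.25,
                       df["correlation"] > 0.91,
                       df["correlation"] > 0.89)

    # S/N > 5, with a 0.6% background-subtraction error added in quadrature
    # to the mosaic uncertainty.
    total_unc = np.sqrt(df["flux_unc"] ** 2 + (0.006 * df["flux"]) ** 2)
    snr_ok = df["flux"] / total_unc > 5.0

    return df[corr_ok & snr_ok]
```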
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The 802.11 standard includes several management features and corresponding frame types. One of them is the Probe Request (PR), which mobile devices send in an unassociated state to scan the nearby area for existing wireless networks. The body of a PR frame consists of variable-length fields, called Information Elements (IEs), which describe the capabilities of a mobile device, such as supported data rates.
This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.
It can be used for various purposes, e.g., analyzing MAC randomization, estimating the number of people at a given location at a given time or across different time periods, or analyzing trends in population movement (streets, shopping malls, etc.) over time.
Related dataset
The same authors also produced the Labeled dataset of IEEE 802.11 probe requests, which uses the same data layout and recording equipment.
Measurement setup
The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle used to capture WiFi traffic in monitoring mode (the gateway device). Passive PR monitoring is performed by listening to 802.11 traffic on a single WiFi channel and filtering out PR packets.
The following information about each received PR is collected:
- MAC address
- supported data rates
- extended supported rates
- HT capabilities
- extended capabilities
- data under extended tag and vendor specific tag
- interworking
- VHT capabilities
- RSSI
- SSID
- timestamp when the PR was received
The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.
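For illustration, a minimal pyshark-based capture loop in the spirit of the setup described above might look as follows. The interface name, the exact pyshark field accessors for MAC/RSSI, and the 10-second flush interval are assumptions for illustration only; this is not the authors' script.

```python
# Hedged sketch of a gateway-side capture loop using pyshark.
import time
import pyshark

SCAN_INTERVAL = 10  # seconds, matching the preprocessing description below

def capture_probe_requests(interface: str = "wlan1mon"):
    # BPF filter keeps only 802.11 management frames of subtype probe request
    capture = pyshark.LiveCapture(interface=interface,
                                  bpf_filter="type mgt subtype probe-req")
    batch, t_start = [], time.time()
    for pkt in capture.sniff_continuously():
        radio = getattr(pkt, "wlan_radio", None)  # radiotap info, if present
        batch.append({
            "mac": pkt.wlan.sa,                                    # source MAC address
            "rssi": getattr(radio, "signal_dbm", None) if radio else None,
            "time": float(pkt.sniff_timestamp),                    # time of arrival
        })
        if time.time() - t_start >= SCAN_INTERVAL:
            yield batch                    # hand the 10 s chunk to the preprocessing step
            batch, t_start = [], time.time()
```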
Data preprocessing
The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database. For each detected PR in the scan interval, the IE fields are saved in the following JSON structure:
PR_IE_data = {
    'DATA_RTS': {'SUPP': DATA_supp, 'EXT': DATA_ext},
    'HT_CAP': DATA_htcap,
    'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap},
    'VHT_CAP': DATA_vhtcap,
    'INTERWORKING': DATA_inter,
    'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext ...},
    'VENDOR_SPEC': {
        VENDOR_1: {'ID_1': DATA_1_vendor1, 'ID_2': DATA_2_vendor1 ...},
        VENDOR_2: {'ID_1': DATA_1_vendor2, 'ID_2': DATA_2_vendor2 ...}
        ...
    }
}
Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IE data is represented in hexadecimal format. The Vendor Specific Tag is structured differently from the other IEs: this field can contain multiple vendor IDs, each with multiple data IDs and corresponding data. Similarly, the Extended Tag can contain multiple data IDs with corresponding data.
IE fields missing from a captured PR are not included in PR_IE_data.
When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:
{'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },
where PR_data is structured as follows:
{ 'TIME': [ DATA_time ], 'RSSI': [ DATA_rssi ], 'DATA': PR_IE_data }.
This data structure allows storing only the time of arrival ('TIME') and 'RSSI' for all PRs originating from the same MAC address and containing the same 'PR_IE_data'. All SSIDs from the same MAC address are also stored. The data of a newly detected PR is compared with the data already stored for the same MAC in the current scan time interval. If identical IE data from the same MAC address is already stored, only the values for the 'TIME' and 'RSSI' keys are appended. If identical IE data from the same MAC address has not yet been received, the PR_data structure of the new PR is appended to the 'PROBE_REQs' key for that MAC address. The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png
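The merging rule described above can be sketched as follows; this is an illustrative reimplementation of the documented behaviour, not the original collection script.

```python
# Illustrative sketch of the per-MAC merging rule: identical IE data for a MAC
# only appends 'TIME'/'RSSI', otherwise a new PR_data entry is added.
def add_probe_request(interval_data: dict, mac: str, ssid: str,
                      toa: float, rssi: int, pr_ie_data: dict) -> None:
    entry = interval_data.setdefault(
        mac, {"MAC": mac, "SSIDs": [], "PROBE_REQs": []})

    if ssid and ssid not in entry["SSIDs"]:
        entry["SSIDs"].append(ssid)

    for pr_data in entry["PROBE_REQs"]:
        if pr_data["DATA"] == pr_ie_data:        # identical IE data already stored
            pr_data["TIME"].append(toa)
            pr_data["RSSI"].append(rssi)
            return

    # IE data not seen yet for this MAC in the current scan interval
    entry["PROBE_REQs"].append(
        {"TIME": [toa], "RSSI": [rssi], "DATA": pr_ie_data})
```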
At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.
Folder structure
For ease of processing, the dataset is divided into 7 folders, each covering a 24-hour period. Each folder contains four files, one per gateway device, containing the samples recorded by that device.
The folders are named after the start and end time (in UTC). For example, the folder 2022-09-22T22-00-00_2022-09-23T22-00-00 contains samples collected from the 23rd of September 2022 00:00 local time until the 24th of September 2022 00:00 local time.
The files map to gateway locations as follows:
- 1.json -> location 1
- 2.json -> location 2
- 3.json -> location 3
- 4.json -> location 4
Environments description
The measurements were carried out in the city of Catania, in Piazza Università and Piazza del Duomo. The gateway devices (RPis with WiFi dongles) were set up and gathering data before the start time of this dataset. As of September 23, 2022, the devices were placed in their final configuration and personally checked for correct installation and for the data status of the entire data collection system. Devices were connected either to a nearby Ethernet outlet or via WiFi to the access point provided.
Four Raspberry Pis were used:
- location 1 -> Piazza del Duomo - Chierici building (balcony near Fontana dell’Amenano)
- location 2 -> southernmost window in the building of Via Etnea near Piazza del Duomo
- location 3 -> northernmost window in the building of Via Etnea near Piazza Università
- location 4 -> first window to the right of the entrance of the University of Catania
Locations were suggested by the authors and adjusted during deployment based on physical constraints (locations of electrical outlets or internet access). Under ideal circumstances, the locations of the devices and their coverage areas would cover both squares and the part of Via Etnea between them, with partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.
Known dataset shortcomings
Due to technical and physical limitations, the dataset contains some identified deficiencies.
PRs are collected and transmitted in 10-second chunks. Due to the limited capabilities of the recording devices, some time (in the range of seconds) may not be accounted for between chunks if the transmission of the previous packet took too long or an unexpected error occurred.
Every 20 minutes the service is restarted on the recording device. This is a workaround for undefined behavior of the USB WiFi dongle, which occasionally stops responding. For this reason, up to 20 seconds of data are not recorded in each 20-minute period.
The devices had a scheduled reboot at 4:00 each day, which appears as missing data of up to a few minutes.
Location 1 - Piazza del Duomo - Chierici
The gateway device (RPi) is located on a second-floor balcony and is hardwired to an Ethernet port. This device appears to have functioned stably throughout the data collection period. Its location remained constant and undisturbed, and the dataset appears to have complete coverage.
Location 2 - Via Etnea - Piazza del Duomo
The device is located inside the building. During working hours (approximately 9:00-17:00), the device was placed on the windowsill; however, the exact movements of the device cannot be confirmed. As the device was moved back and forth, power outages and internet connection issues occurred. The last three days in the record contain no PRs from this location.
Location 3 - Via Etnea - Piazza Università
Similar to Location 2, the device was placed on the windowsill and moved around by people working in the building; for example, it was placed on the windowsill during the day and moved inside, behind a thick wall, when no people were present. This device appears to have been collecting data throughout the whole dataset period.
Location 4 - Piazza Università
This location is wirelessly connected to the access point. The device was placed statically on a windowsill overlooking the square. Due to physical limitations, the device lost power several times during the deployment, and the internet connection was also interrupted sporadically.
Recognitions
The data was collected within the scope of the Resiloc project, with the help of the City of Catania and the project partners.
The INDIGO Change Detection Reference Dataset

Description

This graffiti-centred change detection dataset was developed in the context of INDIGO, a research project focusing on the documentation, analysis and dissemination of graffiti along Vienna's Donaukanal. The dataset aims to support the development and assessment of change detection algorithms.

The dataset was collected from a test site approximately 50 meters in length along Vienna's Donaukanal during 11 days between 2022/10/21 and 2022/12/01. Various cameras with different settings were used, resulting in a total of 29 data collection sessions or "epochs" (see "EpochIDs.jpg" for details). Each epoch contains 17 images generated from 29 distinct 3D models with different textures. In total, the dataset comprises 6,902 unique image pairs, along with corresponding reference change maps. Additionally, exclusion masks are provided to ignore parts of the scene that might be irrelevant, such as the background.

To summarise, the dataset, labelled as "Data.zip," includes the following:
- Synthetic Images: colour images created within Agisoft Metashape Professional 1.8.4, generated by rendering views from 17 artificial cameras observing 29 differently textured versions of the same 3D surface model.
- Change Maps: binary images that were manually and programmatically generated, using a Python script, from two synthetic graffiti images. These maps highlight the areas where changes have occurred.
- Exclusion Masks: binary images manually created from synthetic graffiti images to identify "no data" areas or irrelevant ground pixels.

Image Acquisition

Image acquisition involved the use of two different camera setups. The first two datasets (ID 1 and 2; cf. "EpochIDs.jpg") were obtained using a Nikon Z 7II camera with a pixel count of 45.4 MP, paired with a Nikon NIKKOR Z 20 mm lens. For the remaining image datasets (ID 3-29), a triple GoPro setup was employed. This triple setup featured three GoPro cameras, comprising two GoPro HERO 10 cameras and one GoPro HERO 11, all securely mounted within a frame. This triple-camera setup was utilised on nine different days with varying camera settings, resulting in the acquisition of 27 image datasets in total (nine days with three datasets each).

Data Structure

The "Data.zip" file contains two subfolders:

1_ImagesAndChangeMaps: This folder contains the primary dataset. Each subfolder corresponds to a specific epoch. Within each epoch folder resides a subfolder for every other epoch with which a distinct epoch pair can be created. It is important to note that the pairs "Epoch Y and Epoch Z" are equivalent to "Epoch Z and Epoch Y", so the latter combinations are not included in this dataset. Each sub-subfolder, organised by epoch, contains 17 more subfolders, which hold the image data. These subfolders consist of:
- two synthetic images rendered from the same synthetic camera ("X_Y.jpg" and "X_Z.jpg")
- the corresponding binary reference change map depicting the graffiti-related differences between the two images ("X_YZ.png"). Black areas denote new graffiti (i.e. "change"), and white denotes "no change".

"DataStructure.png" provides a visual explanation concerning the creation of the dataset. The filenames follow this pattern:
- X - the ID number of the synthetic camera. In total, 17 synthetic cameras were placed along the test site
- Y - corresponds to the reference epoch (i.e. the "older epoch")
- Z - corresponds to the "new epoch"

2_ExclusionMasks: This folder contains the binary exclusion masks. They were manually created from synthetic graffiti images and identify "no data" areas or areas considered irrelevant, such as "ground pixels". Two exclusion masks were generated for each of the 17 synthetic cameras:
- "groundMasks": depict ground pixels which are usually irrelevant for the detection of graffiti
- "noDataMasks": depict "background" for which no data is available.

A detailed dataset description (including detailed explanations of the data creation) is part of a journal paper currently in preparation. The paper will be linked here for further clarification as soon as it is available.

Licensing

Due to the nature of the three image types, this dataset comes with two licenses:

Synthetic images: These come with an In Copyright license (for the rights usage terms, see https://rightsstatements.org/page/InC/1.0/?language=en). The copyright lies with:
- the Ludwig Boltzmann Gesellschaft (https://d-nb.info/gnd/1024204324)
- the TU Wien (https://d-nb.info/gnd/55426-1)
- one or more anonymous graffiti creator(s) upon whose work these images are based.
The first two entities are also the licensor of these images.

Change maps and masks: These are openly licensed via CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0). In this case, the copyright lies with:
- the Ludwig Boltzmann Gesellschaft (https://d-nb.info/gnd/1024204324)
- the TU Wien (https://d-nb.info/gnd/55426-1)
Both institutes are also the licensor of these images.

Every synthetic image, change map and mask has this licensing information embedded as IPTC photo metadata. In addition, the images' IPTC metadata also provide a short image description, the image creator and the creator's identity (in the form of an ORCiD).
The SAGE-SMC project is a Cycle 4 legacy program on the Spitzer Space Telescope, entitled SAGE-SMC: Surveying the Agents of Galaxy Evolution in the Tidally-Disrupted, Low-Metallicity Small Magellanic Cloud, with Karl Gordon (STScI) as the PI. The project overview and initial results are described in a paper by Gordon et al. (2010, in prep). The SMC was mapped at two different epochs, dubbed Epochs 1 and 2, separated by 3 (IRAC) and 9 (MIPS) months, as this provides a 90-degree roll angle in the orientation of the detectors, which optimally removes the striping artifacts in MIPS and artifacts along columns and rows in the IRAC image data. In addition, these two epochs provide useful constraints on the source variability expected for evolved stars and some young stellar objects (YSOs). The IRAC and MIPS observations from the S3MC pathfinder survey of the inner 3 sq. deg. of the SMC (PI: Bolatto, referred to as Epoch 0) have been reduced using the same software. In comparison to the catalog, the archive has more source fluxes (fewer nulled wavelengths) and somewhat more sources, but these additions have more uncertainty associated with them. For the catalog, a source must be detected in at least 60% of the observations at that wavelength, in at least 32% of the observations in an adjacent band (the confirming band), and the S/N must be greater than [5, 5, 5, 7] for IRAC bands [3.6um], [4.5um], [5.8um] and [8.0um], respectively. The 2MASS K_s band is counted as a detection. For a typical source, extracted from 2x12 sec frametime images, the minimum detection criterion amounts to being detected twice in one band (usually band 1 or 2) and once in an adjacent band (sometimes referred to as the 2+1 criterion). For the catalog, sources with neighbors within a 2" radius are excluded (culled). For the archive, sources with neighbors within a 0.5" radius are excluded. For more details, see Section 3.3 of the SAGE-SMC Data Delivery Document.
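As an illustration of the catalog criteria described above (detection fractions, a confirming band, per-band S/N), a hedged Python sketch follows. The per-band count layout and the treatment of a 2MASS K_s detection as a confirming detection are illustrative assumptions, not the delivery pipeline itself.

```python
# Hedged sketch of the IRAC catalog inclusion criteria described above.
IRAC_SNR_MIN = {"3.6um": 5, "4.5um": 5, "5.8um": 5, "8.0um": 7}
ADJACENT = {"3.6um": ["4.5um"], "4.5um": ["3.6um", "5.8um"],
            "5.8um": ["4.5um", "8.0um"], "8.0um": ["5.8um"]}

def passes_catalog_criteria(detections: dict, observations: dict,
                            snr: dict, band: str) -> bool:
    """detections/observations: per-band counts; snr: per-band signal-to-noise."""
    # detected in at least 60% of the observations at this wavelength, with S/N above threshold
    if detections[band] / observations[band] < 0.60 or snr[band] <= IRAC_SNR_MIN[band]:
        return False
    # confirming detection: at least 32% of the observations in an adjacent band;
    # here a 2MASS K_s detection is also treated as a confirming detection
    return any(detections[b] / observations[b] >= 0.32 for b in ADJACENT[band]) \
        or detections.get("Ks", 0) > 0
```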
The SAGE-SMC project is a Cycle 4 legacy program on the Spitzer Space Telescope, entitled SAGE-SMC: Surveying the Agents of Galaxy Evolution in the Tidally-Disrupted, Low-Metallicity Small Magellanic Cloud, with Karl Gordon (STScI) as the PI. The project overview and initial results are described in a paper by Gordon et al. (2010, in prep). The SMC was mapped at two different epochs, dubbed Epochs 1 and 2, separated by 3 (IRAC) and 9 (MIPS) months, as this provides a 90-degree roll angle in the orientation of the detectors, which optimally removes the striping artifacts in MIPS and artifacts along columns and rows in the IRAC image data. In addition, these two epochs provide useful constraints on the source variability expected for evolved stars and some young stellar objects (YSOs). The IRAC and MIPS observations from the S3MC pathfinder survey of the inner 3 sq. deg. of the SMC (PI: Bolatto, referred to as Epoch 0) have been reduced using the same software. To be included in each single-epoch catalog, each 24 um source has to meet a number of criteria. The source had to be nearly point-like, with a correlation value >0.89; this removed approximately 2/3 of the entries in the single-epoch source lists. In regions with significant structure in the surroundings (identified as having a sigma > 0.25 in a 120"-wide square box), the source had to have a correlation value >0.91; this requirement removed a small number of sources (70). Finally, all sources had to have signal-to-noise (S/N) values >5. The S/N used was that estimated by the StarFinder code using the mosaic uncertainty image, added in quadrature with a 0.6% error due to the background subtraction; this removed 700 sources. The final Epoch 1 catalog likely has a few remaining unreliable sources, estimated to be at less than the 1% level. The Full List contains ALL sources extracted from the MIPS 24 um mosaics, so users should be aware that it contains spurious sources. For more details, see Section 4.1 of the SAGE-SMC Data Delivery Document. To access this resource via TAP, issue ADQL queries on the table named sagesmc_mips24ep1c.
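A minimal sketch of such a TAP query using pyvo is shown below; the service endpoint is a placeholder and should be replaced with the archive's actual TAP URL.

```python
# Hedged sketch: query the sagesmc_mips24ep1c table via TAP/ADQL with pyvo.
import pyvo

tap = pyvo.dal.TAPService("https://example.org/tap")  # placeholder TAP endpoint
result = tap.search("SELECT TOP 10 * FROM sagesmc_mips24ep1c")
print(result.to_table())
```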
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Models trained on the small (6cls) or large (9cls) dataset. The training method indicates which metric is used to evaluate when to stop the training process: mAP@.5:.95 (mAP), F1-score (F1) or stopping after a high number of epochs (epoch).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the CNN Wild Park dataset from the paper:
Graph Neural Networks for Learning Equivariant Representations of Neural Networks
Miltiadis Kofinas*, Boris Knyazev, Yan Zhang, Yunlu Chen, Gertjan J. Burghouts, Efstratios Gavves, Cees G. M. Snoek, David W. Zhang*
ICLR 2024 (oral)
https://arxiv.org/abs/2403.12143
https://github.com/mkofinas/neural-graphs
*Joint first and last authors
We introduce a new dataset of CNNs, which we term CNN Wild Park. The dataset consists of 117,241 checkpoints from 2,800 CNNs, trained for up to 1,000 epochs on CIFAR10. The CNNs vary in the number of layers, kernel sizes, activation functions, and residual connections between arbitrary layers.
More specifically, we construct the CNN Wild Park dataset by training 2,800 small CNNs with different architectures for 200 to 1,000 epochs on CIFAR10. We retain a checkpoint of each network's parameters every 10 steps and also record the test accuracy. The CNNs vary by:
Number of layers L in [2, 3, 4, 5].
Number of channels per layer c_l in [4, 8, 16, 32].
Kernel size of each convolution k_l in [3, 5, 7].
Activation functions at each layer are one of ReLU, GeLU, tanh, sigmoid, leaky ReLU, or the identity function.
Skip connections between two layers with at least one layer in between. Each layer can have at most one incoming skip connection. We allow skip connections even when the number of channels differs, to increase the variety of architectures and ensure independence between different architectural choices. We enable this by adding the skip connection only to the min(c_n, c_m) nodes (see the sketch after this list).
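Below is a hedged PyTorch sketch of that partial skip connection: the residual is added only on the first min(c_n, c_m) channels. It is an illustration of the rule as stated, not the authors' implementation, and it assumes the two feature maps share spatial dimensions.

```python
# Illustrative partial skip connection across layers with different channel counts.
import torch

def add_partial_skip(x_target: torch.Tensor, x_source: torch.Tensor) -> torch.Tensor:
    """x_*: feature maps of shape (batch, channels, H, W) with matching H and W."""
    c = min(x_target.shape[1], x_source.shape[1])
    out = x_target.clone()
    out[:, :c] = x_target[:, :c] + x_source[:, :c]   # residual only on shared channels
    return out

# e.g. combining a 16-channel feature map with an earlier 8-channel one
y = add_partial_skip(torch.randn(2, 16, 32, 32), torch.randn(2, 8, 32, 32))
```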
We divide the dataset into train/val/test splits such that checkpoints from the same run are not contained in both the train and test splits.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the records of anonymised user interactions in seven online courses at a Higher Education institution in Brazil. For each course, the dataset covers a period spanning from 2017.1 to 2018.1, equivalent to three Brazilian academic periods. All online courses used the Moodle learning platform.

The dataset covers the following courses:
- F - An introductory course in Philosophy - mandatory for all students
- C - An introductory course in Religion - mandatory for all students
- S - An introductory course in Political Theory - mandatory for students of the School of Humanities and Social Sciences
- M1 - Differential and Difference Equations course - mandatory for students of the School of Engineering and Exact Sciences
- M2 - Single Variable Calculus course - mandatory for students of the School of Engineering and Exact Sciences
- E9 - An introductory course in the Design of Control Systems - mandatory for students of the School of Industrial Engineering
- E0 - Foundations of Engineering course - mandatory for all students of the School of Engineering

The data is compressed in .zip format and can be uncompressed by standard compression utilities. Each course has three separate files grouped by user interactions from different academic periods. For example, the records for the course 'F' are split into F1, F2 and F3: F1 covers the records of the first academic period, whereas F2 and F3 contain the records for the second and third academic periods, respectively. Note that each instance of a course is independent and that the same student (identified by the same id) may only occur in the same course but in different academic periods if and only if they failed and opted to retake that course in one of the following academic periods covered by the data available here. The student id is preserved across courses and academic periods.

A description of the log fields contained in this dataset can be found at: https://docs.moodle.org/dev/Event_2#Information_contained_in_events
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The National Forest Climate Change Maps project was developed by the Rocky Mountain Research Station (RMRS) and the Office of Sustainability and Climate to meet the needs of national forest managers for information on projected climate changes at a scale relevant to decision making processes, including forest plans. The maps use state-of-the-art science and are available for every national forest in the contiguous United States with relevant data coverage. Currently, the map sets include variables related to precipitation, air temperature, snow (including snow residence time and April 1 snow water equivalent), and stream flow.

Historical (1975-2005) and future (2071-2090) precipitation and temperature data for the contiguous United States are ensemble mean values across 20 global climate models from the CMIP5 experiment (https://journals.ametsoc.org/doi/abs/10.1175/BAMS-D-11-00094.1), downscaled to a 4 km grid. For more information on the downscaling method and to access the data, please see Abatzoglou and Brown, 2012 (https://rmets.onlinelibrary.wiley.com/doi/full/10.1002/joc.2312) and the Northwest Knowledge Network (https://climate.northwestknowledge.net/MACA/). We used the MACAv2-Metdata monthly dataset; monthly precipitation values (mm) were summed over the season of interest (annual, winter, or summer). Absolute and percent change were then calculated between the historical and future time periods.

Historical (1975-2005) and future (2071-2090) precipitation and temperature data for the state of Alaska were developed by the Scenarios Network for Alaska and Arctic Planning (SNAP) (https://snap.uaf.edu). These datasets have several important differences from the MACAv2-Metdata products (https://climate.northwestknowledge.net/MACA/) used in the contiguous U.S. They were developed using different global circulation models and different downscaling methods, and were downscaled to a different resolution (771 m instead of 4 km). While these cover the same time periods and use broadly similar approaches, caution should be used when directly comparing values between Alaska and the contiguous United States.

Raster data are also available for download from the RMRS site (https://www.fs.usda.gov/rm/boise/AWAE/projects/NFS-regional-climate-change-maps/categories/us-raster-layers.html), along with pdf maps and detailed metadata (https://www.fs.usda.gov/rm/boise/AWAE/projects/NFS-regional-climate-change-maps/downloads/NationalForestClimateChangeMapsMetadata.pdf).
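As a rough illustration of the change calculation described above (seasonal sums of monthly precipitation, then absolute and percent change between the historical and future periods), a minimal numpy sketch follows. The array layout and the assumption that records start in January are illustrative only; this is not the project's processing code.

```python
# Illustrative seasonal precipitation change between two periods of monthly data.
import numpy as np

def seasonal_change(monthly_hist, monthly_future, season_months):
    """monthly_*: arrays of shape (n_months, ny, nx); season_months: month indices (0-11)."""
    def mean_seasonal_total(monthly):
        months = np.arange(monthly.shape[0]) % 12          # assumes records start in January
        total = monthly[np.isin(months, season_months)].sum(axis=0)
        return total / (monthly.shape[0] // 12)            # mean seasonal total per year

    hist = mean_seasonal_total(monthly_hist)      # e.g. 1975-2005
    future = mean_seasonal_total(monthly_future)  # e.g. 2071-2090
    abs_change = future - hist                    # mm
    pct_change = 100.0 * abs_change / hist        # percent
    return abs_change, pct_change
```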
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here we share simulation results comparing three Particle-In-Cell (PIC) codes:
- EPOCH (using the normal random seed and 20 additional runs with different seeds),
- LSP (using both explicit and implicit modes labeled LSP_E and LSP_I),
- WarpX
Notes:
- Since the previous upload, the WarpX particle energy distributions have been updated to include the virtual (py) momenta
- LSP makes calculations using a virtual dimension of 1 cm while the other codes use 1 meter
- See the corresponding manuscript for full problem description and context
Explanation of Folders/Files:
-> Energy_Diagnostics
- This directory has files containing field and particle energy information for each code
- Various file formats are used, see headers for information on data format
- As mentioned above LSP uses a different convention for the virtual dimension size, so the outputs must be multiplied by 100 for comparison
-> Particle_Spectra
- This directory contains calculated energy distributions of forward going particles for the various runs.
- The file number multiplied by 10 fs gives the simulation time (e.g. 30_Spectrum.dat is 300 fs)
- In the files, the first column is the energy bin (MeV), the second is total ion charge (C), and third is electron charge (C)
- As mentioned above LSP uses a different convention for the virtual dimension size, so the outputs must be multiplied by 100 for comparison
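A small reading sketch for these spectrum files, reflecting the column layout and unit note above, might look like this; the whitespace-delimited format and absence of a header row are assumptions to verify against the actual files.

```python
# Illustrative reader for the *_Spectrum.dat files described above: column 1 is the
# energy bin (MeV), column 2 the total ion charge (C), column 3 the electron charge (C).
import numpy as np

def load_spectrum(path: str, code: str = "EPOCH"):
    data = np.loadtxt(path)            # add skiprows=... if the file carries a header
    energy_mev, ion_charge, electron_charge = data[:, 0], data[:, 1], data[:, 2]
    if code.startswith("LSP"):
        # LSP uses a 1 cm virtual dimension instead of 1 m, so scale by 100 for comparison
        ion_charge, electron_charge = ion_charge * 100, electron_charge * 100
    return energy_mev, ion_charge, electron_charge

# e.g. "30_Spectrum.dat" corresponds to 30 * 10 fs = 300 fs of simulation time
```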
-> Simulation_Input_Files
- EPOCH and WarpX simulation input decks (input files) are included here
- LSP is a commercial code. If you have access to the code and are interested in the input files, please contact us.
-> EPOCH_20_Runs_Density_Map
- An array of the simulation space denoting whether at least one of the 20 additional EPOCH simulations had non-zero ion density for the corresponding plot step (every 10 fs)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Eight data sets that differ in whether they are used to predict investor trading in the same or subsequent periods using observations about the trading of neighbors in the social network. The periods are either daily or weekly windows. Moreover, we investigate investor influence over purchase and sale transactions separately. These three distinctions lead to eight distinct data sets. The size of the data sets ranges from just below 2,400 to almost 22 thousand observations. The labels are positive (set equal to 1) if an investor on a given day traded a specific security in the same direction as at least one of his neighbors, and negative (set equal to 0) otherwise. We use a sliding window with a size corresponding to the prediction window. In each window, for each ego investor, we create observations of instances of social influence in the neighborhood, given that at least one of the neighbors is active. An ego investor can be understood as a tippee and her neighbors as tippers. We record the specific behavior of investors in their neighborhood and, depending on the prediction period, the ego investors' behavior in the same or subsequent period. Initially, the data sets were highly imbalanced in terms of labels and, for this reason, were re-sampled to achieve a 1:3 label balance ratio.
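A hedged sketch of the labelling rule described above (same-period variant) is given below; the trade-tuple layout and neighbour mapping are illustrative assumptions, not the study's actual preprocessing code.

```python
# Illustrative labelling rule: an observation exists only when at least one
# neighbour is active; it is labelled 1 if the ego traded the same security in
# the same direction within the window, and 0 otherwise.
def label_observation(ego, window, security, direction, trades, neighbors):
    """trades: set of (investor, window, security, direction) tuples;
    neighbors: dict mapping an ego investor to the set of their neighbours."""
    neighbor_active = any(
        (nbr, window, security, direction) in trades
        for nbr in neighbors.get(ego, ())
    )
    if not neighbor_active:
        return None  # no observation is created for this combination
    return int((ego, window, security, direction) in trades)
```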
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite as follows: Valavi R, Williams KJ, Liu N, Levick S, Giljohann KM, Johnson S, Botha EJ, Munroe SEM, Lehmann EA, Collings S, Searle R, Van Niel TG, Newnham G, Paget M, Joehnk K, Hosack GR, Harwood TD, Malley C, Gunawardana D, Sivanandam P, Carlile P, Richards AE, Tetreault Campbell S, Schmidt RK and Ferrier S (2024) HCAS 3.0 (1988-2022) base model estimate of habitat condition (250m grid), National Connectivity Index (NCI) 2.0, 3-year average annually rolling epochs of HCAS and NCI from 1990 to 2022, trends and other derivatives for continental Australia. Data Collection 62484. CSIRO, Canberra, Australia. Landing page: https://data.csiro.au/collection/csiro:62484.
This data collection comprises, for continental Australia, the 250m gridded Habitat Condition Assessment System (HCAS) version 3.0 (1988-2022) base model (HCAS v3.0) estimation of habitat condition for terrestrial biodiversity, updated National Connectivity Index (NCI) v2.0 using the HCAS v3.0 base model, 3-year antecedent average rolling short-term epochs of HCAS, NCI, connectivity-adjusted condition (NCIC) and Ecosystem Site Condition (ESC) from 1990 to 2022, trends and other derivatives for continental Australia. The HCAS v3.0 short-term epochs are derived from the base model, and the NCI, NCIC and ESC short-term epochs use the corresponding HCAS v3.0 epochs as an input (production version 3-2). Several other datasets support use and interpretation of the base model, epochs and trends.
Spatial raster datasets are provided in GeoTIFF format (*.tif) at 250m grid resolution, Geocentric Datum of Australia (GDA) 1994 (Australian Albers, EPSG:3577). The original data, in a grid of 0.0025 degrees of latitude and longitude (EPSG:4283, GDA 1994), was projected to Australian Albers. Continuous data were bilinearly resampled, resulting in some smoothing across pixels. Therefore, the unprojected data (EPSG:4283) are also provided as an option available to users. All GeoTIFFs are provided as Cloud Optimised GeoTIFFs (COGs). COGs are regular GeoTIFF files with an internal organisation to enable HTTP GET range requests to ask for just the needed parts of a file. In most cases, a corresponding *.png map image is also provided for quick views.
The HCAS v3.0 base model and epoch datasets are Habitat Condition indices that vary continuously from a theoretical minimum of 0.0 (ecosystem integrity removed) to a maximum of 1.0 (ecosystem integrity in reference condition). The same definition applies to Ecosystem Site Condition (ESC). The NCI also ranges continuously from 0.0 (unconnected removed habitat) to a maximum of 1.0 (fully connected intact habitat), as does the NCIC: from 0.0 (ecosystem integrity functionally extinguished) to a maximum of 1.0 (ecosystem integrity functionally connected and in reference condition).
The data extent of the curated projected collection is defined by any data pixel that intersected a coastline polygon using the land fraction dataset developed by Liu (2024) and the extent of National Land use data (ABARES, 2024b). The data and no-data extent of each grid layer encompasses the area within the Australian Continental Exclusive Economic Zone (EEZ), excluding territories of Cocos, Christmas, Norfolk, Macquarie, Heard, and McDonald Islands, as well as Antarctica (Liu and Newnham, 2024).
This HCAS v3.0 product suite substantially improves on and extends the former HCAS series 2 outputs. HCAS v3.0 uses 14 remotely sensed ecosystem characteristics derived from the Digital Earth Australia Surface Reflectance NBART Landsat Analysis Ready Data Collection 3 derivative products (Commonwealth of Australia, 2021). HCAS v3.1 supersedes HCAS v3.0.
Data descriptions are summarised in the accompanying "Habitat Condition Assessment System version 3.0: A guide to the 250-metre data collection" documentation, which can be downloaded separately from CSIRO's publication repository (see related links). Lineage: The HCAS v3.0 product suite was developed at 0.0025° grid resolution (9 arcsecond, unprojected spatial reference system EPSG:4283, approximately 250m) using a combination of: a) Fourteen annualised time series of remotely sensed ecosystem characteristic variables, 1988 to 2022, sourced from the Digital Earth Australia Surface Reflectance NBART Landsat Analysis Ready Data Collection 3 derivative products (Commonwealth of Australia, 2021); b) 56 environmental covariates selected from more than 120 'non-anthropogenic' candidates; c) Spatially inferred reference sites as training data sampled to represent the most intact remaining examples of Australia’s varied ecosystems and their environments; d) Spatially inferred reference sites as benchmark data sampled to represent both remotely sensed ecosystem characteristic variables and their environments from among the most intact remaining examples of Australia’s ecosystems.
The HCAS v3.0 reference ecosystem model was developed using a generalised additive model (GAM) with 257,307 reference training sites, 56 environmental covariates, and 14 remotely sensed ecosystem characteristic variables summarised over 35 years (1988 to 2022). The base model was developed using the 1988–2022 long-term epoch and 433,575 reference benchmark sites. Short-term epochs for each of the 14 remotely sensed ecosystem characteristic variables were derived as 3-year antecedent rolling averages (the target year and the two prior years), 1990 to 2022 (33 epochs).
All data products were derived using a 0.0025° grid (EPSG:4283, GDA94). Datasets were subsequently projected to a 250 x 250 m geographic grid in Australian Albers (EPSG:3577, GDA94) using bilinear resampling for continuous data and neighbourhood mode or nearest for ordinal or categorical data, respectively.
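For users who prefer to work from the unprojected grids, the bilinear reprojection described above could be approximated with rasterio along these lines; the filenames are placeholders and this is not the project's production workflow (categorical layers would use mode or nearest resampling instead).

```python
# Illustrative reprojection of an unprojected (EPSG:4283) grid to 250 m Australian
# Albers (EPSG:3577) using bilinear resampling, as described above.
import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling

with rasterio.open("hcas_v30_epsg4283.tif") as src:          # placeholder input name
    transform, width, height = calculate_default_transform(
        src.crs, "EPSG:3577", src.width, src.height, *src.bounds, resolution=250)
    meta = src.meta.copy()
    meta.update(crs="EPSG:3577", transform=transform, width=width, height=height)
    with rasterio.open("hcas_v30_epsg3577.tif", "w", **meta) as dst:
        for i in range(1, src.count + 1):
            reproject(
                source=rasterio.band(src, i),
                destination=rasterio.band(dst, i),
                src_transform=src.transform,
                src_crs=src.crs,
                dst_transform=transform,
                dst_crs="EPSG:3577",
                resampling=Resampling.bilinear)   # continuous data only
```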
The "Habitat Condition Assessment System version 3.0: A guide to the 250-metre data collection" documentation, accompanying this data collection (see related links), outlines how the HCAS v3.0 differs from the HCAS v2.3 (Harwood et al., 2023; Williams et al., 2023a).
This data collection also includes an update to the National Connectivity Index version 2.0 method (NCI v2) base model and time series using the corresponding HCAS v3.0 inputs. The NCI method has not changed, only the input condition data.
Technical reports (forthcoming, mid-2025) provide details about the inputs, processing methods, outputs and uncertainty quantification, applied in developing HCAS version series 3. For the latest publications see: https://research.csiro.au/biodiversity-knowledge/projects/hcas/.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Caveat: this dataset extends the annual epochs used in the HCAS v2.1 data collection but may not represent an improvement in the regional accuracy of condition assessments. Please read the accompanying technical report for method details. This dataset has a limited distribution because it is under revision and will be superseded by HCAS v3.0 with anticipated publication in mid-2024.
This data collection comprises, for continental Australia, the 9-arcsecond gridded Habitat Condition Assessment System (HCAS) version 2.4 (2001-2018) base model (HCAS v2.4) estimation of habitat condition for terrestrial biodiversity, a series of 18-year epochs (including updated NCI v2.0), annual epochs between 2001 and 2021 derived from the base model, and a series of trend analyses derived from the annual epochs. Several other datasets support use and interpretation of the base model, epochs and trends.
The core collection in the HCAS v2.4 product suite comprises 32 datasets (four 18-year epochs, 21 annual epochs, seven trends), and a larger number of input and supporting datasets to inform use and interpretation, including four corresponding 18-year epochs of the NCI v2.0. The majority are raster datasets in GeoTIFF format (*.tif) at 0.0025° resolution (approx. 250 m grids), Geocentric Datum of Australia (GDA) 1994 (EPSG:4283). All GeoTIFFs are also provided as Cloud Optimised GeoTIFFs (COGs) in relevant subfolders labelled ‘COG’. COGs are regular GeoTIFF files with an internal organisation to enable HTTP GET range requests to ask for just the needed parts of a file.
The HCAS v2.4 base model and epochs datasets are habitat condition indices that vary continuously from a theoretical minimum of 0.0 (ecosystem integrity removed) to a maximum of 1.0 (ecosystem integrity in reference condition). The index represents the contribution that a given site (grid cell) makes to the effective area of ecosystem integrity remaining within any given spatial reporting unit, expressed as a proportion of the contribution made by a site in reference condition.
This HCAS v2.4 product suite was developed to extend the annual epochs through to 2021, using remote sensing generated from the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite. The MODIS satellite is reaching its end of life and future HCAS updates will be developed using Landsat satellite imagery, as well as other satellite inputs. The MODIS satellite will continue to operate, in a declining orbit, through to December 2025.
Note: The remote sensing data originally derived for these analyses included the calendar year 2022. However, investigations into reasons for a downward trend in the continental average of the 2022 condition epoch compared with the 2021 epoch led to the discovery of a data download error in the remote sensing imagery. Missing tiles across some regions of Australia led to erroneous habitat condition estimates using the 2022 data. This affected the 2022 annual epoch and the 2005–2022 and 2001–2022 long-term epochs. The effect is muted in the continental average of the long-term epochs but obvious in analyses using the annual epoch. Therefore, results using the 2022 data were removed from this data collection. Because the analyses using these data for the EN01 measure in the DCCEEW 2022-23 Annual Report were presented as continental and land zonal averages, the data bias error in 2022 does not alter that overall result.
Technical report: Williams KJ, Valavi R, Lehmann EA, Van Niel T, Harwood T, Newnham G, Paget M, Donohue R, Giljohann KM, Liu N, Lyon P, Pinner L, Malley C, Deb D and Ferrier S (2023) Habitat Condition Assessment System (HCAS): developing HCAS v2.4 with annual epochs updated to 2021. Publication number EP2023-4902. CSIRO, Canberra, Australia. https://publications.csiro.au/publications/publication/PIcsiro:EP2023-4902.
Details are summarised in the product brief provided with this data collection. Lineage: The HCAS v2.4 product suite was developed using a combination of: a) time series remotely sensed ecosystem characteristic variables, derived from the CSIRO MODIS fractional cover dataset (Guerschman and Hill, 2018) and MODIS persistent and annual vegetation cover (Donohue et al., 2014; Donohue et al., 2009), derived from MODIS Collection 6.0 ; b) gridded environmental covariates, including climate from the 9s climatology for continental Australia 1976-2005 (Harwood et al., 2016a), soil and landscape data from the Soil and Landscape Grid of Australia aggregated to 9s (Gallant et al., 2018), used to predict reference ecosystem condition; and c) Spatially inferred reference sites sampled to represent Australia’s varied environments, used as training data to build the model of reference ecosystem condition and as benchmarks for calculating proximity to reference condition for all test sites.
Specifically, the HCAS v2.4 reference ecosystem model was developed using a generalised additive model (GAM) with c.239,000 reference sites, 27 environmental covariates and seven remotely sensed ecosystem characteristic variables summarised over 18 years (2001 to 2018). The base model was developed using the 2001–18 long term epoch for comparability with the original HCAS v2.1 (Harwood et al., 2021). The same seven ecosystem characteristic fractional cover summary variables derived from the MODIS satellite were used. This is the last year in which the MODIS satellite can provide reliable Earth observations, as it reaches its end of life. Future HCAS versions will transition to alternative ongoing Earth observation platforms (e.g. Landsat).
The HCAS v2.4 differs from the original HCAS v2.1 in the following ways:
• The number, location and ecological representativeness of the reference sites (including expert nominated inclusions and exclusions)
• The algorithm used to partition remotely sensed persistent and recurrent green cover fractions (the algorithm used to derive litter and bare fractions remained the same)
• The number of environmental covariates used to model reference ecosystem patterns
• Corrected variance standardisation of remotely sensed ecosystem characteristic variables for principal components
• The use of univariate generalised additive modelling (GAM) to model the remotely sensed ecosystem characteristic principal components and predict the ecosystem reference state
• The same reference sites used as training data were also used as benchmarks
• The scaling method applied to the uncalibrated output to obtain the 0-1 score
• The method of summarising remotely sensed ecosystem characteristic variables for use in annual condition epochs of habitat condition (summary variables for the long-term epochs remained the same)
• Annual epochs were extended by three years to 2021 (21 in total, calendar years)
• Additional 18-year epochs were generated: 2002–19, 2003–20, 2004–21.
Technical report: Williams KJ, Valavi R, Lehmann EA, Van Niel T, Harwood T, Newnham G, Paget M, Donohue R, Giljohann KM, Liu N, Lyon P, Pinner L, Malley C, Deb D and Ferrier S (2023) Habitat Condition Assessment System (HCAS): developing HCAS v2.4 with annual epochs updated to 2021. Publication number EP2023-4902. CSIRO, Canberra, Australia. https://publications.csiro.au/publications/publication/PIcsiro:EP2023-4902.
This technical report provides details about the inputs, processing methods and outputs, with a focus on work leading to development of HCAS v2.4 and comparison with the original HCAS v2.1.
This data collection also includes an update to the National Connectivity Index version 2.0 (NCI v2) using the HCAS v2.4 as an input.
Details are summarised in the product brief provided with this data collection.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Abstract: This dataset provides the forecast output csv files for the relevant forecast location points for Periods 4 to 9 of the Tidal and Storm Surge Forecasting Service for the Coast of Ireland. They do not cover all points for which a visual forecast was produced, and the exact points vary from period to period to reflect development of the system. It is important to emphasise that for Period 10 the forecast point naming was changed, and the reference used in Periods 4 to 9 may differ from that used in Period 10 for the same point. The csv files were produced daily for each forecast point in the morning and extend for a period of 72 hours from their start point (00:00 UTC). The files contain forecasts of surge in m, total water level in m OD Malin and astronomic tide in m (relative to Chart Datum) at 15 minute intervals. They also contain details of the forecast point location and the date and time of forecast generation. The forecast data was produced from a hydrodynamic model using best available bathymetry, output from a tidal model as a boundary condition and operational weather forecast data as a forcing condition. Hence the forecast output will be affected by deviations between the weather forecast and the actual weather. The Tidal and Storm Surge Forecasting Service was intended to provide information for Local Authorities, and other relevant stakeholders, to enable them to make informed decisions on the management of coastal flood risk. Three forecasts were produced daily: two morning forecasts looking 72 and 144 hours ahead and an evening forecast looking 72 hours ahead.

Lineage: This dataset provides the forecast output csv files for the relevant forecast location points for Periods 1 to 9 of the Tidal and Storm Surge Forecasting Service for the Coast of Ireland. The csv files were produced directly from the forecast model runs and are based on the weather forecast data available at the time of each run. It should be noted that as each forecast is 72 hours long and new csv files were produced every 24 hours, any single period will be covered by 3 forecast files. It would be expected that forecasts produced closer to the time of interest will be more accurate, reflecting improved weather forecasts.

Purpose: The Tidal and Storm Surge Forecasting Service for the Coast of Ireland was intended to provide information for Local Authorities and other relevant stakeholders to enable them to make informed decisions on the management of coastal flood risk. Three forecasts were produced daily: two morning forecasts looking 72 and 144 hours ahead and an evening forecast looking 72 hours ahead. Forecasts were provided at up to 57 forecasting points: 15 National Points around the coast of Ireland and five more detailed local forecasting areas at Dundalk Bay, Wexford Harbour, Cork Harbour, Shannon Estuary and Galway Bay, with the exact numbers at any time depending on the level of model development.
https://data.linz.govt.nz/license/attribution-4-0-international/
For further information about this dataset, see the Kaikoura earthquake information.
It is likely that many of these coordinates will be updated multiple times as marks move due to aftershocks and ongoing post-seismic deformation. It is therefore critical that the datum version and coordinate epoch date are recorded with any coordinates sourced from this dataset, along with the date the coordinates were accessed or downloaded.
These coordinates are computed from Continuously Operating Reference Station (CORS) data and geodetic surveys undertaken after the 14 November 2016 Kaikoura earthquake. They reflect earthquake movements up until the epoch date that is associated with each coordinate. Where possible, coordinates sourced from this dataset for use as control or calibration points in a project should be at the same or similar epochs. If not, post-seismic deformation may mean that new observations or coordinates do not fit well with these coordinates. Coordinates used as control or calibration points should also be well-distributed over the project area, so that any discrepancies resulting from the survey date being significantly different from the coordinate epoch date can be identified. If such discrepancies are identified, it may be necessary to use the LINZ PositioNZ-PP online processing service to generate control coordinates at the same (or nearly the same) epoch as the survey date.
Coordinates were calculated using SNAP v2.5.48. The origin of non-CORS coordinates is PositioNZ CORS that have been updated to include earthquake movements.
The 95% confidence interval uncertainties of coordinates are 0.02m horizontally and 0.03m vertically, relative to the PositioNZ network, at the epoch specified. In areas experiencing significant ongoing seismic activity, coordinates at the same mark at other epochs may differ by more than these uncertainties.
These coordinates are suitable for use in surveys and other geospatial positioning activities in the area impacted by the Kaikoura earthquake.
Arabidopsis Positional History Dataset

Information includes chromosome start/stop, strand, duplicate state (parent=P, duplicate=AT number, or interrupter=I), TAIR9 gene description, the chromosomal position for each outgroup (-, S, G, FN, FB, F), the numerical synteny value for each chromosomal position (1=S except for grape, where 2=S; 0.2=FB or FN; 0.1=G or -; and 0=F), total synteny value (the average of all synteny values for all outgroups), whether or not the gene had any exons (TAIR9 data), TAIR8 chromosome start/stop, functional gene space, and the difference between gene space and functional gene space. Functional gene space, CNS data, GO terms, and homeolog data are from Version 2 as annotated by S. Subramaniam in this lab, unpublished. The CNS data was originally from TAIR8 but has been merged with TAIR9. Following this are the TAIR9 GEvo link and best hit data in A. lyrata and poplar for all transposed genes.

Supplemental Dataset 1.xls
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The output dataset as described in the article in the title, extracted from the COMET LiCSAR dataset in March 2021.
We also provide a snapshot of the Python 3 code used to generate the output dataset.
Contents and description of the dataset:
uaz_values.csv
==============
Contains u_az values and other relevant data in columns:
frame - ID of related frame (same as in frame_values.csv)
esd_master - reference acquisition epoch
epoch - date of acquisition epoch
daz_total_wrt_orbits - original extracted azimuth shift w.r.t. orbits
daz_cc_wrt_orbits - original extracted azimuth shift w.r.t. orbits from intensity cross-correlation (prior to spectral diversity)
drg_wrt_orbits - original extracted range shift w.r.t. orbits
orbits_precision - precision of applied orbits (P..precise, R..restituted)
version - orbits version
daz_iono_grad_mm - u_az from ionosphere propagation
tecs_A - estimated TEC at the centre of hypothetical burst A
tecs_B - estimated TEC at the centre of hypothetical burst B
daz_mm_notide - u_az after correction of solid-earth tides
daz_mm_notide_noiono_grad - u_az after correction of solid-earth tides and ionospheric gradient propagation
is_outlier_* - flag of outlier datapoint, as identified through Huber loss function (related to velocity estimates in frame_values.csv)
frame_values.csv
================
Contains along-track velocity estimates and other relevant data in columns:
frame - ID of related frame
master - reference acquisition epoch
center_lon - longitude coordinate of the frame centre
center_lat - latitude coordinate of the frame centre
heading - satellite heading angle (from the geographic north)
azimuth_resolution - extracted azimuth pixel spacing (in metres)
range_resolution - extracted range pixel spacing (in metres)
avg_incidence_angle - average incidence angle of the frame
centre_range_m - approximate slant distance between the satellite and centre of the frame
centre_time - acquisition time (UTC) of the centre of the frame at the reference epoch (applicable to other epochs)
s1AorB - flag of S-1A or B of the reference epoch
slope_plates_vel_azi_itrf2014 - along-track velocity estimated from ITRF2014 plate motion model
slope_daz_mm_mmyear - estimated along-track velocity from the original u_az values (in mm/year)
slope_daz_mm_notide_mmyear - estimated along-track velocity from u_az values after correction on solid-earth tides
slope_daz_mm_notide_noiono_grad_mmyear - estimated along-track velocity from u_az values after correction on solid-earth tides and ionosphere
intercept_* - corresponding intercept (in mm)
*_RMSE_selection - RMSE of outlier-free u_az data samples
*_count_selection - count of outlier-free u_az data samples used to estimate corresponding velocity
*_RMSE_mmy_full - RMSE from all u_az data samples applying corresponding velocity and intercept (in mm/year)
decomposed_grid.csv
===================
Contains decomposed velocities (in 250x250 km spacing grid) and other relevant data in columns:
count - count of frames used for the decomposition
opass - orbital pass codes of the input frames (D..descending, A..ascending)
centroid_lon - longitude coordinate of the grid cell centre
centroid_lat - latitude coordinate of the grid cell centre
VEL_N_noTI - northward velocity component from data corrected for solid-earth tides and ionosphere
VEL_E_noTI - eastward velocity component from data corrected for solid-earth tides and ionosphere
VEL_N_noT - northward velocity component from data corrected for solid-earth tides
VEL_E_noT - eastward velocity component from data corrected for solid-earth tides
ITRF_N - northward velocity component from averaged ITRF2014 plate motion model
ITRF_E - eastward velocity component from averaged ITRF2014 plate motion model
*RMSE_* - RMSE of corresponding data
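A minimal pandas sketch for combining these tables, using the file and column names listed above, might look as follows; the analysis shown is illustrative only.

```python
# Illustrative loading and joining of the three CSV tables described above.
import pandas as pd

uaz = pd.read_csv("uaz_values.csv", parse_dates=["epoch"])
frames = pd.read_csv("frame_values.csv")
grid = pd.read_csv("decomposed_grid.csv")

# attach per-frame velocity estimates to the per-epoch u_az samples
merged = uaz.merge(frames, on="frame", suffixes=("", "_frame"))

# e.g. compare fully corrected u_az values against the ITRF2014 plate-motion velocity
subset = merged[["frame", "epoch", "daz_mm_notide_noiono_grad",
                 "slope_plates_vel_azi_itrf2014"]]
print(subset.head())
```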