Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Golf Ball Distance Calculation is a dataset for object detection tasks - it contains Golf Balls annotations for 318 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the data to calculate the spatial distribution of the dissipation as well as the absorption efficiencies of both Gold and Silicon designs, as presented in the article "Time-domain topology optimization of power dissipation in dispersive dielectric and plasmonic nanostructures". This includes the electric field distribution in 3D for multiple wavelengths (netCDF), the final density (netCDF), the design (STL) and material and simulation parameters (JSON) used in the optimization. The evaluation of this data can be performed using the code published on https://github.com/JoGed/dissipation-calculation
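The netCDF and JSON files can be opened with standard Python tooling. A minimal loading sketch follows; the file names here are placeholders for the actual archive contents, and the published evaluation code linked above is the authoritative reference:

import json
import xarray as xr

fields = xr.open_dataset("electric_field.nc")    # assumed name: 3D E-field for multiple wavelengths
density = xr.open_dataset("final_density.nc")    # assumed name: final optimized density
with open("parameters.json") as f:               # assumed name: material and simulation parameters
    params = json.load(f)
print(list(fields.data_vars), list(params))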
Weights are calculated with a time step of 1 hour and a memory of 4320 hours for two cases. In the first case, the source is parametrized by spherical harmonics up to degree 4 (MB_SHS_modes_up_to_24_at_obs.h5); in the second, by so-called periodic modes obtained from the output of TIEGCM (MB_periodic_modes_16_24_at_obs.h5). A detailed description of the meaning and calculation of the weights can be found in Kruglyakov & Kuvshinov, 2024. Datasets M_B_r contain the radial (upward) component of the magnetic field at the ground; datasets M_B_theta the southward component of the magnetic field at the ground; datasets M_B_phi the eastward component of the magnetic field at the ground; datasets Theta the co-latitudes of the observatories in the geocentric coordinate system; and datasets Phi the longitudes of the observatories in the geocentric coordinate system.
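A minimal sketch for inspecting these weight files with h5py, using the dataset names listed above (the internal file layout beyond those names is an assumption):

import h5py

with h5py.File("MB_SHS_modes_up_to_24_at_obs.h5", "r") as f:
    f.visit(print)            # list the datasets actually present in the file
    B_r = f["M_B_r"][...]     # radial (upward) magnetic field component at the ground
    theta = f["Theta"][...]   # observatory co-latitudes, geocentric coordinate system
    phi = f["Phi"][...]       # observatory longitudes, geocentric coordinate system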
Announcement: Beginning October 20, 2022, CDC will report and publish aggregate case and death data from jurisdictional and state partners on a weekly basis rather than daily. As a result, community transmission levels data reported on data.cdc.gov will be updated weekly on Thursdays, typically by 8 PM ET, instead of daily.

This public use dataset has 7 data elements reflecting community transmission levels for all available counties. It contains the reported daily transmission level at the county level and contains the same values used to display transmission maps on the COVID Data Tracker. Each day, the dataset is appended with the most recent day's data. Transmission level is set to low, moderate, substantial, or high using the calculation rules below. Currently, CDC provides the public with two versions of COVID-19 county-level community transmission level data: this dataset with the levels as originally posted (Originally Posted dataset), updated daily with the most recent day's data, and a historical dataset with the county-level transmission data from January 1, 2021 (Historical Changes dataset).

Methods for calculating county level of community transmission indicator: the County Level of Community Transmission indicator uses two metrics: (1) total new COVID-19 cases per 100,000 persons in the last 7 days and (2) percentage of positive SARS-CoV-2 diagnostic nucleic acid amplification tests (NAAT) in the last 7 days. For each of these metrics, CDC classifies transmission values as low, moderate, substantial, or high (below and here). If the values for the two metrics differ (e.g., one indicates moderate and the other low), then the higher of the two is used for decision-making.

CDC core metrics of and thresholds for community transmission levels of SARS-CoV-2:

Total new case rate metric: "new cases per 100,000 persons in the past 7 days" is calculated by dividing the number of new cases in the county (or other administrative level) in the last 7 days by the population of the county (or other administrative level) and multiplying by 100,000. Its transmission levels are Low (0-9.99), Moderate (10.00-49.99), Substantial (50.00-99.99), and High (greater than or equal to 100.00).

Test percent positivity metric: "percentage of positive NAAT in the past 7 days" is calculated by dividing the number of positive tests in the county (or other administrative level) during the last 7 days by the total number of tests conducted over the last 7 days. Its transmission levels are Low (less than 5.00), Moderate (5.00-7.99), Substantial (8.00-9.99), and High (greater than or equal to 10.00). If the two metrics suggest different transmission levels, the higher level is selected.

Transmission categories include: Low Transmission Threshold: counties with fewer than 10 total cases per 100,000 population in the past 7 days and a NAAT percent test positivity in the past 7 days below 5%; Moderate Transmission Threshold: counties with 10-49 total cases per 100,000 population in the past 7 days or a NAAT test percent positivity in the past 7 days of 5.0-7.99%; Substantial Transmission Threshold: counties with 50-99 total cases per 100,000 population in the past 7 days or a NAAT test percent positivity in the past 7 days of 8.0-9.99%; High Transmission Threshold: counties with 100 or more total cases per 100,000 population in the past 7 days or a NAAT test percent positivity in the past 7 days of 10.0% or higher.
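The decision rule above is simple to express in code; here is a sketch of the published thresholds (not CDC's production implementation):

def transmission_level(cases_per_100k_7d, naat_positivity_7d):
    # Classify each metric separately, then take the higher of the two levels.
    def case_level(x):
        return 0 if x < 10 else 1 if x < 50 else 2 if x < 100 else 3
    def positivity_level(p):
        return 0 if p < 5 else 1 if p < 8 else 2 if p < 10 else 3
    levels = ["low", "moderate", "substantial", "high"]
    return levels[max(case_level(cases_per_100k_7d), positivity_level(naat_positivity_7d))]

print(transmission_level(12.0, 4.2))  # -> "moderate" (the case-rate metric governs)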
This Location Data & Foot Traffic dataset, available for all countries, includes enriched raw mobility data and visitation at POIs to answer questions such as:
- How often do people visit a location? (daily, monthly, absolute, and averages)
- What type of places do they visit? (parks, schools, hospitals, etc.)
- Which social characteristics do people in a certain POI have? Breakdown by type: residents, workers, visitors.
- What is their mobility like during night hours and day hours?
- What is the frequency of visits, partitioned by day of the week and hour of the day?
Extra insights:
- Visitors' relative income level.
- Visitors' preferences, as derived from their visits to shopping venues, parks, sports facilities, churches, among others.
Overview & Key Concepts
Each record corresponds to a ping from a mobile device at a particular moment in time and at a particular latitude and longitude. We procure this data from reliable technology partners, which obtain it through partnerships with location-aware apps. The entire process is compliant with applicable privacy laws.
We clean and process these massive datasets with a number of complex, computer-intensive calculations to make them easier to use in different data science and machine learning applications, especially those related to understanding customer behavior.
Featured attributes of the data
Device speed: based on the distance between each observation and the previous one, we estimate the speed at which the device is moving. This is particularly useful for differentiating between vehicles, pedestrians, and stationary observations.
Night base of the device: we calculate the approximate location where the device spends the night, which is usually its home neighborhood.
Day base of the device: we calculate the most common daylight location during weekdays, which is usually its work location.
Income level: we use the night neighborhood of the device, and intersect it with available socioeconomic data, to infer the device’s income level. Depending on the country, and the availability of good census data, this figure ranges from a relative wealth index to a currency-calculated income.
POI visited: we intersect each observation with a number of POI databases, to estimate check-ins to different locations. POI databases can vary significantly, in scope and depth, between countries.
Category of visited POI: for each observation attributable to a POI, we also include a standardized location category (park, hospital, among others).
Coverage: Worldwide.
Delivery schemas
We can deliver the data in three different formats:
Full dataset: one record per mobile ping. These datasets are very large, and should only be consumed by experienced teams with large computing budgets.
Visitation stream: one record per attributable visit. This dataset is considerably smaller than the full one but retains most of the more valuable elements in the dataset. It helps identify who visited a specific POI and characterize and understand consumer behavior.
Audience profiles: one record per mobile device in a given period of time (usually monthly). All the visitation stream is aggregated by category. This is the most condensed version of the dataset and is very useful to quickly understand the types of consumers in a particular area and to create cohorts of users.
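As a purely illustrative example of the most condensed format, an audience-profile record might look like the following; the field names are hypothetical, not the actual delivery schema:

audience_profile_record = {
    "device_id": "hashed-device-123",        # hypothetical field names throughout
    "period": "2024-01",
    "night_base_area": "neighborhood-A",     # inferred home area
    "day_base_area": "neighborhood-B",       # inferred work area
    "income_level": "medium",
    "visits_by_category": {"park": 4, "hospital": 1, "school": 8},
}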
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"We believe that by accounting for the inherent uncertainty in the system during each measurement, the relationship between cause and effect can be assessed more accurately, potentially reducing the duration of research."
Short description
This dataset was created as part of a research project investigating the efficiency and learning mechanisms of a Bayesian adaptive search algorithm supported by the Imprecision Entropy Indicator (IEI) as a novel method. It includes detailed statistical results, posterior probability values, and the weighted averages of IEI across multiple simulations aimed at target localization within a defined spatial environment. Control experiments, including random search, random walk, and genetic algorithm-based approaches, were also performed to benchmark the system's performance and validate its reliability.
The task involved locating a target area centered at (100; 100) within a radius of 10 units (Research_area.png), inside a circular search space with a radius of 100 units. The search process continued until 1,000 successful target hits were achieved.
To benchmark the algorithm's performance and validate its reliability, control experiments were conducted using alternative search strategies, including random search, random walk, and genetic algorithm-based approaches. These control datasets serve as baselines, enabling comprehensive comparisons of efficiency, randomness, and convergence behavior across search methods, thereby demonstrating the effectiveness of our novel approach.
Uploaded files
The first dataset contains the average IEI values, generated by randomly simulating 300 x 1 hits for 10 bins per quadrant (4 quadrants in total) using the Python programming language, and calculating the corresponding IEI values. This resulted in a total of 4 x 10 x 300 x 1 = 12,000 data points. The summary of the IEI values by quadrant and bin is provided in the file results_1_300.csv. The calculation of IEI values for averages is based on likelihood, using an absolute difference-based approach for the likelihood probability computation. IEI_Likelihood_Based_Data.zip
The weighted IEI average values for the likelihood calculation (Bayes' formula) are provided in the file Weighted_IEI_Average_08_01_2025.xlsx.
This dataset contains the results of a simulated target search experiment using Bayesian posterior updates and Imprecision Entropy Indicators (IEI). Each row represents a hit during the search process, including metrics such as Shannon entropy (H), Gini index (G), average distance, angular deviation, and calculated IEI values. The dataset also includes bin-specific posterior probability updates and likelihood calculations for each iteration. The simulation explores adaptive learning and posterior penalization strategies to optimize the search efficiency. Our Bayesian adaptive searching system source code (search algorithm, 1,000 target searches) is IEI_Self_Learning_08_01_2025.py. This dataset contains the results of 1,000 iterations of a successful target search simulation; the simulation runs until the target is successfully located in each iteration. The dataset includes three main outputs: a) Results files (results{iteration_number}.csv): details of each hit during the search process, including entropy measures, Gini index, average distance and angle, Imprecision Entropy Indicators (IEI), coordinates, and the bin number of the hit. b) Posterior updates (Pbin_all_steps_{iter_number}.csv): tracks the posterior probability updates for all bins during the search process across multiple steps. c) Likelihood analysis (likelihood_analysis_{iteration_number}.csv): contains the calculated likelihood values for each bin at every step, based on the difference between the measured IEI and the pre-defined IEI bin averages. IEI_Self_Learning_08_01_2025.py
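For orientation, the per-step posterior update described above can be sketched as follows; the exact likelihood normalization is an assumption, and IEI_Self_Learning_08_01_2025.py is the authoritative implementation:

import numpy as np

def update_posterior(prior, iei_measured, iei_bin_avgs, eps=1e-9):
    # Likelihood from the absolute difference between the measured IEI and
    # each bin's pre-defined IEI average: smaller difference -> higher likelihood.
    likelihood = 1.0 / (np.abs(iei_measured - iei_bin_avgs) + eps)
    posterior = prior * likelihood                 # Bayes' rule (unnormalized)
    return posterior / posterior.sum()             # normalize over all bins

prior = np.full(40, 1 / 40)                        # 40 bins, uniform prior to start
# iei_bin_avgs would come from Weighted_IEI_Average_08_01_2025.xlsx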
Based on the aforementioned Python source code (the Bayesian adaptive searching method with IEI values, described in the previous item), we performed 1,000 successful target searches, and the outputs were saved in the Self_learning_model_test_output.zip file.
Bayesian search (IEI) from different quadrants. This dataset contains the results of Bayesian adaptive target search simulations, including various outputs that represent the performance and analysis of the search algorithm. The dataset includes: a) Heatmaps (Heatmap_I_Quadrant, Heatmap_II_Quadrant, Heatmap_III_Quadrant, Heatmap_IV_Quadrant): these heatmaps represent the search results and the paths taken from each quadrant during the simulations, indicating how frequently the system selected each bin during the search process. b) Posterior distributions (All_posteriors, Probability_distribution_posteriors_values, CDF_posteriors_values): generated from the posterior values, these files track the posterior probability updates, including cumulative distribution functions (CDF) and probability distributions. c) Macro summary (summary_csv_macro): this file aggregates metrics and key statistics from the simulation, summarizing the results from the individual results.csv files. d) Heatmap searching method documentation (Bayesian_Heatmap_Searching_Method_05_12_2024): this document visualizes the search algorithm's path, showing how frequently each bin was selected during the 1,000 successful target searches. e) One-way ANOVA analysis (Anova_analyze_dataset, One_way_Anova_analysis_results): this includes the database and SPSS calculations used to examine whether the starting quadrant influences the number of search steps required. The analysis was conducted at a 5% significance level, followed by a Games-Howell post hoc test [43] to identify which target-surrounding quadrants differed significantly in the number of search steps. Results were saved in Self_learning_model_test_results.zip.
This dataset contains randomly generated sequences of bin selections (1-40) from a control search algorithm (random search) used to benchmark the performance of the Bayesian-based methods. The process iteratively generates random numbers until a stopping condition is met (reaching target bin 1, 11, 21, or 31). This dataset serves as a baseline for analyzing the efficiency, randomness, and convergence of non-adaptive search strategies. The dataset includes the following: a) the Python source code of the random search algorithm; b) a file (summary_random_search.csv) containing the results of 1,000 successful target hits; c) a heatmap visualizing the frequency of search steps for each bin, providing insight into the distribution of steps across the bins. Random_search.zip
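A minimal sketch of this control strategy, grounded in the description above (the published source in Random_search.zip is authoritative):

import random

TARGET_BINS = {1, 11, 21, 31}

def random_search():
    # Draw bins 1-40 uniformly at random until a target bin is hit.
    steps = []
    while True:
        b = random.randint(1, 40)
        steps.append(b)
        if b in TARGET_BINS:
            return steps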
This dataset contains the results of a random walk search algorithm, designed as a control mechanism to benchmark adaptive search strategies (Bayesian-based methods). The random walk operates within a defined space of 40 bins, where each bin has a set of neighboring bins. The search begins from a randomly chosen starting bin and proceeds iteratively, moving to a randomly selected neighboring bin, until one of the stopping conditions is met (bins 1, 11, 21, or 31). The dataset provides detailed records of 1,000 random walk iterations, with the following key components: a) Individual Iteration Results: Each iteration's search path is saved in a separate CSV file (random_walk_results_.csv), listing the sequence of steps taken and the corresponding bin at each step. b) Summary File: A combined summary of all iterations is available in random_walk_results_summary.csv, which aggregates the step-by-step data for all 1,000 random walks. c) Heatmap Visualization: A heatmap file is included to illustrate the frequency distribution of steps across bins, highlighting the relative visit frequencies of each bin during the random walks. d) Python Source Code: The Python script used to generate the random walk dataset is provided, allowing reproducibility and customization for further experiments. Random_walk.zip
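A minimal sketch of the random walk control; the actual 40-bin adjacency used in Random_walk.zip is defined by the authors, so a simple ring adjacency stands in here:

import random

TARGET_BINS = {1, 11, 21, 31}

def random_walk(neighbors):
    # Start at a random bin and hop to a randomly selected neighboring bin
    # until a stopping bin (1, 11, 21, or 31) is reached.
    current = random.randint(1, 40)
    path = [current]
    while current not in TARGET_BINS:
        current = random.choice(neighbors[current])
        path.append(current)
    return path

ring = {b: [b % 40 + 1, (b - 2) % 40 + 1] for b in range(1, 41)}  # stand-in adjacency
print(len(random_walk(ring)))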
This dataset contains the results of a genetic search algorithm implemented as a control method to benchmark the adaptive Bayesian-based search strategies. The algorithm operates in a 40-bin search space with predefined target bins (1, 11, 21, 31) and evolves solutions through random initialization, selection, crossover, and mutation over 1,000 successful runs. Dataset components: a) Run results: individual run data is stored in separate files (genetic_algorithm_run_.csv), detailing: Generation: the generation number; Fitness: the fitness score of the solution; Steps: the path length in bins; Solution: the sequence of bins visited. b) Summary file: summary.csv consolidates the best solutions from all runs, including their fitness scores, path lengths, and sequences. c) All steps file: summary_all_steps.csv records all bins visited during the runs for distribution analysis. d) A heatmap was also generated for the genetic search algorithm, illustrating the frequency of bins chosen during the search process as a representation of the search pathways. Genetic_search_algorithm.zip
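A compact sketch of such a genetic search over bin sequences; the fitness definition here (fewer steps to a target scores higher) is an assumption, and Genetic_search_algorithm.zip holds the real implementation:

import random

TARGET_BINS = {1, 11, 21, 31}

def steps_to_target(path):
    # Number of steps until the first target bin, or None if never reached.
    for i, b in enumerate(path, 1):
        if b in TARGET_BINS:
            return i
    return None

def fitness(path):
    s = steps_to_target(path)
    return 0.0 if s is None else 1.0 / s

def evolve(pop_size=50, path_len=30, generations=100, mut_rate=0.1):
    # Random initialization of candidate bin sequences.
    pop = [[random.randint(1, 40) for _ in range(path_len)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                      # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, path_len)             # one-point crossover
            child = a[:cut] + b[cut:]
            child = [random.randint(1, 40) if random.random() < mut_rate else g
                     for g in child]                        # per-gene mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)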
Technical Information
The dataset files have been compressed into a standard ZIP archive using Total Commander (version 9.50). The ZIP format ensures compatibility across various operating systems and tools.
The XLSX files were created using Microsoft Excel Standard 2019 (Version 1808, Build 10416.20027)
The Python program was developed using Visual Studio Code (Version 1.96.2, user setup), with the following environment details: Commit fabd6a6b30b49f79a7aba0f2ad9df9b399473380f, built on 2024-12-19. The Electron version is 32.6, and the runtime environment includes Chromium 128.0.6263.186, Node.js 20.18.1, and V8 12.8.374.38-electron.0. The operating system is Windows NT x64 10.0.19045.
The statistical analysis included in this dataset was partially conducted using IBM SPSS Statistics, Version 29.0.1.0
The CSV files in this dataset were created following European conventions, using a semicolon (;) as the delimiter instead of a comma, and are encoded in UTF-8 to ensure compatibility with a wide range of applications.
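For example, the CSV files can be loaded with pandas by specifying the delimiter and encoding explicitly (add decimal="," if decimal commas are used; that detail is not stated above):

import pandas as pd

df = pd.read_csv("results_1_300.csv", sep=";", encoding="utf-8")
print(df.head())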
Description:
The Linear Equation Image Dataset is designed to help solve high school-level algebraic problems using machine learning (ML). It provides extensive visual data, ideal for training models in equation recognition and solving.
What’s New
Expanded Image Dataset: The dataset now contains over 30,000 images, covering a wide array of linear equations with varying complexities. The generation of equations follows multiple randomization techniques, ensuring diversity in the visual representation.
Data Diversity: Equations include both simple and complex forms, with some involving fractional coefficients, inequalities, or multi-variable formats to increase the challenge. The images also come in different resolutions, fonts, and formats (handwritten and digitally rendered) to further test ML algorithms’ robustness.
Possible Use Cases
Symbolic Equation Recognition: Train models to visually recognize equations and convert them into symbolic form.
Equation Solving: Create ML models capable of solving linear equations through image recognition.
Handwritten Recognition: Use this dataset for handwriting recognition, helping machines interpret handwritten linear equations.
Educational Tools: Develop AI tutors or mobile apps that assist students in solving linear equations by merely taking a photo of the problem.
Algorithm Training: Useful for those researching symbolic computation, this dataset allows for testing and improving various image-to-text and equation-solving algorithms.
Enhanced Research Opportunities
This dataset can be particularly useful for educational institutions, research teams, and AI developers focusing on enhancing problem-solving capabilities via machine learning and symbolic computation models.
This dataset is sourced from Kaggle.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The SPEED dataset is the official dataset of ESA's Kelvins "Pose Estimation Challenge", run in collaboration with Stanford University's Space Rendezvous Laboratory (SLAB). It features images and poses of the Tango spacecraft (PRISMA mission): 12,000 of them generated by SLAB's optical simulator using a high-fidelity texture model, and 300 images from the TRON facility using a physical mock-up of Tango.
The goal of the competition was to estimate the relative pose (distance and orientation) from pixel images only.
Detailed information about the original competition can be found at https://kelvins.esa.int/satellite-pose-estimation-challenge/
A follow-up competition with a larger and improved dataset (SPEED+) is available on Zenodo as well: https://zenodo.org/record/5588480
A publication about the results of the pose estimation challenge has been published as
Kisantal, Mate, et al. "Satellite pose estimation challenge: Dataset, competition design, and results." IEEE Transactions on Aerospace and Electronic Systems 56.5 (2020): 4083-4098.
This technical report documents the acquisition of source data, and calculation of land cover summary statistics datasets for Antietam National Battlefield. The source data and land cover calculations are available for use within the National Park Service (NPS) Inventory and Monitoring Program. Land cover summary statistics datasets can be calculated for all geographic regions within the extent of the NPS; this report includes statistics calculated for the conterminous United States. The land cover summary statistics datasets are calculated from multiple sources, including Multi-Resolution Land Characteristics Consortium products in the National Land Cover Database (NLCD) and United States Geological Survey’s (USGS) Earth Resources Observation and Science (EROS) Center products in the Land Change Monitoring, Assessment, and Projection (LCMAP) raster dataset. These summary statistics calculate land cover at up to three classification scales: Level 1, modified Anderson Level 2, and Natural versus Converted land cover. The output land cover summary statistics datasets produced here for Antietam National Battlefield utilize the most recent versions of the source datasets (NLCD and LCMAP). These land cover summary statistics datasets are used in the NPS Inventory and Monitoring Program, including the NPS Environmental Settings Monitoring Protocol and may be used by networks and parks for additional efforts.
This report documents the acquisition of source data and the calculation of land cover summary statistics datasets for four National Park Service Greater Yellowstone Network park units (Bighorn Canyon National Recreation Area, Grand Teton National Park, John D. Rockefeller Jr. Memorial Parkway, and Yellowstone National Park) and six custom areas of analysis. The source data and land cover calculations are available for use within the National Park Service (NPS) Inventory and Monitoring Program. Land cover summary statistics datasets can be calculated for all geographic regions within the extent of the NPS; this report includes statistics calculated for the conterminous United States. The land cover summary statistics datasets are calculated from multiple sources, including Multi-Resolution Land Characteristics Consortium products in the National Land Cover Database (NLCD) and the United States Geological Survey’s (USGS) Earth Resources Observation and Science (EROS) Center products in the Land Change Monitoring, Assessment, and Projection (LCMAP) raster dataset. These summary statistics calculate land cover at up to three classification scales: Level 1, modified Anderson Level 2, and Natural versus Converted land cover. The output land cover summary statistics datasets produced here for the four Greater Yellowstone Network park units and six custom areas of analysis utilize the most recent versions of the source datasets (NLCD and LCMAP). These land cover summary statistics datasets are used in the NPS Inventory and Monitoring Program, including the NPS Environmental Settings Monitoring Protocol, and may be used by networks and parks for additional efforts.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pol Febrer (pol.febrer@icn2.cat, ORCID 0000-0003-0904-2234) Peter Bjorn Jorgensen (peterbjorgensen@gmail.com, ORCID 0000-0003-4404-7276) Arghya Bhowmik (arbh@dtu.dk, ORCID 0000-0003-3198-5116)
The dataset is published as part of the paper: "GRAPH2MAT: UNIVERSAL GRAPH TO MATRIX CONVERSION FOR ELECTRON DENSITY PREDICTION" (https://doi.org/10.26434/chemrxiv-2024-j4g21) https://github.com/BIG-MAP/graph2mat
This dataset contains the Hamiltonian, Overlap, Density and Energy Density matrices from SIESTA calculations of a subset of the MD17 aspirin dataset. The subset is taken from the third split in (https://doi.org/10.6084/m9.figshare.12672038.v3).
SIESTA 5.0.0 was used to compute the dataset.
The dataset has two directories:
Three directories contain the calculations with different basis sets:
- matrix_dataset_defsplit: Uses the default split-valence DZP basis in SIESTA.
- matrix_dataset_optimsplit: Uses a split-valence DZP basis optimized for aspirin.
- matrix_dataset_defnodes: Uses the default nodes DZP basis in SIESTA.
Each of the basis directories has two subdirectories:
- basis: Contains the files specifying the basis used for each atom.
- runs: The results of running the SIESTA simulations. Contents are discussed next.
The "runs" directory contains one directory for each run, named with the index of the run. Each directory contains: - RUN.fdf, geom.fdf: The input files used for the SIESTA calculation. - RUN.out: The log of the SIESTA run, which apar - siesta.TSDE: Contains the Density and Energy Density matrices. - siesta.TSHS: Contains the Hamiltonian and Overlap matrices.
Each matrix can be read using the sisl python package (https://github.com/zerothi/sisl) like:
import sisl
matrix = sisl.get_sile("RUN.fdf").read_X()
where X is hamiltonian, overlap, density_matrix or energy_density_matrix; for example, read_hamiltonian() returns the Hamiltonian of the run.
To reproduce the results presented in the paper, follow the documentation of the graph2mat package (https://github.com/BIG-MAP/graph2mat).
https://doi.org/10.11583/DTU.c.7310005 © 2024 Technical University of Denmark
This dataset is published under the CC BY 4.0 license. This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator.
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis.

Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product.

This catalogue entry provides post-processed ERA5 hourly single-level data aggregated to daily time steps. In addition to the data selection options found on the hourly page, the following options can be selected for the daily statistic calculation:
- The daily aggregation statistic (daily mean, daily max, daily min, daily sum*)
- The sub-daily frequency sampling of the original data (1 hour, 3 hours, 6 hours)
- The option to shift to any local time zone in UTC (no shift means the statistic is computed from UTC+00:00)
*The daily sum is only available for the accumulated variables (see ERA5 documentation for more details). Users should be aware that the daily aggregation is calculated during the retrieval process and is not part of a permanently archived dataset. For more details on how the daily statistics are calculated, including demonstrative code, please see the documentation. For more details on the hourly data used to calculate the daily statistics, please refer to the ERA5 hourly single-level data catalogue entry and the documentation found therein.
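A minimal retrieval sketch with the cdsapi client is shown below; the catalogue entry name and request keys are assumptions based on this description and should be verified against the documentation:

import cdsapi

c = cdsapi.Client()
c.retrieve(
    "derived-era5-single-levels-daily-statistics",  # assumed catalogue entry name
    {
        "variable": "2m_temperature",
        "year": "2023",
        "month": "07",
        "day": ["01", "02"],
        "daily_statistic": "daily_mean",    # daily mean / max / min / sum
        "frequency": "1_hourly",            # sub-daily sampling: 1, 3, or 6 hours
        "time_zone": "utc+00:00",           # optional shift to a local time zone
    },
    "era5_daily.nc",
)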
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Helsinki Region Travel Time Matrix contains travel time and distance information for routes between all 250 m x 250 m grid cell centroids (n = 13231) in the Helsinki Region, Finland by walking, cycling, public transportation and car. The grid cells are compatible with the statistical grid cells used by Statistics Finland and the YKR (yhdyskuntarakenteen seurantajärjestelmä) data set. The Helsinki Region Travel Time Matrix is available for three different years:
The data consists of travel time and distance information of the routes that have been calculated between all statistical grid cell centroids (n = 13231) by walking, cycling, public transportation and car.
The data have been calculated for two different times of the day: 1) midday and 2) rush hour.
The data may be used freely (under Creative Commons 4.0 licence). We do not take any responsibility for any mistakes, errors or other deficiencies in the data.
Organization of data
The data have been divided into 13231 text files according to destinations of the routes. The data files have been organized into sub-folders that contain multiple (approx. 4-150) Travel Time Matrix result files. Individual folders consist of all the Travel Time Matrices that have same first four digits in their filename (e.g. 5785xxx).
In order to visualize the data on a map, the result tables can be joined with the MetropAccess YKR-grid shapefile (attached here). The data can be joined by using the field ‘from_id’ in the text files and the field ‘YKR_ID’ in MetropAccess-YKR-grid shapefile as a common key.
Data structure
The data have been divided into 13231 text files according to destinations of the routes. One file includes the routes from all statistical grid cells to a particular destination grid cell. All files have been named according to the destination grid cell code and each file includes 13231 rows.
NODATA values have been stored as value -1.
Each file consists of 18 attribute fields: 1) from_id, 2) to_id, 3) walk_t, 4) walk_d, 5) bike_f_t, 6) bike_s_t, 7) bike_d, 8) pt_r_tt, 9) pt_r_t, 10) pt_r_d, 11) pt_m_tt, 12) pt_m_t, 13) pt_m_d, 14) car_r_t, 15) car_r_d, 16) car_m_t, 17) car_m_d, 18) car_sl_t
The fields are separated by semicolon in the text files.
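As a sketch of this layout, one matrix file can be read and joined to the grid with pandas/geopandas; the per-destination file name and the shapefile name below are illustrative:

import pandas as pd
import geopandas as gpd

tt = pd.read_csv("5785640.txt", sep=";", na_values=-1)   # NODATA stored as -1
grid = gpd.read_file("MetropAccess_YKR_grid.shp")        # attached YKR grid shapefile
joined = grid.merge(tt, left_on="YKR_ID", right_on="from_id")
joined.plot(column="pt_r_t")                             # e.g. rush-hour public transport time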
Attributes
METHODS
For detailed documentation and how to reproduce the data, see HelsinkiRegionTravelTimeMatrix2018 GitHub repository.
THE ROUTES BY CAR have been calculated with a dedicated open-source tool called DORA (DOor-to-door Routing Analyst), developed for this project. DORA uses a PostgreSQL database with the PostGIS extension and is based on the pgRouting toolkit. MetropAccess-Digiroad (modified from the original Digiroad data provided by the Finnish Transport Agency) has been used as the street network, in which the travel times of the road segments are made more realistic by adding crossroad impedances for different road classes.
The calculations have been repeated for two times of the day using 1) the “midday impedance” (i.e. travel times outside rush hour) and 2) the “rush hour impedance” as the impedance in the calculations. Moreover, 3) a “speed limit impedance” travel time (i.e. using speed limits without any additional impedances) is also calculated in the matrix.
The whole travel chain (“door-to-door approach”) is taken into account in the calculations:
1) walking time from the real origin to the nearest network location (based on Euclidean distance),
2) average walking time from the origin to the parking lot,
3) travel time from parking lot to destination,
4) average time spent searching for a parking lot,
5) walking time from parking lot to nearest network location of the destination and
6) walking time from network location to the real destination (based on Euclidean distance).
THE ROUTES BY PUBLIC TRANSPORTATION have been calculated by using the MetropAccess-Reititin tool which also takes into account the whole travel chains from the origin to the destination:
1) possible waiting at home before leaving,
2) walking from home to the transit stop,
3) waiting at the transit stop,
4) travel time to next transit stop,
5) transport mode change,
6) travel time to next transit stop and
7) walking to the destination.
Travel times by public transportation have been optimized using 10 different departure times within the calculation hour, spread using a so-called Golomb ruler. The fastest route from these calculations is selected for the final travel time matrix.
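For illustration, the optimal order-10 Golomb ruler (marks 0, 1, 6, 10, 23, 26, 34, 41, 53, 55, with all pairwise differences distinct) can be scaled onto the calculation hour; whether the original workflow used this exact ruler and scaling is an assumption:

GOLOMB_10 = [0, 1, 6, 10, 23, 26, 34, 41, 53, 55]

def departure_minutes(window_min=60):
    # Spread 10 departure times across the hour in Golomb-ruler proportions.
    scale = window_min / GOLOMB_10[-1]
    return [round(m * scale) for m in GOLOMB_10]

print(departure_minutes())  # [0, 1, 7, 11, 25, 28, 37, 45, 58, 60]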
THE ROUTES BY CYCLING are also calculated using the DORA tool. The network dataset underneath is MetropAccess-CyclingNetwork, which is a modified version from the original Digiroad data provided by Finnish Transport Agency. In the dataset the travel times for the road segments have been modified to be more realistic based on Strava sports application data from the Helsinki region from 2016 and the bike sharing system data from Helsinki from 2017.
For each road segment, separate speed values were calculated for slow and fast cycling. The value for fast cycling is based on the percentage difference between the segment-specific Strava speed and the average speed of the whole Strava dataset. The same percentage difference was then applied to the average speed of bike sharing system users to obtain the slower speed value for each road segment.
The reference value for faster cycling is 19 km/h, based on the average speed of Strava sports application users in the Helsinki region. The reference value for slower cycling is 12 km/h, the average travel speed of bike sharing system users in Helsinki. For example, a segment ridden 10% faster than the Strava average is assigned a slow-profile speed of 12 km/h x 1.1 = 13.2 km/h. An additional 1 minute has been added to the travel time to account for taking (30 s) and returning (30 s) the bike at the origin/destination.
More information of the Strava dataset that was used can be found from the Cycling routes and fluency report, which was published by us and the city of Helsinki.
THE ROUTES BY WALKING were also calculated using MetropAccess-Reititin, by disabling all motorized transport modes in the calculation. Thus, all routes are based on OpenStreetMap geometry.
The walking speed has been adjusted to 70 meters per minute, which is the default speed in the HSL Journey Planner (also in the calculations by public transportation).
All calculations were done using the computing resources of CSC-IT Center for Science (https://www.csc.fi/home).
http://reference.data.gov.uk/id/open-government-licence
Excel Age-Range creator for Office for National Statistics (ONS) Mid year population estimates (MYE) covering each year between 1999 and 2014
Screenshot of the tool: https://londondatastore-upload.s3.amazonaws.com/mye-custom-tool.JPG
These files take into account the revised estimates for 2002-2010 released in April 2013 down to local authority level, and the post-2011 estimates based on the Census results. Scotland and Northern Ireland data have not been revised, so the Great Britain and United Kingdom totals comprise the original data for these plus the revised England and Wales figures.
This Excel-based tool enables users to query the single-year-of-age raw data so that any age range can easily be calculated, without having to carry out the often complex and time-consuming formulas that can also be open to human error. Simply select the lower and upper age range for both males and females and the spreadsheet will return the total population for the range. Please adhere to the terms and conditions of supply contained within the file.
Tip: You can copy and paste the rows you are interested in to another worksheet by using the filters at the top of the columns and then select all by pressing Ctrl+A. Then simply copy and paste the cells to a new location.
ONS Mid year population estimates
Open Excel tool (London Boroughs, Regions and National, 1999-2014)
Also available is a custom-age tool for all geographies in the UK. Open the tool for all UK geographies (local authority and above) for: 2010, 2011, 2012, 2013, and 2014.
This full MYE dataset by single year of age (SYA) age and gender is available as a Datastore package here.
Ward Level Population estimates
Excel single year of age population tool for 2002 to 2013 for all wards in London.
New 2014 Ward boundary estimates
This data covers only the wards in the three London boroughs that changed their ward boundaries in May 2014. The estimates in this spreadsheet have been calculated by the GLA by taking the proportion of the old ward that falls within the new ward, based on the proportion of the population living in each area at the 2011 Census. These estimates are therefore purely indicative; they are not official statistics and are not endorsed by ONS.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Summary:
Estimated stand-off distance between ADS-B equipped aircraft and obstacles. Obstacle information was sourced from the FAA Digital Obstacle File and the FHWA National Bridge Inventory. Aircraft tracks were sourced from processed data curated from the OpenSky Network. Results are presented as histograms organized by aircraft type and distance away from runways.
Description:
For many aviation safety studies, aircraft behavior is represented using encounter models, which are statistical models of how aircraft behave during close encounters. They are used to provide a realistic representation of the range of encounter flight dynamics where an aircraft collision avoidance system would be likely to alert. These models currently are, and historically have been, limited to interactions between aircraft; they have not represented the specific interactions between obstacles and transponder-equipped aircraft. In response, we calculated the standoff distance between obstacles and ADS-B equipped manned aircraft.
For robustness, MIT LL calculated the standoff distance using two different datasets of manned aircraft tracks and two datasets of obstacles. This approach aligned with the foundational research used to support the ASTM F3442/F3442M-20 well-clear criteria of 2000 feet laterally and 250 feet AGL vertically.
The two datasets of processed tracks of ADS-B equipped aircraft were curated from the OpenSky Network. It is likely that rotorcraft were underrepresented in these datasets. There were also no considerations for aircraft equipped only with Mode C or not equipped with any transponder. The first dataset was used to train the v1.3 uncorrelated encounter models and is referred to as the “Monday” dataset. The second dataset, referred to as the “aerodrome” dataset, was used to train the v2.0 and v3.x terminal encounter models. The Monday dataset consisted of 104 Mondays across North America. The aerodrome dataset was based on observations within 8 nautical miles of Class B, C, and D aerodromes in the United States for the first 14 days of each month from January 2019 through February 2020. Prior to any processing, the datasets required 714 and 847 gigabytes of storage, respectively. For more details on these datasets, please refer to "Correlated Bayesian Model of Aircraft Encounters in the Terminal Area Given a Straight Takeoff or Landing" and “Benchmarking the Processing of Aircraft Tracks with Triples Mode and Self-Scheduling.”
Two different datasets of obstacles were also considered. The first was point obstacles defined by the FAA digital obstacle file (DOF), consisting of point obstacle structures of antenna, lighthouse, meteorological tower (met), monument, sign, silo, spire (steeple), stack (chimney; industrial smokestack), transmission line tower (t-l tower), tank (water; fuel), tramway, utility pole (telephone pole, or pole of similar height, supporting wires), windmill (wind turbine), and windsock. Each obstacle was represented by a cylinder with the height reported by the DOF and a radius based on the reported horizontal accuracy. We did not consider the actual width and height of the structure itself. Additionally, we only considered obstacles at least 50 feet tall and marked as verified in the DOF.
The other obstacle dataset, termed “bridges,” was based on the bridges identified in the FAA DOF and additional information provided by the National Bridge Inventory (NBI). Due to the potential size and extent of bridges, it would not be appropriate to model them as point obstacles; however, the FAA DOF only provides a point location and no information about the size of a bridge. In response, we correlated the FAA DOF with the National Bridge Inventory, which provides information about the length of many bridges. Instead of sizing the simulated bridge based on horizontal accuracy, as with the point obstacles, each bridge was represented as a circle with a radius based on the longest nearby bridge from the NBI. A circle representation was required because neither the FAA DOF nor the NBI provides sufficient information about orientation to represent bridges as rectangular cuboids. As with the point obstacles, the height of the obstacle was based on the height reported by the FAA DOF. Accordingly, the analysis using the bridge dataset should be viewed as risk-averse and conservative: it is possible that a manned aircraft was hundreds of feet away from an obstacle in actuality while the estimated standoff distance is significantly less. Additionally, all obstacles are represented with a fixed height, so the potentially flat and low-level entrances of a bridge are assumed to have the same height as the tall bridge towers. The attached figure illustrates an example simulated bridge.
It would have been extremely computationally inefficient to calculate the standoff distance for all possible track points. Instead, we defined an encounter between an aircraft and an obstacle as an aircraft flying at 3069 feet AGL or less coming within 3000 feet laterally of any obstacle within a 60-second time interval. If the criteria were satisfied, then for that 60-second track segment we calculated the standoff distance to all nearby obstacles. Vertical separation was based on the MSL altitude of the track and the maximum MSL height of the obstacle.
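A simplified sketch of the screening rule and the standoff calculation for a cylindrical obstacle follows; the production workflow is more involved, and the function names here are illustrative (units in feet):

import math

def standoff(track_x, track_y, track_alt_msl, obs_x, obs_y, obs_top_msl, obs_radius):
    # Lateral distance to the cylinder wall (0 if inside its footprint),
    # and vertical separation from the obstacle's maximum MSL height.
    lateral = max(0.0, math.hypot(track_x - obs_x, track_y - obs_y) - obs_radius)
    vertical = track_alt_msl - obs_top_msl
    return lateral, vertical

def is_encounter(track_agl, lateral):
    # Screening criteria from the text: at or below 3069 ft AGL and
    # within 3000 ft laterally, evaluated over 60-second segments.
    return track_agl <= 3069 and lateral <= 3000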
For each combination of aircraft track and obstacle datasets, the results were organized seven different ways. Filtering criteria were based on aircraft type and distance away from runways. Runway data was sourced from the FAA runways of the United States, Puerto Rico, and Virgin Islands open dataset. Aircraft type was identified as part of the em-processing-opensky workflow.
License
This dataset is licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0).
This license requires that reusers give credit to the creator. It allows reusers to copy and distribute the material in any medium or format in unadapted form and for noncommercial purposes only. Only noncommercial use of your work is permitted. Noncommercial means not primarily intended for or directed towards commercial advantage or monetary compensation. Exceptions are given for the not for profit standards organizations of ASTM International and RTCA.
MIT is releasing this dataset in good faith to promote open and transparent research of the low-altitude airspace. Given the limitations of the dataset and the need for more research, a more restrictive license was warranted. Namely, the dataset is based only on observations of ADS-B equipped aircraft, which not all aircraft in the airspace are required to employ, and the observations were sourced from a crowdsourced network whose surveillance coverage has not been robustly characterized.
As more research is conducted and the low altitude airspace is further characterized or regulated, it is expected that a future version of this dataset may have a more permissive license.
Distribution Statement
DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.
© 2021 Massachusetts Institute of Technology.
Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.
This material is based upon work supported by the Federal Aviation Administration under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Federal Aviation Administration.
This document is derived from work done for the FAA (and possibly others); it is not the direct product of work done for the FAA. The information provided herein may include content supplied by third parties. Although the data and information contained herein have been produced or processed from sources believed to be reliable, the Federal Aviation Administration makes no warranty, expressed or implied, regarding the accuracy, adequacy, completeness, legality, reliability or usefulness of any information, conclusions or recommendations provided herein. Distribution of the information contained herein does not constitute an endorsement or warranty of the data or information provided herein by the Federal Aviation Administration or the U.S. Department of Transportation. Neither the Federal Aviation Administration nor the U.S. Department of
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Mathmatic Formula is a dataset for object detection tasks - it contains Math Formula annotations for 703 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The data include measured data from Ecoregions 69 and 70 in West Virginia. Paired biological and chemical grab samples are included. These data were used to estimate SC extirpation concentrations (XC95) for benthic invertebrate genera. Also included are cumulative frequency distribution plots, scatter plots fitted with generalized additive models, and biogeographical maps of observations of each genus. The metadata and full data set are available in Supplemental Appendices S4 and S5, respectively. The output of 176 XC95 values from Ecoregions 69 and 70 is provided in Supplemental Appendix S6. Supplemental Appendix S7 depicts the probability of observing a genus for discrete ranges of SC. Supplemental Appendix S8 depicts the proportion of occurrence of a genus for discrete ranges of SC. Supplemental Appendix S9 shows the biogeographic distributions of the genera included in the data set. We also discuss limitations of this method to help avoid misinterpretations and inferential errors. A data dictionary is provided in Cond_DataFileColumnMetada-20161221. This dataset is associated with the following publication: Cormier, S., L. Zheng, E. Leppo, and A. Hamilton. Step-by-Step Calculation and Spreadsheet Tools for Predicting Stressor Levels that Extirpate Genera and Species. Integrated Environmental Assessment and Management. Allen Press, Inc., Lawrence, KS, USA, 14(2): 174-180, (2018).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
File Name: Web Page, url: https://data.mendeley.com/datasets/2jxj4k32m2/1 Khoury, Colin K.; Amariles, Daniel; Soto, Jonatan; Diaz, Maria Victoria; Sotelo, Steven; Sosa, Chrystian C.; Ramírez-Villegas , Julian; Achicanoy, Harold; Castañeda-Álvarez , Nora P.; León, Blanca; Wiersema, John H. (2018), Data for the calculation of an indicator of the comprehensiveness of conservation of useful wild plants, Mendeley Data, v1. http://dx.doi.org/10.17632/2jxj4k32m2.1 The datasets presented here are related to the research article entitled “Comprehensiveness of conservation of useful wild plants: an operational indicator for biodiversity and sustainable development targets” (Khoury et al., 2019). The indicator methodology includes five main steps, each requiring and producing data, which are fully described and available here. These data include: species taxonomy, uses, and general geographic information (dataset 1); species occurrence data (dataset 2); global administrative areas data (dataset 3); eco-geographic predictors used in species distribution modeling (dataset 4); a world map raster file (dataset 5); species spatial distribution modeling outputs (dataset 6); ecoregion spatial data used in conservation analyses (dataset 7); protected area spatial data used in conservation analyses (dataset 8); and countries, sub-regions, and regions classifications data (dataset 9). These data are available at http://dx.doi.org/10.17632/2jxj4k32m2.1. In combination with the openly accessible methodology code (https://github.com/CIAT-DAPA/UsefulPlants-Indicator), these data facilitate indicator assessments and serve as a baseline against which future calculations of the indicator can be measured. The data can also contribute to other species distribution modeling, ecological research, and conservation analysis purposes.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Longitudinal binomial data are frequently generated from multiple questionnaires and assessments in various scientific settings for which the binomial data are often over-dispersed. The standard generalized linear mixed effects model (GLMM) may result in severe underestimation of standard errors of estimated regression parameters in such cases and hence potentially bias the statistical inference. In this paper, we propose a longitudinal beta-binomial model for over-dispersed binomial data and estimate the regression parameters under a probit model using the Generalized Estimating Equation (GEE) method. A hybrid algorithm of the Fisher Scoring and the Method of Moments is implemented for computing the method. Extensive simulation studies are conducted to justify the validity of the proposed method. Finally the proposed method is applied to analyze functional impairment in subjects who are at-risk of Huntington disease (HD) from a multi-site observational study of prodromal HD.