
500 CITIES DISTANCE DATASET
kaggle.com
zip
Updated Sep 29, 2025
ANSHIKA SHARMA (2025). 500 CITIES DISTANCE DATASET [Dataset]. https://www.kaggle.com/datasets/anshikasharmacseai/500-cities-distance-daatset
Available download formats: zip (9,653 bytes)
Dataset updated
Sep 29, 2025
Authors
ANSHIKA SHARMA
Description
This dataset contains pairwise distances between cities represented as an undirected weighted graph. Each row is an edge describing the travel distance between two cities. It is ideal for experiments in graph algorithms (shortest path, MST), combinatorial optimization (TSP), route planning, and educational demonstrations.

Columns:

From — source city (string)

To — destination city (string)

Distance — numerical distance (edge weight)

Quick stats (from provided data):

Number of distinct cities: 8 (City1 .. City8)

Number of edges (rows): 17

Graph type: undirected, weighted (assumed symmetric)
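A minimal sketch of loading the edge list into an undirected adjacency map. It assumes a CSV file whose header row is exactly From,To,Distance. The file name is not given, so any path used with it is hypothetical, and the rows in the check below are made up.

```python
import csv
from collections import defaultdict


def load_graph(lines):
    """Build {city: {neighbour: distance}} from CSV rows with From, To, Distance columns.

    `lines` can be an open file or any iterable of CSV text lines.
    """
    graph = defaultdict(dict)
    for row in csv.DictReader(lines):
        u, v = row["From"].strip(), row["To"].strip()
        w = float(row["Distance"])
        # Store each edge in both directions, as the description assumes symmetry.
        graph[u][v] = w
        graph[v][u] = w
    return dict(graph)
```

Usage: with open("cities.csv", newline="") as f: graph = load_graph(f), where "cities.csv" stands in for the actual file in the zip.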

Use cases

Benchmarking shortest-path algorithms (Dijkstra, Bellman-Ford, Floyd–Warshall)

Minimum Spanning Tree (Kruskal/Prim) experiments

Traveling Salesman Problem (TSP) solvers and heuristics

Route planning and logistics toy problems

Teaching graph theory and visualization with networkx
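Two of the listed use cases, shortest paths with Dijkstra's algorithm and a minimum spanning tree with Kruskal's algorithm, can be sketched in plain Python. The sketch assumes the graph is a dict of dicts ({city: {neighbour: distance}}) with each edge stored in both directions. The four-node graph in the check is invented for illustration and does not come from the dataset.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from `source`, plus a predecessor map for path recovery."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def kruskal(graph):
    """Minimum spanning tree (or forest) as a list of (u, v, weight) edges."""
    # Deduplicate the two directions of each undirected edge.
    edges = sorted({(w, min(u, v), max(u, v)) for u in graph for v, w in graph[u].items()})
    parent = {node: node for node in graph}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so it cannot form a cycle
            parent[ru] = rv
            mst.append((u, v, w))
    return mst
```

A binary heap keeps Dijkstra at O((V + E) log V), which is more than fast enough for an 8-city, 17-edge graph.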
Distance Calculation Dataset
universe.roboflow.com
zip
Updated Mar 2, 2023
jatin-rane (2023). Distance Calculation Dataset [Dataset]. https://universe.roboflow.com/jatin-rane/distance-calculation/model/3
Available download formats: zip
Dataset updated
Mar 2, 2023
Dataset authored and provided by
jatin-rane
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Variables measured
Vehicles Bounding Boxes
Description
Distance Calculation

Overview: Distance Calculation is a dataset for object detection tasks. It contains vehicle annotations for 4,056 images.

Getting Started: You can download this dataset for use in your own projects, or fork it into a workspace on Roboflow to create your own model.

License: This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).
Indian Cities Distance Dataset
kaggle.com
zip
Updated Mar 1, 2024
K.B. Dharun Krishna (2024). Indian Cities Distance Dataset [Dataset]. https://www.kaggle.com/datasets/kbdharun/a-star-algorithm-route-planning-dataset/code
Available download formats: zip (804 bytes)
Dataset updated
Mar 1, 2024
Authors
K.B. Dharun Krishna
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Area covered
India
Description
The "Indian Cities Distance Dataset" is a comprehensive collection of distance data between major cities in India, designed to facilitate pathfinding and optimization tasks.

This connected dataset includes information about the distances (in kilometres) between pairs of cities, allowing users to calculate the shortest paths and optimize routes for various purposes.

Key features of this dataset

City Pairings: The dataset provides connectivity information between pairs of prominent Indian cities, enabling users to calculate shortest paths and travel distances between any two cities in the dataset. It is a good resource for programming route-planning, navigation, and logistics-optimization applications.

Distance Data: Each entry in the dataset includes the distance in kilometres between two cities. The distances have been curated to reflect the actual road distances between these locations.

A* Search Algorithm: This dataset is ideal for use with the A* (A-star) search algorithm, a widely used optimization and pathfinding algorithm. The A* algorithm can help find the shortest and most efficient routes between cities, making it suitable for transportation, tourism, and urban planning applications.
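A minimal A* sketch over an adjacency map ({city: {neighbour: km}}). The dataset's column names are not listed here, so the graph is passed in directly, and the toy graph in the check uses hypothetical labels rather than real cities. With the default zero heuristic, A* behaves like Dijkstra's algorithm. A tighter admissible heuristic, such as straight-line distance, would need city coordinates, which this dataset is not described as providing.

```python
import heapq


def a_star(graph, start, goal, heuristic=lambda node: 0.0):
    """Return (path, cost) from start to goal, or (None, inf) if unreachable.

    `heuristic(node)` must never overestimate the remaining distance (admissible),
    otherwise the returned path may not be the shortest.
    """
    open_heap = [(heuristic(start), 0.0, start)]  # entries are (f = g + h, g, node)
    best_g = {start: 0.0}
    came_from = {}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            path = [node]
            while node in came_from:  # walk predecessors back to start
                node = came_from[node]
                path.append(node)
            return path[::-1], cost
        if cost > best_g.get(node, float("inf")):
            continue  # stale entry
        for nbr, w in graph.get(node, {}).items():
            new_cost = cost + w
            if new_cost < best_g.get(nbr, float("inf")):
                best_g[nbr] = new_cost
                came_from[nbr] = node
                heapq.heappush(open_heap, (new_cost + heuristic(nbr), new_cost, nbr))
    return None, float("inf")
```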

Beginner friendly: This dataset contains only a minimal set of features, which makes it easy to process and analyze and therefore suitable for beginners.
Data from: Mining Distance-Based Outliers in Near Linear Time
catalog.data.gov
datasets.ai
Updated Apr 11, 2025
Dashlink (2025). Mining Distance-Based Outliers in Near Linear Time [Dataset]. https://catalog.data.gov/dataset/mining-distance-based-outliers-in-near-linear-time
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule

Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
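The abstract describes the method only in outline. The sketch below is one reading of it, not the authors' code. Points are scanned in random order. A point's outlier score is assumed to be its distance to its k-th nearest neighbour. A point is pruned as soon as its running k-NN distance falls below the weakest of the current top-n outliers (the cutoff). It also assumes there are more than k points.

```python
import heapq
import math
import random


def top_outliers(points, k=3, n=2, seed=0):
    """Return up to n (score, index) pairs with the largest k-th-nearest-neighbour distance."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # randomized processing order
    outliers = []  # min-heap of (score, index), size <= n
    cutoff = 0.0   # score of the weakest current top-n outlier
    for i in order:
        neighbours = []  # max-heap (negated distances) of the k closest points so far
        pruned = False
        for j in order:
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if len(neighbours) < k:
                heapq.heappush(neighbours, -d)
            elif d < -neighbours[0]:
                heapq.heapreplace(neighbours, -d)
            # The running k-NN distance can only shrink, so once it is below the
            # cutoff this point can no longer enter the top n: prune it.
            if len(neighbours) == k and -neighbours[0] < cutoff:
                pruned = True
                break
        if pruned:
            continue
        score = -neighbours[0]
        if len(outliers) < n:
            heapq.heappush(outliers, (score, i))
        elif score > outliers[0][0]:
            heapq.heapreplace(outliers, (score, i))
        if len(outliers) == n:
            cutoff = outliers[0][0]
    return sorted(outliers, reverse=True)
```

Pruning never discards a true top-n outlier, because the cutoff is always at or below the final n-th best score. The randomized order makes it likely that close neighbours of ordinary points are met early, so most inner loops stop quickly.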
ANN development + final testing datasets
data.niaid.nih.gov
resodate.org
Updated Jan 24, 2020
Authors (2020). ANN development + final testing datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1445865
Dataset updated
Jan 24, 2020
Authors
Authors
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
File name definitions:

'...v_50_175_250_300...' - dataset for velocity ranges [50, 175] + [250, 300] m/s

'...v_175_250...' - dataset for velocity range [175, 250] m/s

'ANNdevelop...' - used to perform 9 parametric sub-analyses where, in each one, many ANNs are developed (trained, validated and tested) and the one yielding the best results is selected

'ANNtest...' - used to test the best ANN from each aforementioned parametric sub-analysis, aiming to find the best ANN model; this dataset includes the 'ANNdevelop...' counterpart

Where are the input (independent) and target (dependent) variable values for each dataset (Excel file)?

Input values are in the 'IN' sheet.

Target values are in the 'TARGET' sheet.

Where are the results from the best ANN model (for each target/output variable and each velocity range)?

Open the corresponding Excel file; the expected (target) vs. ANN (output) results are in the 'TARGET vs OUTPUT' sheet.

See the reference below (to be added when the paper is published):

https://www.researchgate.net/publication/328849817_11_Neural_Networks_-_Max_Disp_-_Railway_Beams
Mining Distance-Based Outliers in Near Linear Time - Dataset - NASA Open...
data.nasa.gov
Updated Mar 31, 2025
nasa.gov (2025). Mining Distance-Based Outliers in Near Linear Time - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/mining-distance-based-outliers-in-near-linear-time
Dataset updated
Mar 31, 2025
Dataset provided by
NASA: http://nasa.gov/
Golf Ball Distance Calculation Dataset
universe.roboflow.com
zip
Updated Jun 14, 2025
Awais Ahmad (2025). Golf Ball Distance Calculation Dataset [Dataset]. https://universe.roboflow.com/awais-ahmad-dtpcl/golf-ball-distance-calculation/dataset/1
Available download formats: zip
Dataset updated
Jun 14, 2025
Dataset authored and provided by
Awais Ahmad
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Golf Balls Bounding Boxes
Description
Golf Ball Distance Calculation

Overview: Golf Ball Distance Calculation is a dataset for object detection tasks. It contains golf ball annotations for 318 images.

Getting Started: You can download this dataset for use in your own projects, or fork it into a workspace on Roboflow to create your own model.

License: This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Fused Image dataset for convolutional neural Network-based crack Detection...
zenodo.org
data.niaid.nih.gov
zip
Updated Apr 20, 2023
Shanglian Zhou; Carlos Canchila; Wei Song (2023). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Dataset]. http://doi.org/10.5281/zenodo.6383044
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6383044
Dataset updated
Apr 20, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Shanglian Zhou; Carlos Canchila; Wei Song
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.
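Pixel-level ground-truth maps like these are commonly scored with intersection over union (IoU) and F1 (Dice). The sketch below assumes binary crack maps of equal shape, such as the 256x256 patches, and uses NumPy. It is not necessarily the evaluation protocol of the benchmark study in [4].

```python
import numpy as np


def crack_iou_f1(pred, truth):
    """Pixel-level IoU and F1 (Dice) for binary crack maps of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    tp = np.count_nonzero(pred & truth)   # crack pixels correctly predicted
    fp = np.count_nonzero(pred & ~truth)  # background predicted as crack
    fn = np.count_nonzero(~pred & truth)  # crack pixels missed
    if tp + fp + fn == 0:
        return 1.0, 1.0  # both maps empty: treat as perfect agreement
    return tp / (tp + fp + fn), 2 * tp / (2 * tp + fp + fn)
```

A typical call thresholds a network's probability map first, for example crack_iou_f1(probabilities > 0.5, ground_truth).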

The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was used for data acquisition, so the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration). The filtered range data were generated by applying frequency-domain filtering to eliminate image disturbances (e.g., surface variations and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact of different types of image data on deep convolutional neural network (DCNN) performance.

If you share or use this dataset, please cite [4] and [5] in any relevant documentation.

In addition, an image dataset for crack classification has also been published at [6].

References:

[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873

[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605

[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434

[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678

[5] (This dataset) Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044

[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78
South Range, MI Annual Population and Growth Analysis Dataset: A...
neilsberg.com
csv, json
Updated Jul 30, 2024
Neilsberg Research (2024). South Range, MI Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in South Range from 2000 to 2023 // 2024 Edition [Dataset]. https://www.neilsberg.com/insights/south-range-mi-population-by-year/
Available download formats: json, csv
Dataset updated
Jul 30, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Michigan, South Range
Variables measured
Annual Population Growth Rate, Population Between 2000 and 2023, Annual Population Growth Rate Percent
Measurement technique
The data presented in this dataset are derived from U.S. Census Bureau Population Estimates Program (PEP) data for 2000 to 2023. To measure the variables, namely (a) population and (b) population change (in absolute terms and as a percentage), we analyzed and tabulated the data for each year between 2000 and 2023. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the South Range population over the last 20-plus years. It lists the population for each year, along with the year-over-year change in population and the change in percentage terms for each year. The dataset can be used to understand how the population of South Range has changed across the last two decades. For example, it shows whether the population is declining or increasing and, if it has changed, when the population peaked or whether it is still growing and has not yet reached its peak. The trend can also be compared with the overall trend of the United States population over the same period.

Key observations

In 2023, the population of South Range was 741, a 0.27% decrease from 2022. In 2022, the population of South Range was 743, an increase of 0.13% compared to a population of 742 in 2021. Between 2000 and 2023, the population of South Range increased by 17. In this period, the peak population was 760, in 2010. The numbers suggest that the population has already peaked and is trending downward. Source: U.S. Census Bureau Population Estimates Program (PEP).

Content

When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

Data Coverage:

From 2000 to 2023

Variables / Data Columns

Year: This column displays the data year (Measured annually and for years 2000 to 2023)

Population: The population for the specific year for the South Range is shown in this column.

Year on Year Change: This column displays the change in South Range population for each year compared to the previous year.

Change in Percent: This column displays the year-on-year change as a percentage. Please note that these percentages are rounded values.
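The Year on Year Change and Change in Percent columns can be reproduced from the Population column. The minimal sketch below assumes the series is held in a dict keyed by year and that percentages are rounded to two decimals. The check uses the 2021 to 2023 figures quoted in the key observations.

```python
def yearly_changes(population):
    """Return (year, population, change, percent change) for each year after the first."""
    years = sorted(population)
    rows = []
    for prev, year in zip(years, years[1:]):
        change = population[year] - population[prev]
        # Percent change relative to the previous year, rounded to 2 decimals.
        percent = round(100.0 * change / population[prev], 2)
        rows.append((year, population[year], change, percent))
    return rows
```

For example, 742 to 743 is +1 (+0.13%), and 743 to 741 is -2 (-0.27%), matching the values stated above.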

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. Most Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for South Range Population by Year.
Dataset for the paper "Observation of Acceleration and Deceleration Periods...
zenodo.org
Updated Mar 26, 2025
Yide Qian (2025). Dataset for the paper "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023" [Dataset]. http://doi.org/10.5281/zenodo.15022854
Unique identifier
https://doi.org/10.5281/zenodo.15022854
Dataset updated
Mar 26, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Yide Qian
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Pine Island Glacier
Description
Dataset and code for "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023"

Description of the data and file structure

The MATLAB codes and related datasets are used for generating the figures for the paper "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023".

Files and variables

File 1: Data_and_Code.zip

Directory: Main_function

Description: Includes MATLAB scripts and functions. Each script includes a description that explains how to use it and where to find the datasets it processes.

MATLAB Main Scripts: Contain all the steps to process the data and to output figures and videos.

Script_1_Ice_velocity_process_flow.m

Script_2_strain_rate_process_flow.m

Script_3_DROT_grounding_line_extraction.m

Script_4_Read_ICESat2_h5_files.m

Script_5_Extraction_results.m

MATLAB functions: Six directories of MATLAB functions that support the main scripts:

1_Ice_velocity_code: MATLAB functions for ice velocity post-processing, including outlier removal, filtering, correction for atmospheric and tidal effects, inverse-weighted averaging, and error estimation.

2_strain_rate: MATLAB functions for strain rate calculation.

3_DROT_extract_grounding_line_code: MATLAB functions that convert range offset results output from GAMMA to differential vertical displacement and use the result to extract the grounding line.

4_Extract_data_from_2D_result: MATLAB functions for extracting profiles from 2D data.

5_NeRD_Damage_detection: Modified code from Izeboud et al. (2023). If you use this code, please also cite Izeboud et al. (2023) (https://www.sciencedirect.com/science/article/pii/S0034425722004655).

6_Figure_plotting_code: MATLAB functions for the figures in the paper and the supporting information.
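For reference, horizontal strain rates are usually computed from velocity gradients: exx = dvx/dx, eyy = dvy/dy, and exy = (dvx/dy + dvy/dx)/2. The NumPy sketch below assumes a regular grid with rows along y, columns along x, and constant spacing. It is not the MATLAB code in 2_strain_rate, whose exact method (for example, any smoothing length scale) is not described here.

```python
import numpy as np


def strain_rates(vx, vy, dx, dy):
    """Return (exx, eyy, exy) from gridded velocity components.

    vx, vy: 2D arrays with rows along y and columns along x (e.g. m/day).
    dx, dy: grid spacing in metres. Outputs are in 1/day for m/day input.
    """
    # np.gradient returns derivatives along axis 0 (y) first, then axis 1 (x).
    dvx_dy, dvx_dx = np.gradient(np.asarray(vx, dtype=float), dy, dx)
    dvy_dy, dvy_dx = np.gradient(np.asarray(vy, dtype=float), dy, dx)
    exx = dvx_dx                    # longitudinal (x) strain rate
    eyy = dvy_dy                    # transverse (y) strain rate
    exy = 0.5 * (dvx_dy + dvy_dx)   # shear strain rate (tensor component)
    return exx, eyy, exy
```

np.gradient uses central differences in the interior and one-sided differences at the edges, so a linear velocity field gives constant strain rates everywhere.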

Directory: data_and_result

Description: Contains directories that store the results output from MATLAB. Users only need to change the paths in the MATLAB scripts to their own paths.

1_origin: Sample data ("PS-20180323-20180329", "PS-20180329-20180404", "PS-20180404-20180410") output from the GAMMA software in GeoTIFF format that can be used to calculate DROT and velocity. Includes displacement, theta, phi, and ccp.

2_maskccpN: Outliers removed where ccp < 0.05, and displacement converted to velocity (m/day).

3_rockpoint: Velocities extracted at the non-moving region.

4_constant_detrend: Orbit error removed.

5_Tidal_correction: Atmospheric and tide-induced errors removed.

6_rockpoint: Non-aggregated velocities extracted at the non-moving region.

6_vx_vy_v: Velocities transformed from va/vr to vx/vy.

7_rockpoint: Aggregated velocities extracted at the non-moving region.

7_vx_vy_v_aggregate_and_error_estimate: Inverse-weighted average of three ice velocity maps and the corresponding error maps.

8_strain_rate: Strain rate calculated from the aggregated ice velocity.

9_compare: Results before and after tidal correction and aggregation.

10_Block_result: Time-series results extracted from 2D data.

11_MALAB_output_png_result: .png files and time-series results.

12_DROT: Differential Range Offset Tracking results.

13_ICESat_2: ICESat-2 .h5 and .mat files can be placed here (this directory includes only samples from tracks 0965 and 1094).

14_MODIS_images: MODIS images can be stored here.

shp: Grounding line, rock region, ice front, and other shapefiles.
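Two of the processing steps above can be sketched in NumPy. The first masks offsets with ccp < 0.05 while converting displacement to m/day (2_maskccpN). The second averages several velocity maps (7_vx_vy_v_aggregate_and_error_estimate). The description only says "inverse weighted average", so inverse-variance weighting and per-pixel error inputs are assumptions. The 6-day interval in the usage note is read from the PS-20180323-20180329 pair name.

```python
import numpy as np


def displacement_to_velocity(displacement_m, ccp, days, ccp_min=0.05):
    """Convert offsets (m) over `days` to m/day, setting pixels with ccp < ccp_min to NaN."""
    velocity = np.asarray(displacement_m, dtype=float) / days
    return np.where(np.asarray(ccp, dtype=float) < ccp_min, np.nan, velocity)


def inverse_variance_average(velocities, errors):
    """Combine velocity maps stacked on axis 0, weighting each pixel by 1/error**2.

    NaN velocities get zero weight; pixels with no valid input return NaN.
    Returns (mean, error) with error = 1/sqrt(sum of weights).
    """
    v = np.asarray(velocities, dtype=float)
    sigma = np.asarray(errors, dtype=float)
    valid = ~np.isnan(v)
    with np.errstate(divide="ignore", invalid="ignore"):
        weights = np.where(valid, 1.0 / sigma**2, 0.0)
        wsum = weights.sum(axis=0)
        weighted = (weights * np.where(valid, v, 0.0)).sum(axis=0)
        mean = np.where(wsum > 0, weighted / wsum, np.nan)
        error = np.where(wsum > 0, 1.0 / np.sqrt(wsum), np.nan)
    return mean, error
```

For a 6-day pair, displacement_to_velocity(disp, ccp, days=6) gives m/day. Three such maps, with their error maps, can then be passed as lists to inverse_variance_average.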

File 2: PIG_front_1947_2023.zip

Ice-front position shapefiles from 1947 to 2023, used to plot Figure 1 in the paper.

File 3: PIG_DROT_GL_2016_2021.zip

Grounding-line position shapefiles from 1947 to 2023, used to plot Figure 1 in the paper.

Data were derived from the following sources: the links can be found in the MATLAB scripts or in the "Open Research" section of the paper.
Traveling Salesman Computer Vision
kaggle.com
zip
Updated Apr 20, 2022
Jeff Heaton (2022). Traveling Salesman Computer Vision [Dataset]. https://www.kaggle.com/datasets/jeffheaton/traveling-salesman-computer-vision
Available download formats: zip (2,977,884,049 bytes)
Dataset updated
Apr 20, 2022
Authors
Jeff Heaton
License
GNU Lesser General Public License v3.0: http://www.gnu.org/licenses/lgpl-3.0.html
Description
The Traveling Salesperson Problem (TSP) is a classic problem in computer science that seeks the shortest route that visits each city in a group exactly once and returns to the starting city. It is an NP-hard problem in combinatorial optimization, important in theoretical computer science and operations research.

[Image: World Map, https://data.heatonresearch.com/images/wustl/kaggle/tsp/world-tsp.png]

In this Kaggle competition, your goal is not to find the shortest route among the cities. Rather, you must estimate the length of the route drawn on each map.

Calculating Line Distances

The data for this competition are not real-world maps but randomly generated maps that vary in size, city count, and route optimality. The following image shows a relatively small map with few cities and an optimal route.

[Image: Small Map, https://data.heatonresearch.com/images/wustl/kaggle/tsp/1.jpg]

Not all maps are this small or have such an optimal route. Consider the following map, which is much larger.

[Image: Larger Map, https://data.heatonresearch.com/images/wustl/kaggle/tsp/6.jpg]

The following attributes were randomly selected to generate each image.

Height

Width

City count

Number of simulated annealing cycles used to optimize an initial random path
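The last attribute refers to simulated annealing applied to an initial random path. The generator's code is not given. The sketch below is a generic simulated annealing loop that uses 2-opt segment reversals. It assumes a closed tour, and the starting temperature and cooling rate are arbitrary choices, not the generator's settings.

```python
import math
import random


def tour_length(points, order):
    """Length of the closed tour visiting points in the given order."""
    return sum(math.dist(points[order[i - 1]], points[order[i]]) for i in range(len(order)))


def anneal(points, cycles=10_000, start_temp=100.0, cooling=0.999, seed=0):
    """Simulated annealing with 2-opt moves, starting from a random order.

    Returns (best_order, best_length). Requires at least two points.
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)  # the initial random path
    current = tour_length(points, order)
    best, best_len = order[:], current
    temp = start_temp
    for _ in range(cycles):
        i, j = sorted(rng.sample(range(len(order)), 2))
        # 2-opt move: reverse the segment order[i..j].
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        cand_len = tour_length(points, candidate)
        # Always accept improvements; accept worse tours with probability exp(-delta / T).
        if cand_len < current or rng.random() < math.exp((current - cand_len) / temp):
            order, current = candidate, cand_len
            if current < best_len:
                best, best_len = order[:], current
        temp = max(temp * cooling, 1e-12)  # geometric cooling, kept above zero
    return best, best_len
```

Fewer cycles leave more of the random path's crossings in place. Varying the cycle count per map, as the attribute list suggests, therefore yields routes of varying optimality.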

The path distance is based on the sum of the Euclidean distance of all segments in the path. The distance units are in pixels.
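A minimal sketch of the label definition above, assuming the path is given as an ordered list of (x, y) pixel coordinates. The description does not say whether the route returns to its starting city, so closing the loop is optional here.

```python
import math


def path_length(points, closed=False):
    """Sum of Euclidean segment lengths (pixels) along an ordered list of (x, y) points."""
    total = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if closed and len(points) > 1:
        total += math.dist(points[-1], points[0])  # segment back to the start
    return total
```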

Dataset Challenges

This is a regression problem: you must estimate the total path length. Several challenges should be considered:

If you scale the maps indiscriminately, you will lose size information.

Paths might overlap, making the ratio of total pixels to total length misleading.

Where paths overlap other path segments and cities, the resulting color becomes brighter.

The following picture shows a section from one map zoomed to the pixel-level:

[Image: TSP Zoom, https://data.heatonresearch.com/images/wustl/kaggle/tsp/tsp_zoom.jpg]

CSV Files

The following CSV files are provided, in addition to the images.

train.csv - Training data, with distance labels.

test.csv - Test data without distance labels.

tsp-all.csv - Training and test data combined with complete labels and additional information about each generated map.

CSV File Format

The tsp-all.csv file contains the following data.

id,filename,distance,key
0,0.jpg,83110,503x673-270-83110.jpg
1,1.jpg,1035,906x222-10-1035.jpg
2,2.jpg,20756,810x999-299-20756.jpg
3,3.jpg,13286,781x717-272-13286.jpg
4,4.jpg,13924,609x884-312-13924.jpg

The columns:

id - A unique ID that allows linking across all three CSV files.

filename - The name of each map's image file.

distance - The total distance through the cities; this is the target (y) label.

key - The generator filename, which encodes the dimensions, city count, and distance.
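The key column can be parsed back into its parts. The sketch below assumes the pattern dim x dim, city count, distance (for example 503x673-270-83110.jpg) seen in the sample rows. Which of the two dimensions is width and which is height is not documented, so the function returns them in file order.

```python
import re

# Assumed pattern: "<dim>x<dim>-<city count>-<distance>.jpg", e.g. "503x673-270-83110.jpg".
KEY_PATTERN = re.compile(r"(\d+)x(\d+)-(\d+)-(\d+)\.jpg")


def parse_key(key):
    """Split a generator key into (first_dim, second_dim, city_count, distance)."""
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"unexpected key format: {key!r}")
    return tuple(int(part) for part in match.groups())
```

Comparing the parsed distance with the distance column is a quick consistency check when loading tsp-all.csv.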
Dataset for The effects of a number line intervention on calculation skills
figshare.mq.edu.au
researchdata.edu.au
txt
Updated May 12, 2023
Carola Ruiz Hornblas; Saskia Kohnen; Rebecca Bull (2023). Dataset for The effects of a number line intervention on calculation skills [Dataset]. http://doi.org/10.25949/22799717.v1
Available download formats: txt
Unique identifier
https://doi.org/10.25949/22799717.v1
Dataset updated
May 12, 2023
Dataset provided by
Macquarie University
Authors
Carola Ruiz Hornblas; Saskia Kohnen; Rebecca Bull
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Study information

The sample included in this dataset represents five children who participated in a number line intervention study. Originally, six children were included in the study, but one of them met the exclusion criterion after missing several consecutive sessions, so their data are not included in the dataset. All participants were attending Year 1 of primary school at an independent school in New South Wales, Australia. To be eligible to participate, children had to present with low mathematics achievement, performing at or below the 25th percentile in the Maths Problem Solving and/or Numerical Operations subtests of the Wechsler Individual Achievement Test III (WIAT III A & NZ, Wechsler, 2016). Children were excluded from participating if, as reported by their parents, they had any other diagnosed disorder such as attention deficit hyperactivity disorder, autism spectrum disorder, intellectual disability, developmental language disorder, cerebral palsy, or an uncorrected sensory disorder.

The study followed a multiple baseline case series design, with a baseline phase, a treatment phase, and a post-treatment phase. The baseline phase varied between two and three measurement points, the treatment phase varied between four and seven measurement points, and all participants had one post-treatment measurement point. The measurement points were distributed across participants as follows:

Participant 1: 3 baseline, 6 treatment, 1 post-treatment

Participant 3: 2 baseline, 7 treatment, 1 post-treatment

Participant 5: 2 baseline, 5 treatment, 1 post-treatment

Participant 6: 3 baseline, 4 treatment, 1 post-treatment

Participant 7: 2 baseline, 5 treatment, 1 post-treatment

In each session, across all three phases, children were assessed on a number line estimation task, a single-digit computation task, a multi-digit computation task, a dot comparison task, and a number comparison task. Furthermore, during the treatment phase, all children completed the intervention task after these assessments. The order of the assessment tasks varied randomly between sessions.

Measures

Number Line Estimation. Children completed a computerised bounded number line task (0-100). The number line is presented in the middle of the screen, and the target number is presented above the start point of the number line to avoid signalling the midpoint (Dackermann et al., 2018). Target numbers included two non-overlapping sets (trained and untrained) of 30 items each. Untrained items were assessed in all phases of the study. Trained items were assessed independently of the intervention during the baseline and post-treatment phases, and performance on the intervention is used to index performance on the trained set during the treatment phase. Within each set, numbers were equally distributed throughout the number range, with three items within each ten (0-10, 11-20, 21-30, etc.). Target numbers were presented in random order. Participants did not receive performance-based feedback. Accuracy is indexed by percent absolute error: PAE = (|number estimated - target number| / scale of number line) x 100.
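The PAE formula above can be written as a small Python helper. The 0-100 scale is the task's default. Averaging PAE across items is an assumption about how per-session scores might be summarized, not a documented step.

```python
def percent_absolute_error(estimate, target, scale=100):
    """PAE = |estimate - target| / scale * 100 on a 0..scale number line."""
    return abs(estimate - target) * 100 / scale


def mean_pae(responses, scale=100):
    """Average PAE over (estimate, target) pairs, e.g. one session's items."""
    errors = [percent_absolute_error(e, t, scale) for e, t in responses]
    return sum(errors) / len(errors)
```

For example, placing 50 at 45 gives a PAE of 5, regardless of whether the estimate falls below or above the target.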

Single-Digit Computation. The task included ten additions with single-digit addends (1-9) and single-digit results (2-9). The order was counterbalanced so that half of the additions present the lowest addend first (e.g., 3 + 5) and half of the additions present the highest addend first (e.g., 6 + 3). This task also included ten subtractions with single-digit minuends (3-9), subtrahends (1-6) and differences (1-6). The items were presented horizontally on the screen accompanied by a sound and participants were required to give a verbal response. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.

Multi-digit computational estimation. The task included eight additions and eight subtractions presented with double-digit numbers and three response options. None of the response options was the exact result; participants were asked to select the option that was closest to the correct result. In half of the items the calculation involved two double-digit numbers, and in the other half one double-digit and one single-digit number. The distance between the correct response option and the exact result of the calculation was two for half of the trials and three for the other half. The calculation was presented vertically on the screen with the three options shown below it. The calculation remained on the screen until participants responded by clicking on one of the options. Participants did not receive performance-based feedback. Performance on this task is measured by item-based accuracy.

Dot Comparison and Number Comparison. Both tasks included the same 20 items, which were presented twice, counterbalancing left and right presentation. Magnitudes to be compared were between 5 and 99, with four items for each of the following ratios: .91, .83, .77, .71, .67. Both quantities were presented horizontally side by side, and participants were instructed to press one of two keys (F or J), as quickly as possible, to indicate the larger one. Items were presented in random order and participants did not receive performance-based feedback. In the non-symbolic comparison task (dot comparison), the two sets of dots remained on the screen for a maximum of two seconds (to prevent counting). Overall area and convex hull for both sets of dots were kept constant following Guillaume et al. (2020). In the symbolic comparison task (Arabic numbers), the numbers remained on the screen until a response was given. Performance on both tasks was indexed by accuracy.

The Number Line Intervention

During the intervention sessions, participants estimated the position of 30 Arabic numbers on a 0-100 bounded number line. As a form of feedback, within each item, the participant’s estimate remained visible and the correct position of the target number appeared on the number line. When the estimate’s PAE was lower than 2.5, a message appeared on the screen that read “Excellent job”; when PAE was between 2.5 and 5, the message read “Well done, so close!”; and when PAE was higher than 5, the message read “Good try!”. Numbers were presented in random order.
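The feedback rule can be written as a threshold function. The description does not say which message applies at exactly 2.5 or 5, so this sketch assumes the middle band includes both ends.

```python
def percent_absolute_error(estimate, target, scale=100):
    """PAE = |estimate - target| / scale * 100 on a 0..scale number line."""
    return abs(estimate - target) * 100 / scale


def feedback_message(pae):
    """Message shown after an intervention item (band edges assumed inclusive)."""
    if pae < 2.5:
        return "Excellent job"
    if pae <= 5:
        return "Well done, so close!"
    return "Good try!"
```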

Variables in the dataset
Age = age in ‘years, months’ at the start of the study
Sex = female/male/non-binary or third gender/prefer not to say (as reported by parents)
Math_Problem_Solving_raw = Raw score on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Math_Problem_Solving_Percentile = Percentile equivalent on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Num_Ops_Raw = Raw score on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
Math_Problem_Solving_Percentile = Percentile equivalent on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

The remaining variables refer to participants’ performance on the study tasks. Each variable name is composed of three parts. The first refers to the phase and session. For example, Base1 refers to the first measurement point of the baseline phase, Treat1 to the first measurement point of the treatment phase, and post1 to the first measurement point of the post-treatment phase.

The second part of the variable name refers to the task, as follows:
DC = dot comparison
SDC = single-digit computation
NLE_UT = number line estimation (untrained set)
NLE_T = number line estimation (trained set)
CE = multidigit computational estimation
NC = number comparison
The final part of the variable name refers to the type of measure (acc = total correct responses; pae = percent absolute error).

Thus, variable Base2_NC_acc corresponds to accuracy on the number comparison task at the second measurement point of the baseline phase, and Treat3_NLE_UT_pae refers to the percent absolute error on the untrained set of the number line task during the third session of the treatment phase.
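A minimal sketch of how the naming scheme could be decoded programmatically. The function name and dictionary layout are hypothetical, and matching is case-insensitive because the description writes the post-treatment prefix in lower case (post1) while the other prefixes are capitalized.

```python
import re

TASKS = {
    "DC": "dot comparison",
    "SDC": "single-digit computation",
    "NLE_UT": "number line estimation (untrained set)",
    "NLE_T": "number line estimation (trained set)",
    "CE": "multidigit computational estimation",
    "NC": "number comparison",
}
MEASURES = {"acc": "total correct responses", "pae": "percent absolute error"}

# <phase><session>_<task>_<measure>, e.g. Base2_NC_acc
_VARIABLE = re.compile(
    r"^(?P<phase>base|treat|post)(?P<session>\d+)_"
    r"(?P<task>NLE_UT|NLE_T|SDC|DC|CE|NC)_(?P<measure>acc|pae)$",
    re.IGNORECASE,
)


def parse_variable(name: str) -> dict:
    """Split a task variable such as 'Treat3_NLE_UT_pae' into its three parts."""
    m = _VARIABLE.match(name)
    if m is None:
        raise ValueError(f"not a task variable: {name!r}")
    task, measure = m["task"].upper(), m["measure"].lower()
    return {
        "phase": m["phase"].lower(),
        "session": int(m["session"]),
        "task": TASKS[task],
        "measure": MEASURES[measure],
    }


print(parse_variable("Treat3_NLE_UT_pae"))
```

Demographic and WIAT III columns such as Age do not follow the pattern and raise ValueError, which makes it easy to separate them from task columns.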
mmWave-based Fitness Activity Recognition Dataset
zenodo.org
png, zip
Updated Jul 12, 2024
Yucheng Xie; Xiaonan Guo; Yan Wang; Jerry Cheng; Yingying Chen (2024). mmWave-based Fitness Activity Recognition Dataset [Dataset]. http://doi.org/10.5281/zenodo.7793613
Available download formats: zip, png
Unique identifier
https://doi.org/10.5281/zenodo.7793613
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodo
Authors
Yucheng Xie; Xiaonan Guo; Yan Wang; Jerry Cheng; Yingying Chen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This mmWave dataset (FA Dataset) is used for fitness activity identification and contains 14 common daily fitness activities. The data were captured with a TI AWR1642 mmWave radar. The dataset can be used by fellow researchers to reproduce the original work or to further explore other machine-learning problems in the domain of mmWave signals.
Format: .png
Section 1: Device Configuration
We use a commodity TI AWR1642 mmWave radar, which integrates a 2 × 4 antenna array. Details are available at https://www.ti.com/product/AWR1642#:~:text=The%20AWR1642%20is%20an%20ideal,of%2076%20to%2081%20GHz.
A TI DCA1000EVM data capture card collects data from the mmWave device and sends it to a laptop. Details are available at https://www.ti.com/tool/DCA1000EVM?keyMatch=DCA1000EVM.
The radar operates in the 77-81 GHz range. The sampling rate is fixed at 100 frames per second, and each frame has 17 chirps.
Section 2: Data Format
We provide the mmWave data for this dataset as heatmaps in PNG format, as detailed below:
14 activities are included in the FA Dataset.
2 participants are included in the FA Dataset.
FA_d_p_i_u_j.png, where:
d is the date on which the fitness data were collected.
p is the environment in which the fitness data were collected.
i is the fitness activity type index.
u is the user ID.
j is the sample index.
Example:
FA_20220101_lab_1_2_3 represents the 3rd data sample of user 2 of activity 1 collected in the lab
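Because the fields are underscore-separated, a file name can be split back into its parts. A minimal sketch, assuming no field (in particular the environment p) itself contains an underscore; the HeatmapName type and parse_heatmap_name function are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class HeatmapName(NamedTuple):
    dataset: str      # "FA"
    date: str         # d, e.g. "20220101"
    environment: str  # p, e.g. "lab"
    activity: int     # i
    user: int         # u
    sample: int       # j


def parse_heatmap_name(filename: str) -> HeatmapName:
    """Parse FA_d_p_i_u_j.png, assuming no field itself contains an underscore."""
    parts = Path(filename).stem.split("_")  # drop any directory and the .png suffix
    if len(parts) != 6:
        raise ValueError(f"expected 6 underscore-separated fields: {filename!r}")
    dataset, date, environment, activity, user, sample = parts
    return HeatmapName(dataset, date, environment, int(activity), int(user), int(sample))


print(parse_heatmap_name("FA_20220101_lab_1_2_3.png"))
```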
Section 3: Experimental Setup
We place the mmWave device on a table at a height of 60 cm.
The participants are asked to perform fitness activities in front of the mmWave device at a distance of 2 m.
The data are collected in a lab measuring 5.0 m × 3.0 m.
Section 4: Data Description
We develop a spatial-temporal heatmap to integrate multiple activity features, including the range of movement, velocity, and time duration of each activity repetition.

We first derive the Doppler-range map of the users' activity by computing the Range-FFT and Doppler-FFT. We then generate the spatial-temporal heatmap by accumulating the velocity at every distance in every Doppler-range map. Next, we normalize the derived velocity information and present the velocity-distance relationship along the time dimension. In this way, we transform the original instantaneous velocity-distance relationship into a more comprehensive spatial-temporal heatmap that describes the process of a whole activity.

As shown in the attached figure, in each spatial-temporal heatmap the horizontal axis represents the time duration of an activity repetition, while the vertical axis represents the range of movement. Velocity is represented by color.
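The processing chain above can be sketched with NumPy. This is only one plausible reading of "accumulating the velocity of every distance", not the authors' heatmap_generation.py: it assumes a complex raw cube of shape (frames, chirps, samples) from a single receive channel, weights each Doppler bin's power by its distance from the zero-velocity bin, sums over Doppler bins for each range bin, and stacks one column per frame so that time runs along the horizontal axis and range along the vertical axis. The function names are hypothetical; with the configuration in Section 1, each frame would have 17 chirps and one second of recording would contain 100 frames.

```python
import numpy as np


def frame_to_column(frame: np.ndarray) -> np.ndarray:
    """One frame (chirps x samples) -> one velocity-per-range column."""
    range_fft = np.fft.fft(frame, axis=1)                          # Range-FFT over fast-time samples
    doppler_range = np.fft.fftshift(np.fft.fft(range_fft, axis=0), axes=0)  # Doppler-FFT over chirps
    power = np.abs(doppler_range)                                  # Doppler-range map (doppler_bins, range_bins)
    n_chirps = frame.shape[0]
    speed = np.abs(np.arange(n_chirps) - n_chirps // 2)            # |offset| from the zero-velocity bin
    return (speed[:, None] * power).sum(axis=0)                    # accumulate velocity at every range bin


def spatial_temporal_heatmap(frames: np.ndarray) -> np.ndarray:
    """(n_frames, chirps, samples) -> (range_bins, n_frames) image scaled to [0, 1]."""
    heatmap = np.stack([frame_to_column(f) for f in frames], axis=1)  # time along columns
    peak = heatmap.max()
    return heatmap / peak if peak > 0 else heatmap
```

A static reflector produces identical chirps, so all of its energy lands in the zero-velocity bin and contributes nothing to the column; only moving body parts show up in the heatmap under this weighting.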

The dataset is stored in 14 zip files whose names start with "FA"; each contains repetitions of the same fitness activity.
14 common daily activities and their corresponding files
File Name Activity Type File Name Activity Type
FA1 Crunches FA8 Squats
FA2 Elbow plank and reach FA9 Burpees
FA3 Leg raise FA10 Chest squeezes
FA4 Lunges FA11 High knees
FA5 Mountain climber FA12 Side leg raise
FA6 Punches FA13 Side to side chops
FA7 Push ups FA14 Turning kicks

Section 5: Raw Data and Data Processing Algorithms
We also provide the mmWave raw data (.mat format), stored in the same zip file as the corresponding heatmaps. Each .mat file stores one set of activity repetitions (e.g., 4 repetitions) from the same user.
For example: FA_d_p_i_u_j.mat, where:
d is the date on which the data were collected.
p is the environment in which the data were collected.
i is the activity type index.
u is the user ID.
j is the set index.
We plan to provide the data processing algorithms (heatmap_generation.py) to load the mmWave raw data and generate the corresponding heatmap data.
Section 6: Citations
If your paper is related to our work, please cite our papers as follows.
https://ieeexplore.ieee.org/document/9868878/
Xie, Yucheng, Ruizhe Jiang, Xiaonan Guo, Yan Wang, Jerry Cheng, and Yingying Chen. "mmFit: Low-Effort Personalized Fitness Monitoring Using Millimeter Wave." In 2022 International Conference on Computer Communications and Networks (ICCCN), pp. 1-10. IEEE, 2022.
Bibtex:
@inproceedings{xie2022mmfit,
title={mmFit: Low-Effort Personalized Fitness Monitoring Using Millimeter Wave},
author={Xie, Yucheng and Jiang, Ruizhe and Guo, Xiaonan and Wang, Yan and Cheng, Jerry and Chen, Yingying},
booktitle={2022 International Conference on Computer Communications and Networks (ICCCN)},
pages={1--10},
year={2022},
organization={IEEE}
}
South Range, MI Age Group Population Dataset: A Complete Breakdown of South...
neilsberg.com
csv, json
Updated Feb 22, 2025
Neilsberg Research (2025). South Range, MI Age Group Population Dataset: A Complete Breakdown of South Range Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/45476f67-f122-11ef-8c1b-3860777c1fe6/
Available download formats: json, csv
Dataset updated
Feb 22, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Michigan, South Range
Variables measured
Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
Measurement technique
The data presented in this dataset are derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each age group. Ages between 0 and 85 were divided into roughly 5-year buckets, and all ages over 85 were aggregated into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the South Range population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of South Range. The dataset can be used to understand the population distribution of South Range by age; for example, it can identify the largest age group in South Range.

Key observations

The largest age group in South Range, MI was 20 to 24 years, with a population of 99 (16.87%), according to the ACS 2019-2023 5-Year Estimates. The smallest age group in South Range, MI was 80 to 84 years, with a population of 3 (0.51%). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates
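The two published figures can be cross-checked against each other. A minimal sketch, assuming each percentage is the group count divided by the total population and rounded to two decimals: search for totals consistent with both statements. Under that assumption only a total of 587 reproduces both 16.87% and 0.51%; this total is inferred here, not stated in the source, and consistent_totals is a hypothetical helper.

```python
def consistent_totals(count, published_pct, max_total=100_000):
    """All totals N for which count / N, as a percentage rounded to 2 decimals, equals published_pct."""
    return {n for n in range(max(count, 1), max_total + 1)
            if round(count / n * 100, 2) == published_pct}


largest = consistent_totals(99, 16.87)   # 20 to 24 years
smallest = consistent_totals(3, 0.51)    # 80 to 84 years
print(sorted(largest & smallest))        # [587]
```

The same check is a quick way to spot transcription errors when combining several of these tables.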

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Variables / Data Columns

Age Group: This column displays the age group in consideration

Population: The population for the specific age group in the South Range is shown in this column.

% of Total Population: This column displays the population of each age group as a proportion of the total population of South Range. Please note that the percentages may not sum to exactly 100% due to rounding.

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for South Range Population by Age.
Estimated stand-off distance between ADS-B equipped aircraft and obstacles
zenodo.org
data.niaid.nih.gov
jpeg, zip
Updated Jul 12, 2024
Andrew Weinert (2024). Estimated stand-off distance between ADS-B equipped aircraft and obstacles [Dataset]. http://doi.org/10.5281/zenodo.7741273
Available download formats: zip, jpeg
Unique identifier
https://doi.org/10.5281/zenodo.7741273
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Andrew Weinert
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Summary:

Estimated stand-off distance between ADS-B equipped aircraft and obstacles. Obstacle information was sourced from the FAA Digital Obstacle File and the FHWA National Bridge Inventory. Aircraft tracks were sourced from processed data curated from the OpenSky Network. Results are presented as histograms organized by aircraft type and distance away from runways.

Description:

For many aviation safety studies, aircraft behavior is represented using encounter models, which are statistical models of how aircraft behave during close encounters. They are used to provide a realistic representation of the range of encounter flight dynamics where an aircraft collision avoidance system would be likely to alert. These models are currently, and have historically been, limited to interactions between aircraft; they have not represented the specific interactions between obstacles and transponder-equipped aircraft. In response, we calculated the standoff distance between obstacles and ADS-B equipped manned aircraft.

For robustness, MIT Lincoln Laboratory (MIT LL) calculated the standoff distance using two different datasets of manned aircraft tracks and two datasets of obstacles. This approach aligned with the foundational research used to support the ASTM F3442/F3442M-20 well clear criteria of 2000 feet laterally and 250 feet AGL vertically.

Both aircraft track datasets consist of processed tracks of ADS-B equipped aircraft curated from the OpenSky Network. Rotorcraft were likely underrepresented in these datasets, and aircraft equipped only with Mode C transponders, or with no transponder at all, were not considered. The first dataset, referred to as the “Monday” dataset, was used to train the v1.3 uncorrelated encounter models and consisted of 104 Mondays across North America. The second, referred to as the “aerodrome” dataset, was used to train the v2.0 and v3.x terminal encounter models and was based on observations at least 8 nautical miles within Class B, C, and D aerodromes in the United States for the first 14 days of each month from January 2019 through February 2020. Prior to any processing, the two datasets required 714 and 847 gigabytes of storage. For more details on these datasets, please refer to “Correlated Bayesian Model of Aircraft Encounters in the Terminal Area Given a Straight Takeoff or Landing” and “Benchmarking the Processing of Aircraft Tracks with Triples Mode and Self-Scheduling.”

Two different datasets of obstacles were also considered. The first consisted of point obstacles defined by the FAA Digital Obstacle File (DOF), comprising the following structure types: antenna, lighthouse, meteorological tower (met), monument, sign, silo, spire (steeple), stack (chimney; industrial smokestack), transmission line tower (t-l tower), tank (water; fuel), tramway, utility pole (telephone pole, or pole of similar height, supporting wires), windmill (wind turbine), and windsock. Each obstacle was represented by a cylinder with the height reported by the DOF and a radius based on the reported horizontal accuracy. We did not consider the actual width and height of the structure itself. Additionally, we only considered obstacles at least 50 feet tall and marked as verified in the DOF.

The other obstacle dataset, termed “bridges,” was based on the bridges identified in the FAA DOF and additional information provided by the National Bridge Inventory (NBI). Due to the potential size and extent of bridges, it would not be appropriate to model them as point obstacles; however, the FAA DOF only provides a point location and no information about the size of the bridge. In response, we correlated the FAA DOF with the National Bridge Inventory, which provides information about the length of many bridges. Instead of sizing the simulated bridge based on horizontal accuracy, as with the point obstacles, the bridges were represented as circles with a radius based on the longest, nearest bridge from the NBI. A circle representation was required because neither the FAA DOF nor the NBI provided sufficient information about orientation to represent bridges as rectangular cuboids. As with the point obstacles, the height of the obstacle was based on the height reported by the FAA DOF. Accordingly, the analysis using the bridge dataset should be viewed as risk averse and conservative: a manned aircraft may in fact have been hundreds of feet away from an obstacle while its estimated standoff distance was significantly less. Additionally, because all obstacles are represented with a fixed height, the potentially flat and low-level entrances of a bridge are assumed to have the same height as its tall towers. The attached figure illustrates an example simulated bridge.

It would have been extremely computationally inefficient to calculate the standoff distance for all possible track points. Instead, we define an encounter between an aircraft and an obstacle as occurring when an aircraft flying at 3069 feet AGL or less comes within 3000 feet laterally of any obstacle in a 60-second time interval. If the criteria were satisfied, then for that 60-second track segment we calculated the standoff distance to all nearby obstacles. Vertical separation was based on the MSL altitude of the track and the maximum MSL height of an obstacle.
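The encounter screen described above can be sketched as follows. This is a minimal illustration, not MIT LL's implementation: the data classes and function names are hypothetical, lateral distance is a haversine great-circle distance (measured to the obstacle's centre for screening and to the cylinder's edge for the standoff), and tracks are split into consecutive 60-second windows anchored at the first point of each window. Vertical separation follows the description: track MSL altitude minus the obstacle's maximum MSL height.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

EARTH_RADIUS_FT = 6_371_008.8 * 3.280839895  # mean Earth radius (m) converted to feet


@dataclass
class TrackPoint:
    t: float           # seconds since start of track
    lat: float
    lon: float
    alt_msl_ft: float
    alt_agl_ft: float


@dataclass
class Obstacle:
    lat: float
    lon: float
    top_msl_ft: float  # maximum MSL height of the obstacle
    radius_ft: float   # cylinder radius, e.g. from the reported horizontal accuracy


def lateral_distance_ft(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance in feet."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_FT * math.asin(math.sqrt(a))


def segments(track: Sequence[TrackPoint], window_s: float = 60.0) -> List[List[TrackPoint]]:
    """Split a time-ordered track into consecutive windows shorter than window_s seconds."""
    out: List[List[TrackPoint]] = []
    for p in track:
        if not out or p.t - out[-1][0].t >= window_s:
            out.append([p])  # start a new window at this point
        else:
            out[-1].append(p)
    return out


def is_encounter(segment, obstacles, max_agl_ft=3069.0, max_lateral_ft=3000.0) -> bool:
    """True if any point is low enough and laterally close enough to any obstacle."""
    return any(
        p.alt_agl_ft <= max_agl_ft
        and lateral_distance_ft(p.lat, p.lon, o.lat, o.lon) <= max_lateral_ft
        for p in segment
        for o in obstacles
    )


def standoff(p: TrackPoint, o: Obstacle) -> Tuple[float, float]:
    """(lateral, vertical) separation; lateral is measured to the cylinder edge."""
    lateral = max(0.0, lateral_distance_ft(p.lat, p.lon, o.lat, o.lon) - o.radius_ft)
    vertical = p.alt_msl_ft - o.top_msl_ft
    return lateral, vertical
```

Only segments for which is_encounter returns True would then be passed to standoff, which is what keeps the computation tractable.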

For each combination of aircraft track and obstacle datasets, the results were organized in seven different ways. Filtering criteria were based on aircraft type and distance from runways. Runway data was sourced from the FAA runways of the United States, Puerto Rico, and Virgin Islands open dataset. Aircraft type was identified as part of the em-processing-opensky workflow.

All: No filter, all observations that satisfied encounter conditions

nearRunway: Observations within or at 2 nautical miles of a runway

awayRunway: Observations more than 2 nautical miles from a runway

glider: Observations when aircraft type is a glider

fwme: Observations when aircraft type is a fixed-wing multi-engine

fwse: Observations when aircraft type is a fixed-wing single engine

rotorcraft: Observations when aircraft type is a rotorcraft
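The seven groupings overlap: every observation is in All, in exactly one of nearRunway or awayRunway, and in at most one aircraft-type group. A minimal sketch of that assignment; the aircraft-type strings and function names are hypothetical, and treating exactly 2 nautical miles as near follows the "within or at" wording.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical aircraft-type strings mapped to the filter names used above.
TYPE_FILTERS = {
    "glider": "glider",
    "fixed-wing multi-engine": "fwme",
    "fixed-wing single engine": "fwse",
    "rotorcraft": "rotorcraft",
}


def filter_labels(runway_distance_nm: float, aircraft_type: str) -> List[str]:
    """Every filter an observation falls into; the filters overlap by design."""
    labels = ["All", "nearRunway" if runway_distance_nm <= 2.0 else "awayRunway"]
    if aircraft_type in TYPE_FILTERS:
        labels.append(TYPE_FILTERS[aircraft_type])
    return labels


def tally(observations: Iterable[Tuple[float, str]]) -> Counter:
    """Count observations per filter, e.g. to size each histogram."""
    counts: Counter = Counter()
    for distance_nm, aircraft_type in observations:
        counts.update(filter_labels(distance_nm, aircraft_type))
    return counts
```

Because the groups overlap, the counts across the seven histograms for one dataset combination will sum to more than the number of observations.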

License

This dataset is licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0).

This license requires that reusers give credit to the creator. It allows reusers to copy and distribute the material in any medium or format in unadapted form and for noncommercial purposes only. Only noncommercial use of your work is permitted. Noncommercial means not primarily intended for or directed towards commercial advantage or monetary compensation. Exceptions are given for the not-for-profit standards organizations of ASTM International and RTCA.

MIT is releasing this dataset in good faith to promote open and transparent research of the low altitude airspace. Given the limitations of the dataset and a need for more research, a more restrictive license was warranted. Namely, it is based only on observations of ADS-B equipped aircraft, which not all aircraft in the airspace are required to employ, and the observations were sourced from a crowdsourced network whose surveillance coverage has not been robustly characterized.

As more research is conducted and the low altitude airspace is further characterized or regulated, it is expected that a future version of this dataset may have a more permissive license.

Distribution Statement

DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.

© 2021 Massachusetts Institute of Technology.

Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.

This material is based upon work supported by the Federal Aviation Administration under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Federal Aviation Administration.

This document is derived from work done for the FAA (and possibly others); it is not the direct product of work done for the FAA. The information provided herein may include content supplied by third parties. Although the data and information contained herein has been produced or processed from sources believed to be reliable, the Federal Aviation Administration makes no warranty, expressed or implied, regarding the accuracy, adequacy, completeness, legality, reliability or usefulness of any information, conclusions or recommendations provided herein. Distribution of the information contained herein does not constitute an endorsement or warranty of the data or information provided herein by the Federal Aviation Administration or the U.S. Department of Transportation. Neither the Federal Aviation Administration nor the U.S. Department of
Data from: MIMIC-IV-Ext-Instr: A Dataset of 450K+ EHR-Grounded...
physionet.org
Updated Sep 9, 2025
Zhenbang Wu; Anant Dadu; Mike Nalls; Faraz Faghri; Jimeng Sun (2025). MIMIC-IV-Ext-Instr: A Dataset of 450K+ EHR-Grounded Instruction-Following Examples [Dataset]. http://doi.org/10.13026/e5bq-pr14
Unique identifier
https://doi.org/10.13026/e5bq-pr14
Dataset updated
Sep 9, 2025
Authors
Zhenbang Wu; Anant Dadu; Mike Nalls; Faraz Faghri; Jimeng Sun
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Large language models (LLMs) have shown impressive capabilities in solving a wide range of tasks based on human instructions. However, developing a conversational AI assistant for electronic health record (EHR) data remains challenging due to the lack of large-scale instruction-following datasets. To address this, we present MIMIC-IV-Ext-Instr, a dataset containing over 450K open-ended, instruction-following examples generated using GPT-3.5 on a HIPAA-compliant platform. Derived from the MIMIC-IV EHR database, MIMIC-IV-Ext-Instr spans a wide range of topics and is specifically designed to support instruction-tuning of general-purpose LLMs for diverse clinical applications.
Data from: Native ranges of freshwater fishes of North America
catalog.data.gov
data.usgs.gov
Updated Nov 27, 2025
U.S. Geological Survey (2025). Native ranges of freshwater fishes of North America [Dataset]. https://catalog.data.gov/dataset/native-ranges-of-freshwater-fishes-of-north-america
Dataset updated
Nov 27, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
North America
Description
Background: The Nonindigenous Aquatic Species (NAS) Database functions as a repository and clearinghouse for information on the occurrence of nonindigenous aquatic species from across the United States. The Database contains locality information on more than 1,300 species introduced as early as 1800, including freshwater vertebrates and invertebrates, aquatic plants, and marine fishes. Taxa include both foreign species and North American native species that have been translocated outside of their natural range. Locality data are derived from many sources, including scientific literature; Federal, State, and local natural resource monitoring programs; museum collections; news agencies; and direct submission through online reporting forms. To effectively identify and record new introductions for North American native taxa, a robust estimate of their natural native ranges is required. Previously, the NAS Database had used native range information for fishes provided by NatureServe, which was collected from State natural heritage program inventory data and published State fish books. Although these range maps represent an essential first step in assembling native range data, the NatureServe data have varied for many species due to initial data assumptions (i.e., species presence = nativity). Additionally, NatureServe native ranges exhibit watershed gaps for many species. NAS program staff members have made thousands of corrections to these data internally and periodically communicate these changes back to NatureServe.

Methods: Native ranges were developed from several data sources. Dr. Dana Infante, Michigan State University, provided the NAS program with occurrence (presence) data from 40-50 Federal, State, museum, and university data providers gathered during her work on the National Fish Habitat Partnership (NFHP). Although many data providers have offered datasets with no restrictions, some have restrictions on redistribution.
In addition to the NFHP data, we utilized occurrence datasets for United States museum collections from Biodiversity Information Serving Our Nation (BISON), National Science Foundation's VertNet, FishNet 2 (fish collections in natural history museums, universities, and other institutions), Multistate Aquatic Resources Information System (MARIS) data and Global Biodiversity Information Facility (GBIF), along with a review of State fish books and other primary literature, to complete native range data maintained locally in the NAS Database. Occurrence datasets will be combined into larger, species-specific datasets for further processing at the hydrologic unit code (HUC) level. We will use GIS analyses to identify watershed occurrence at the eight-digit (HUC8) and twelve-digit (HUC12) level, using the 2015 version of the Watershed Boundary Dataset. HUCs containing known nonindigenous occurrences will be removed from the native range. Watershed gaps (i.e., a HUC that lies between two that are identified as part of the native range) will be investigated using historical literature to distinguish data gaps from actual range gaps. We will supply native range data by HUC8 (and HUC12 where possible) for 320 species listed below. These data will be provided as a comma-separated values (CSV) file and be made available on the NAS website via web services application programming interface (API).
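The removal step can be sketched on HUC codes alone, since Watershed Boundary Dataset codes are hierarchical and a twelve-digit HUC lies inside the eight-digit HUC given by its first eight digits. A minimal sketch, assuming codes are kept as strings (so leading zeros survive) and applying the removal rule literally at both levels; the function names and example codes are hypothetical, and the watershed-gap investigation, which needs adjacency data and historical literature, is not covered.

```python
from typing import Iterable, Set, Tuple


def huc8_of(huc12: str) -> str:
    """A HUC12 code nests inside the HUC8 given by its first eight digits."""
    if len(huc12) != 12 or not huc12.isdigit():
        raise ValueError(f"expected a 12-digit HUC string, got {huc12!r}")
    return huc12[:8]


def native_range(occurrence_huc12: Iterable[str],
                 nonindigenous_huc12: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (HUC8 set, HUC12 set) after dropping HUCs with nonindigenous records."""
    nonnative12 = set(nonindigenous_huc12)
    nonnative8 = {huc8_of(h) for h in nonnative12}
    native12 = {h for h in occurrence_huc12 if h not in nonnative12}
    native8 = {huc8_of(h) for h in native12} - nonnative8
    return native8, native12
```

Storing codes as integers would silently drop the leading zero of regions such as 01, so string handling matters when these codes come from a CSV file.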

mmWave-based Activity Recognition Dataset

zenodo.org

png, zip

Updated Jul 12, 2024

Yucheng Xie; Ruizhe Jiang; Xiaonan Guo; Yan Wang; Jerry Cheng; Yingying Chen (2024). mmWave-based Activity Recognition Dataset [Dataset]. http://doi.org/10.5281/zenodo.7678020

Available download formats: png, zip

Unique identifier

https://doi.org/10.5281/zenodo.7678020

Dataset updated

Jul 12, 2024

Dataset provided by

Zenodo

Authors

Yucheng Xie; Ruizhe Jiang; Xiaonan Guo; Yan Wang; Jerry Cheng; Yingying Chen

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This mmWave dataset is used for activity verification and contains two datasets. The first (FA Dataset) contains 14 common daily fitness activities; the second (EA Dataset) contains 5 kinds of eating activities. The data were captured with a TI AWR1642 mmWave radar. The dataset can be used by fellow researchers to reproduce the original work or to further explore other machine-learning problems in the domain of mmWave signals.

Format: .png

Section 1: Device Configuration

We use a commodity TI AWR1642 mmWave radar, which integrates a 2 × 4 antenna array. Details are available at https://www.ti.com/product/AWR1642#:~:text=The%20AWR1642%20is%20an%20ideal,of%2076%20to%2081%20GHz.
A TI DCA1000EVM data capture card collects data from the mmWave device and sends it to a laptop. Details are available at https://www.ti.com/tool/DCA1000EVM?keyMatch=DCA1000EVM.
The radar operates in the 77-81 GHz range. The sampling rate is fixed at 100 frames per second, and each frame has 17 chirps.

Section 2: Data Format

We provide the mmWave data for the two datasets as heatmaps in PNG format, as detailed below:

FA Dataset

2 participants are included in the FA Dataset.
14 activities are included in the FA Dataset.
FA_d_p_i_u_j.png:
- d is the date on which the data were collected.
- p is the environment in which the data were collected.
- i represents activity type index
- u represents user id
- j represents sample index
Example:
- FA_20220101_lab_1_2_3 represents the 3rd data sample of user 2 of activity 1 collected in the lab

EA Dataset

2 participants are included in the EA Dataset.
5 activities are included in the EA Dataset.
EA_d_p_i_u_j.png:
- d is the date on which the data were collected.
- p is the environment in which the data were collected.
- i represents the activity type index
- u represents the user id
- j represents the sample index

Section 3: Experimental Setup

FA Dataset

We place the mmWave device on a table at a height of 60 cm.
The participants are asked to perform fitness activities in front of the mmWave device at a distance of 2 m.
The data are collected in a lab measuring 5.0 m × 3.0 m.

EA Dataset

We place the mmWave device on a table at a height of 60 cm.
The participants are asked to eat with different utensils (fork, fork&knife, spoon, chopsticks, or bare hand) in front of the mmWave device at a distance of 1 m.
The data are collected in a lab measuring 5.0 m × 3.0 m.

Section 4: Data Description

We develop a spatial-temporal heatmap to integrate multiple activity features, including the range of movement, velocity, and time duration of each activity repetition.

We first derive the Doppler-range map of the users’ activity by computing the Range-FFT and Doppler-FFT. We then generate the spatial-temporal heatmap by accumulating the velocity at every distance in every Doppler-range map. Next, we normalize the derived velocity information and present the velocity-distance relationship along the time dimension. In this way, we transform the original instantaneous velocity-distance relationship into a more comprehensive spatial-temporal heatmap that describes the process of a whole activity.

As shown in the attached figure, in each spatial-temporal heatmap the horizontal axis represents the time duration of an activity repetition, while the vertical axis represents the range of movement. Velocity is represented by color.

We create two folders to store the two datasets. The FA folder contains 14 subfolders, each holding repetitions of the same fitness activity; the EA folder contains 5 subfolders, each holding repetitions with a different utensil.

14 common daily activities and their corresponding folders
Folder Name	Activity Type	Folder Name	Activity Type
FA1	Crunches	FA8	Squats
FA2	Elbow plank and reach	FA9	Burpees
FA3	Leg raise	FA10	Chest squeezes
FA4	Lunges	FA11	High knees
FA5	Mountain climber	FA12	Side leg raise
FA6	Punches	FA13	Side to side chops
FA7	Push ups	FA14	Turning kicks

5 eating activities and their corresponding folders
Folder Name	Activity Type
EA1	Eating with chopsticks
EA2	Eating with fork
EA3	Eating with bare hand
EA4	Eating with fork&knife
EA5	Eating with spoon
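To attach activity labels when loading the heatmaps, the two folder tables can be transcribed into dictionaries. This sketch assumes the download unpacks to a root directory containing FA and EA folders with the subfolders listed above; index_dataset is a hypothetical helper, and it does not filter by file type because the heatmap file format is not stated.

```python
from pathlib import Path

# Folder-to-label mappings transcribed from the FA and EA folder tables.
FA_LABELS = {
    "FA1": "Crunches", "FA2": "Elbow plank and reach", "FA3": "Leg raise",
    "FA4": "Lunges", "FA5": "Mountain climber", "FA6": "Punches",
    "FA7": "Push ups", "FA8": "Squats", "FA9": "Burpees",
    "FA10": "Chest squeezes", "FA11": "High knees", "FA12": "Side leg raise",
    "FA13": "Side to side chops", "FA14": "Turning kicks",
}
EA_LABELS = {
    "EA1": "Eating with chopsticks", "EA2": "Eating with fork",
    "EA3": "Eating with bare hand", "EA4": "Eating with fork&knife",
    "EA5": "Eating with spoon",
}

def index_dataset(root: Path, dataset: str) -> list[tuple[Path, str]]:
    """Return (file path, activity label) pairs for files under root/<dataset>/<subfolder>/.

    Hypothetical helper: the 'FA' / 'EA' top-level folder names under root are an
    assumption about the download layout.
    """
    labels = FA_LABELS if dataset == "FA" else EA_LABELS
    samples = []
    for folder, label in labels.items():
        subdir = root / dataset / folder
        if not subdir.is_dir():
            continue  # tolerate partial downloads or missing subfolders
        for path in sorted(subdir.iterdir()):
            if path.is_file():
                samples.append((path, label))
    return samples
```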

Section 5: Raw Data and Data Processing Algorithms

We also provide the mmWave raw data (.mat format), stored in the same folder as the corresponding heatmap data. Each .mat file stores one set of activity repetitions (e.g., 4 repetitions) from the same user.
- File names follow the pattern EA_d_p_i_u_j.mat, where:
  - d represents the date on which the data were collected
  - p represents the environment in which the data were collected
  - i represents the activity type index
  - u represents the user id
  - j represents the set index
We plan to provide the data processing script (heatmap_generation.py) that loads the mmWave raw data and generates the corresponding heatmap data.
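The raw-data naming convention can be parsed with a regular expression. This sketch assumes that no field contains an underscore, that FA files follow the same pattern with an FA prefix, and that d is the collection date; the value formats of the fields are not documented, so every field is kept as a string. RawFileInfo and parse_raw_filename are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern: <FA|EA>_d_p_i_u_j.mat, with underscore-free fields.
_RAW_NAME = re.compile(
    r"^(?P<dataset>FA|EA)_(?P<date>[^_]+)_(?P<environment>[^_]+)"
    r"_(?P<activity>[^_]+)_(?P<user>[^_]+)_(?P<set_index>[^_]+)\.mat$"
)

@dataclass(frozen=True)
class RawFileInfo:
    dataset: str      # "FA" or "EA"
    date: str         # d: collection date (format undocumented)
    environment: str  # p: collection environment
    activity: str     # i: activity type index
    user: str         # u: user id
    set_index: str    # j: set index

def parse_raw_filename(name: str) -> Optional[RawFileInfo]:
    """Split a raw-data file name into its fields, or return None if it does not match."""
    match = _RAW_NAME.match(name)
    return RawFileInfo(**match.groupdict()) if match else None
```

A matching file can then be opened with SciPy's scipy.io.loadmat, which returns a dict of the stored MATLAB variables; the variable names inside these files are not documented here, so inspect the dict's keys first.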

Section 6: Citations

If your paper is related to our work, please cite our papers as follows.

https://ieeexplore.ieee.org/document/9868878/

Xie, Yucheng, Ruizhe Jiang, Xiaonan Guo, Yan Wang, Jerry Cheng, and Yingying Chen. "mmFit: Low-Effort Personalized Fitness Monitoring Using Millimeter Wave." In 2022 International Conference on Computer Communications and Networks (ICCCN), pp. 1-10. IEEE, 2022.

Bibtex:

@inproceedings{xie2022mmfit,
  title={mmFit: Low-Effort Personalized Fitness Monitoring Using Millimeter Wave},
  author={Xie, Yucheng and Jiang, Ruizhe and Guo, Xiaonan and Wang, Yan and Cheng, Jerry and Chen, Yingying},
  booktitle={2022 International Conference on Computer Communications and Networks (ICCCN)},
  pages={1--10},
  year={2022},
  organization={IEEE}
}

https://www.sciencedirect.com/science/article/abs/pii/S2352648321000532

Xie, Yucheng, Ruizhe Jiang, Xiaonan Guo, Yan Wang, Jerry Cheng, and Yingying Chen. "mmEat: Millimeter wave-enabled environment-invariant eating behavior monitoring." Smart Health 23 (2022): 100236.

Bibtex:

@article{xie2022mmeat,
  title={mmEat: Millimeter wave-enabled environment-invariant eating behavior monitoring},
  author={Xie, Yucheng and Jiang, Ruizhe and Guo, Xiaonan and Wang, Yan and Cheng, Jerry and Chen, Yingying},
  journal={Smart Health},
  volume={23},
  pages={100236},
  year={2022},
  publisher={Elsevier}
}

Collection of example datasets used for the book - R Programming -...
figshare.com
txt
Updated Dec 4, 2023
Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.24728073.v1
Dataset updated
Dec 4, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Kingsley Okoye; Samira Hosseini
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general users, and shows how to perform different types of statistical data analysis for research purposes using the R programming language. R is open-source software and an object-oriented programming language for computing statistics and producing graphical displays through data manipulation, modelling, and calculation; RStudio is a widely used integrated development environment (IDE) for it. R packages and supporting libraries provide a wide range of functions for programming and data analysis. Unlike many existing statistical software packages, R lets users write more efficient code by using command-line scripting and vectors. It has many built-in functions and libraries that are extensible, and it allows users to define their own (customized) functions that specify how the program should behave while handling the data, which can also be stored in the simple object system.

For all intents and purposes, this book serves as both a textbook and a manual for R statistics, particularly in academic research, data analytics, and computer programming, and aims to inform and guide the work of R users and statisticians. It describes different types of statistical data analysis and methods, and the best scenarios for using each of them in R. It gives a hands-on, step-by-step practical guide to identifying and conducting the different parametric and non-parametric procedures. This includes a description of the conditions or assumptions that must hold for the various statistical methods or tests, and how to interpret their results. The book also covers different data formats and sources, and how to test the reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained throughout the book.
It is the first book to provide a comprehensive description and a step-by-step, practical, hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples. Topics range from importing and storing datasets in R as objects, coding and calling the methods or functions that manipulate those datasets or objects, factorization, and vectorization, to reasoning about, interpreting, and storing the results for future use, and producing graphical visualizations and representations. The book thus brings together statistics and computer programming for research.
Z+F Imager 5016 Distance Uncertainty
data.uni-hannover.de
jpeg, pdf, ply, txt
Updated Sep 3, 2025
Geodätisches Institut Hannover (2025). Z+F Imager 5016 Distance Uncertainty [Dataset]. https://data.uni-hannover.de/dataset/z-f-imager-5016-distance-uncertainty
Available download formats: ply, pdf, txt, jpeg
Dataset updated
Sep 3, 2025
Dataset authored and provided by
Geodätisches Institut Hannover
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This dataset presents a comparative analysis between a highly accurate reference point cloud, acquired with the Leica ATR 960 (laser tracker) and the Leica LAS XL (hand-held scanner), and a total of 51 laser-scan point clouds acquired with the Z+F Imager 5016. The comparisons were carried out at the Hitec Laboratory of the Geodetic Institute Hannover, where scanning conditions were kept under control while various objects were captured.

Throughout the entire measurement process, great care was taken to ensure constant temperature and air pressure. The deviations observed through backward modeling are reflected in the distance measurements. Additionally, to explore potential factors influencing TLS distance measurements, feature engineering was conducted. The dataset is exceptionally well-suited for understanding and potentially modeling the uncertainties associated with TLS distance measurements.

Measurement process and backward modelling

Figure: Measurement process (https://data.uni-hannover.de/dataset/e0dd7c6c-de06-4c44-8848-e1d7f9757a1a/resource/93a1a7a0-0704-406c-a58b-0d0181cbe6ec/download/measurement_process.jpg)
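The backward-modelling computation itself is not spelled out here. As a hedged illustration of one common way to express point-cloud deviations as distance residuals, and not necessarily the procedure used by the Geodätisches Institut Hannover, the sketch below finds each TLS point's nearest reference point and projects the difference onto the unit line of sight from the scanner position. line_of_sight_residuals is a hypothetical name, and the brute-force nearest-neighbour search only suits small examples.

```python
import numpy as np

def line_of_sight_residuals(tls_points, reference_points, scanner_position):
    """Signed range residual of each TLS point relative to its nearest reference point.

    Positive values mean the TLS point lies farther from the scanner than the reference.
    Brute-force nearest neighbours: O(n_tls * n_ref) memory and time.
    """
    tls = np.asarray(tls_points, dtype=float)
    ref = np.asarray(reference_points, dtype=float)
    origin = np.asarray(scanner_position, dtype=float)
    # Pairwise squared distances (n_tls x n_ref) to pick each point's nearest reference point.
    d2 = ((tls[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
    nearest = ref[d2.argmin(axis=1)]
    # Unit line-of-sight vectors from the scanner to each TLS point.
    rays = tls - origin
    rays /= np.linalg.norm(rays, axis=1, keepdims=True)
    # Component of the deviation along the measured distance direction.
    return ((tls - nearest) * rays).sum(axis=1)
```

For full scans, a spatial index such as scipy.spatial.cKDTree would replace the pairwise-distance step.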

Feature engineering

The formulas used for feature engineering are given in the accompanying document "Feature engineering".

Object description & Viewpoints

The individual objects are defined in the following figure. Note that some objects exhibit similar characteristics.

Figure: Objects inside the Hitec Laboratory (https://data.uni-hannover.de/dataset/e0dd7c6c-de06-4c44-8848-e1d7f9757a1a/resource/4a305c9d-00db-4107-82d6-e58dafb37ada/download/objects.jpg)

The TLS viewpoints were distributed throughout the entire laboratory. The 3D coordinates of the viewpoints, as well as the corresponding standard deviations of the translation parameters derived from the georeferencing process, are given in the accompanying document "Viewpoint overview".

Note that some TLS viewpoints have duplicate scans, taken in the first and second measurement phases.

Figure: Laboratory environment (https://data.uni-hannover.de/dataset/e0dd7c6c-de06-4c44-8848-e1d7f9757a1a/resource/af7eb4e9-fb96-43c2-b3a6-00d0bcd3cbc6/download/environment.jpg)

Data set description

Each object in the dataset has its own data stored in a PLY file. These PLY files contain not only the XYZ coordinates but also the features and residuals. A comprehensive description of the dataset can be found in the associated document "Data description".
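The PLY files can be opened with any point-cloud library. As a dependency-free sketch, and assuming the files are stored in ASCII PLY with the vertex element first (neither is stated above), the per-point columns can be parsed as follows. The actual property names for the features and residuals are not known to us, so the parser uses whatever names the header declares; read_ascii_ply_vertices is a hypothetical helper.

```python
def read_ascii_ply_vertices(text: str) -> list[dict[str, float]]:
    """Parse the 'vertex' element of an ASCII PLY file into one dict per point.

    Assumptions: 'format ascii 1.0', scalar vertex properties only, and the vertex
    element is the first element whose data follows the header.
    """
    lines = iter(text.splitlines())
    if next(lines).strip() != "ply":
        raise ValueError("not a PLY file")
    names: list[str] = []
    count = 0
    in_vertex = False
    for line in lines:
        parts = line.split()
        if not parts or parts[0] == "comment":
            continue
        if parts[0] == "format" and parts[1] != "ascii":
            raise ValueError("only ASCII PLY is supported in this sketch")
        if parts[0] == "element":
            in_vertex = parts[1] == "vertex"
            if in_vertex:
                count = int(parts[2])
        elif parts[0] == "property" and in_vertex:
            if parts[1] == "list":
                raise ValueError("list properties are not supported")
            names.append(parts[-1])  # the property name is the last token
        elif parts[0] == "end_header":
            break
    points = []
    for _ in range(count):
        values = next(lines).split()
        points.append({name: float(v) for name, v in zip(names, values)})
    return points
```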
