100+ datasets found
  1. 500 CITIES DISTANCE DATASET

    • kaggle.com
    zip
    Updated Sep 29, 2025
    Cite
    ANSHIKA SHARMA (2025). 500 CITIES DISTANCE DATASET [Dataset]. https://www.kaggle.com/datasets/anshikasharmacseai/500-cities-distance-daatset
    Available download formats: zip (9653 bytes)
    Dataset updated
    Sep 29, 2025
    Authors
    ANSHIKA SHARMA
    Description

    This dataset contains pairwise distances between cities represented as an undirected weighted graph. Each row is an edge describing the travel distance between two cities. It is ideal for experiments in graph algorithms (shortest path, MST), combinatorial optimization (TSP), route planning, and educational demonstrations.

    Columns:

    From — source city (string)

    To — destination city (string)

    Distance — numerical distance (edge weight)

    Quick stats (from provided data):

    Number of distinct cities: 8 (City1 .. City8)

    Number of edges (rows): 17

    Graph type: undirected, weighted (assumed symmetric)

    Use cases

    Benchmarking shortest-path algorithms (Dijkstra, Bellman-Ford, Floyd–Warshall)

    Minimum Spanning Tree (Kruskal/Prim) experiments

    Traveling Salesman Problem (TSP) solvers and heuristics

    Route planning and logistics toy problems

    Teaching graph theory and visualization with networkx
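The shortest-path use case above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the rows have already been read into `(From, To, Distance)` tuples. The function names `build_graph` and `dijkstra` are hypothetical, and the sample distances are invented for the example rather than taken from the dataset.

```python
import heapq
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency map from (from, to, distance) rows."""
    graph = defaultdict(dict)
    for src, dst, dist in edges:
        # Undirected graph: store each edge in both directions.
        graph[src][dst] = dist
        graph[dst][src] = dist
    return graph

def dijkstra(graph, start):
    """Return the shortest distance from start to every reachable node."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbour, weight in graph[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return best

# Illustrative rows only: these distances are invented, not dataset values.
sample_edges = [
    ("City1", "City2", 4),
    ("City2", "City3", 3),
    ("City1", "City3", 10),
    ("City3", "City4", 2),
]
graph = build_graph(sample_edges)
shortest = dijkstra(graph, "City1")
print(shortest)
```

Storing each edge in both directions reflects the dataset's stated assumption that distances are symmetric.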

  2. Distance Calculation Dataset

    • universe.roboflow.com
    zip
    Updated Mar 2, 2023
    + more versions
    Cite
    jatin-rane (2023). Distance Calculation Dataset [Dataset]. https://universe.roboflow.com/jatin-rane/distance-calculation/model/3
    Available download formats: zip
    Dataset updated
    Mar 2, 2023
    Dataset authored and provided by
    jatin-rane
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Vehicles Bounding Boxes
    Description

    Distance Calculation

    ## Overview
    
    Distance Calculation is a dataset for object detection tasks - it contains Vehicles annotations for 4,056 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).
    
  3. Indian Cities Distance Dataset

    • kaggle.com
    zip
    Updated Mar 1, 2024
    Cite
    K.B. Dharun Krishna (2024). Indian Cities Distance Dataset [Dataset]. https://www.kaggle.com/datasets/kbdharun/a-star-algorithm-route-planning-dataset/code
    Available download formats: zip (804 bytes)
    Dataset updated
    Mar 1, 2024
    Authors
    K.B. Dharun Krishna
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    India
    Description

    The "Indian Cities Distance Dataset" is a comprehensive collection of distance data between major cities in India, designed to facilitate pathfinding and optimization tasks.

    This connected dataset includes information about the distances (in kilometres) between pairs of cities, allowing users to calculate the shortest paths and optimize routes for various purposes.

    Key features of this dataset

    City Pairings: The dataset provides connectivity information between pairs of prominent Indian cities, enabling users to calculate the shortest paths and travel distances between any two cities included in the dataset. It is an excellent resource for programming route planning, navigation, and logistics optimization applications.

    Distance Data: Each entry in the dataset includes the distance in kilometres between two cities. The distances have been curated to reflect the actual road distances between these locations.

    A* Search Algorithm: This dataset is ideal for use with the A* (A-star) search algorithm, a widely used optimization and pathfinding algorithm. The A* algorithm can help find the shortest and most efficient routes between cities, making it suitable for transportation, tourism, and urban planning applications.

    Beginner friendly: This dataset contains a minimal number of features, which makes the data easier to process and analyze and suitable for beginners.
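A minimal sketch of A* over a city graph of this kind is given below. It assumes the edges have been loaded into a nested dictionary and that the caller supplies a heuristic giving an estimate of the remaining distance to the goal. The dataset as described provides only pairwise distances, so the heuristic table, the placeholder city names `A` to `D`, and all numbers here are invented for illustration.

```python
import heapq

def a_star(graph, start, goal, heuristic):
    """Return (cost, path) of a cheapest route, or (inf, []) if none exists.

    graph: dict mapping city -> dict of neighbour -> distance in km
    heuristic: function(city) -> estimated remaining km to the goal; it must
    never overestimate for the returned route to be optimal.
    """
    open_heap = [(heuristic(start), 0, start, [start])]
    best_cost = {start: 0}
    while open_heap:
        _, cost, city, path = heapq.heappop(open_heap)
        if city == goal:
            return cost, path
        if cost > best_cost.get(city, float("inf")):
            continue  # a cheaper route to this city was already expanded
        for neighbour, km in graph.get(city, {}).items():
            new_cost = cost + km
            if new_cost < best_cost.get(neighbour, float("inf")):
                best_cost[neighbour] = new_cost
                priority = new_cost + heuristic(neighbour)
                heapq.heappush(
                    open_heap, (priority, new_cost, neighbour, path + [neighbour])
                )
    return float("inf"), []

# Placeholder graph and heuristic values, invented for this example.
roads = {
    "A": {"B": 5, "C": 2},
    "B": {"A": 5, "D": 5},
    "C": {"A": 2, "D": 9},
    "D": {"B": 5, "C": 9},
}
estimate_to_d = {"A": 8, "B": 4, "C": 6, "D": 0}
cost, route = a_star(roads, "A", "D", estimate_to_d.get)
print(cost, route)
```

With a heuristic that always returns 0, the same function behaves like Dijkstra's algorithm, which is a reasonable fallback when no coordinates are available.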

  4. Data from: Mining Distance-Based Outliers in Near Linear Time

    • catalog.data.gov
    • datasets.ai
    Updated Apr 11, 2025
    + more versions
    Cite
    Dashlink (2025). Mining Distance-Based Outliers in Near Linear Time [Dataset]. https://catalog.data.gov/dataset/mining-distance-based-outliers-in-near-linear-time
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
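The idea in the abstract (a nested loop over randomly ordered data, with a pruning rule) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not define the outlier score or the pruning cutoff, so the score used here (distance to the k-th nearest neighbour), the cutoff (the weakest score among the current top outliers), and the function name `top_outliers` are assumptions.

```python
import heapq
import math
import random

def top_outliers(points, k=2, n_outliers=1, seed=0):
    """Return [(score, index)] for the n_outliers most outlying points.

    Assumed score: distance to the k-th nearest neighbour.
    """
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # random order makes early pruning likely
    top = []       # min-heap of (score, index) for the best outliers so far
    cutoff = 0.0   # score of the weakest outlier currently kept
    for i in order:
        nearest = []   # negated distances: nearest[0] is the k-th nearest so far
        pruned = False
        for j in order:
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # Pruning rule: the k-th nearest distance can only shrink, so once
            # it falls below the cutoff this point cannot be a top outlier.
            if len(nearest) == k and -nearest[0] < cutoff:
                pruned = True
                break
        if pruned or len(nearest) < k:
            continue
        score = -nearest[0]
        if len(top) < n_outliers:
            heapq.heappush(top, (score, i))
        else:
            heapq.heappushpop(top, (score, i))
        if len(top) == n_outliers:
            cutoff = top[0][0]
    return sorted(top, reverse=True)

cloud = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
result = top_outliers(cloud, k=2, n_outliers=1)
print(result)
```

Most non-outliers are abandoned after a handful of distance computations, which matches the abstract's explanation of where the near-linear behaviour comes from.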

  5. ANN development + final testing datasets

    • data.niaid.nih.gov
    • resodate.org
    • +1more
    Updated Jan 24, 2020
    Cite
    Authors (2020). ANN development + final testing datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1445865
    Dataset updated
    Jan 24, 2020
    Authors
    Authors
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    File name definitions:

    '...v_50_175_250_300...' - dataset for velocity ranges [50, 175] + [250, 300] m/s

    '...v_175_250...' - dataset for velocity range [175, 250] m/s

    'ANNdevelop...' - used to perform 9 parametric sub-analyses where, in each one, many ANNs are developed (trained, validated and tested) and the one yielding the best results is selected

    'ANNtest...' - used to test the best ANN from each aforementioned parametric sub-analysis, aiming to find the best ANN model; this dataset includes the 'ANNdevelop...' counterpart

    Where to find the input (independent) and target (dependent) variable values for each dataset/Excel file?

    input values in 'IN' sheet

    target values in 'TARGET' sheet

    Where to find the results from the best ANN model (for each target/output variable and each velocity range)?

    open the corresponding excel file and the expected (target) vs ANN (output) results are written in 'TARGET vs OUTPUT' sheet

    Check reference below (to be added when the paper is published)

    https://www.researchgate.net/publication/328849817_11_Neural_Networks_-_Max_Disp_-_Railway_Beams

  6. Mining Distance-Based Outliers in Near Linear Time - Dataset - NASA Open...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Mining Distance-Based Outliers in Near Linear Time - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/mining-distance-based-outliers-in-near-linear-time
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA: http://nasa.gov/
    Description

    Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.

  7. Golf Ball Distance Calculation Dataset

    • universe.roboflow.com
    zip
    Updated Jun 14, 2025
    Cite
    Awais Ahmad (2025). Golf Ball Distance Calculation Dataset [Dataset]. https://universe.roboflow.com/awais-ahmad-dtpcl/golf-ball-distance-calculation/dataset/1
    Available download formats: zip
    Dataset updated
    Jun 14, 2025
    Dataset authored and provided by
    Awais Ahmad
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Golf Balls Bounding Boxes
    Description

    Golf Ball Distance Calculation

    ## Overview
    
    Golf Ball Distance Calculation is a dataset for object detection tasks - it contains Golf Balls annotations for 318 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
    
  8. Fused Image dataset for convolutional neural Network-based crack Detection...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 20, 2023
    Cite
    Shanglian Zhou; Carlos Canchila; Wei Song (2023). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Dataset]. http://doi.org/10.5281/zenodo.6383044
    Available download formats: zip
    Dataset updated
    Apr 20, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Shanglian Zhou; Carlos Canchila; Wei Song
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.

    The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration feature). The filtered range data were generated by applying frequency domain filtering to eliminate image disturbances (e.g., surface variations, and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact from different types of image data on deep convolutional neural network (DCNN) performance.

    If you share or use this dataset, please cite [4] and [5] in any relevant documentation.

    In addition, an image dataset for crack classification has also been published at [6].

    References:

    [1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873

    [2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605

    [3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434

    [4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678

    [5] (This dataset) Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044

    [6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78

  9. South Range, MI Annual Population and Growth Analysis Dataset: A...

    • neilsberg.com
    csv, json
    Updated Jul 30, 2024
    + more versions
    Cite
    Neilsberg Research (2024). South Range, MI Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in South Range from 2000 to 2023 // 2024 Edition [Dataset]. https://www.neilsberg.com/insights/south-range-mi-population-by-year/
    Available download formats: json, csv
    Dataset updated
    Jul 30, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Michigan, South Range
    Variables measured
    Annual Population Growth Rate, Population Between 2000 and 2023, Annual Population Growth Rate Percent
    Measurement technique
    The data presented in this dataset is derived from more than 20 years of U.S. Census Bureau Population Estimates Program (PEP) data, 2000 - 2023. To measure the variables, namely (a) population and (b) population change (in absolute terms and as a percentage), we initially analyzed and tabulated the data for each of the years between 2000 and 2023. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the South Range population over the last 20 plus years. It lists the population for each year, along with the year on year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of South Range across the last two decades. For example, using this dataset, we can identify if the population is declining or increasing. If there is a change, when the population peaked, or if it is still growing and has not reached its peak. We can also compare the trend with the overall trend of United States population over the same period of time.

    Key observations

    In 2023, the population of South Range was 741, a 0.27% decrease year-by-year from 2022. Previously, in 2022, South Range population was 743, an increase of 0.13% compared to a population of 742 in 2021. Over the last 20 plus years, between 2000 and 2023, population of South Range increased by 17. In this period, the peak population was 760 in the year 2010. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

    Data Coverage:

    • From 2000 to 2023

    Variables / Data Columns

    • Year: This column displays the data year (Measured annually and for years 2000 to 2023)
    • Population: The population for the specific year for the South Range is shown in this column.
    • Year on Year Change: This column displays the change in South Range population for each year compared to the previous year.
    • Change in Percent: This column displays the year on year change as a percentage. Please note that the sum of all percentages may not equal one due to rounding of values.
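The two derived columns can be reproduced from the Population column alone. The sketch below is an illustration using only the three population figures quoted under "Key observations" (2021 to 2023). The function name `year_on_year` and the rounding to two decimals are assumptions, not a description of how the publisher computed the columns.

```python
def year_on_year(populations):
    """populations: dict of year -> population.

    Returns a list of (year, change, percent_change) for each year that has
    a preceding year in the input.
    """
    rows = []
    years = sorted(populations)
    for prev, curr in zip(years, years[1:]):
        change = populations[curr] - populations[prev]
        # Percent change is relative to the previous year's population.
        percent = round(100 * change / populations[prev], 2)
        rows.append((curr, change, percent))
    return rows

# Figures quoted in the dataset description above.
south_range = {2021: 742, 2022: 743, 2023: 741}
rows = year_on_year(south_range)
print(rows)
```

The results agree with the description: an increase of 0.13% in 2022 and a decrease of 0.27% in 2023.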

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyzes and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for South Range Population by Year. You can refer the same here

  10. Dataset for the paper "Observation of Acceleration and Deceleration Periods...

    • zenodo.org
    Updated Mar 26, 2025
    Cite
    Yide Qian (2025). Dataset for the paper "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023" [Dataset]. http://doi.org/10.5281/zenodo.15022854
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Yide Qian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Pine Island Glacier
    Description

    Dataset and codes for "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023"

    • Description of the data and file structure

    The MATLAB codes and related datasets are used for generating the figures for the paper "Observation of Acceleration and Deceleration Periods at Pine Island Ice Shelf from 1997–2023".

    Files and variables

    File 1: Data_and_Code.zip

    Directory: Main_function

    Description: Includes MATLAB scripts and functions. Each script includes descriptions that guide the user on how to use it and how to find the dataset used for processing.

    MATLAB main scripts: Include all the steps to process the data, output figures, and output videos.

    Script_1_Ice_velocity_process_flow.m

    Script_2_strain_rate_process_flow.m

    Script_3_DROT_grounding_line_extraction.m

    Script_4_Read_ICESat2_h5_files.m

    Script_5_Extraction_results.m

    MATLAB functions: Five files that include MATLAB functions that support the main scripts:

    1_Ice_velocity_code: Includes MATLAB functions related to ice velocity post-processing, including outlier removal, filtering, correction for atmospheric and tidal effects, inverse weighted averaging, and error estimation.

    2_strain_rate: Includes MATLAB functions related to strain rate calculation.

    3_DROT_extract_grounding_line_code: Includes MATLAB functions that convert the range offset results output from GAMMA to differential vertical displacement and use the result to extract the grounding line.

    4_Extract_data_from_2D_result: Includes MATLAB functions used to extract profiles from 2D data.

    5_NeRD_Damage_detection: Modified code from Izeboud et al. 2023. When applying this code, please also cite Izeboud et al. 2023 (https://www.sciencedirect.com/science/article/pii/S0034425722004655).

    6_Figure_plotting_code: Includes MATLAB functions related to the figures in the paper and supporting information.

    Directory: data_and_result

    Description: Includes directories that store the results output from MATLAB. Users only need to modify the path in the MATLAB script to their own path.

    1_origin: Sample data ("PS-20180323-20180329", "PS-20180329-20180404", "PS-20180404-20180410") output from GAMMA software in GeoTIFF format that can be used to calculate DROT and velocity. Includes displacement, theta, phi, and ccp.

    2_maskccpN: Remove outliers by ccp < 0.05 and change displacement to velocity (m/day).

    3_rockpoint: Extract velocities at non-moving region

    4_constant_detrend: removed orbit error

    5_Tidal_correction: remove atmospheric and tidal induced error

    6_rockpoint: Extract non-aggregated velocities at non-moving region

    6_vx_vy_v: transform velocities from va/vr to vx/vy

    7_rockpoint: Extract aggregated velocities at non-moving region

    7_vx_vy_v_aggregate_and_error_estimate: inverse weighted average of three ice velocity maps and calculate the error maps

    8_strain_rate: calculated strain rate from aggregate ice velocity

    9_compare: store the results before and after tidal correction and aggregation.

    10_Block_result: time series results extracted from 2D data.

    11_MALAB_output_png_result: Stores .png files and time series results

    12_DROT: Differential Range Offset Tracking results

    13_ICESat_2: ICESat_2 .h5 files and .mat files can be put here (this folder only includes the samples from tracks 0965 and 1094)

    14_MODIS_images: you can store MODIS images here

    shp: grounding line, rock region, ice front, and other shape files.

    File 2 : PIG_front_1947_2023.zip

    Includes ice front position shapefiles from 1947 to 2023, which are used for plotting Figure 1 in the paper.

    File 3 : PIG_DROT_GL_2016_2021.zip

    Includes grounding line position shapefiles from 1947 to 2023, which are used for plotting Figure 1 in the paper.

    Data was derived from the following sources:
    The source links can be found in the MATLAB scripts or in the paper's "Open Research" section.

  11. Traveling Salesman Computer Vision

    • kaggle.com
    zip
    Updated Apr 20, 2022
    Cite
    Jeff Heaton (2022). Traveling Salesman Computer Vision [Dataset]. https://www.kaggle.com/datasets/jeffheaton/traveling-salesman-computer-vision
    Available download formats: zip (2977884049 bytes)
    Dataset updated
    Apr 20, 2022
    Authors
    Jeff Heaton
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    The Traveling Salesperson Problem (TSP) is a classic problem of computer science that seeks to find the shortest route between a group of cities. It is an NP-hard problem in combinatorial optimization, important in theoretical computer science and operations research.

    [Image: World Map] https://data.heatonresearch.com/images/wustl/kaggle/tsp/world-tsp.png

    In this Kaggle competition, your goal is not to find the shortest route among cities. Rather, you must attempt to determine the route labeled on a map.

    Calculating Line Distances

    The data for this competition is not made up of real-world maps, but rather randomly generated maps of varying attributes of size, city count, and optimality of the routes. The following image demonstrates a relatively small map, with few cities, and an optimal route.

    [Image: Small Map] https://data.heatonresearch.com/images/wustl/kaggle/tsp/1.jpg

    Not all maps are this small, or contain this optimal a route. Consider the following map, which is much larger.

    [Image: Larger Map] https://data.heatonresearch.com/images/wustl/kaggle/tsp/6.jpg

    The following attributes were randomly selected to generate each image.

    • Height
    • Width
    • City count
    • Cycles of Simulated Annealing optimization of initial random path

    The path distance is based on the sum of the Euclidean distance of all segments in the path. The distance units are in pixels.
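As a small illustration of the distance definition above, the sketch below sums the Euclidean lengths of consecutive segments in pixel coordinates. The function name `path_length` is hypothetical. Whether the labelled route returns to its starting city is not stated in the description, so the `closed` flag is an assumption that lets either reading be tried.

```python
import math

def path_length(points, closed=True):
    """Sum the Euclidean lengths of consecutive segments, in pixels.

    points: list of (x, y) pixel coordinates in visiting order
    closed: if True, add the segment from the last point back to the first
    """
    total = 0.0
    for a, b in zip(points, points[1:]):
        total += math.dist(a, b)
    if closed and len(points) > 1:
        total += math.dist(points[-1], points[0])
    return total

# Invented coordinates: a 3 x 4 pixel rectangle.
rectangle = [(0, 0), (3, 0), (3, 4), (0, 4)]
closed_length = path_length(rectangle)
open_length = path_length(rectangle, closed=False)
print(closed_length, open_length)
```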

    Dataset Challenges

    This is a regression problem: you are to estimate the total path length. There are several challenges to consider.

    • If you indiscriminately scale the maps, you will lose size information.
    • Paths might overlap, causing the ratio of total pixels to total length to become misleading.
    • As paths overlap both other path segments and cities, the resulting color becomes brighter.

    The following picture shows a section from one map zoomed to the pixel-level:

    [Image: TSP Zoom] https://data.heatonresearch.com/images/wustl/kaggle/tsp/tsp_zoom.jpg

    CSV Files

    The following CSV files are provided, in addition to the images.

    • train.csv - Training data, with distance labels.
    • test.csv - Test data without distance labels.
    • tsp-all.csv - Training and test data combined with complete labels and additional information about each generated map.

    CSV File Format

    The tsp-all.csv file contains the following data.

    id,filename,distance,key
    0,0.jpg,83110,503x673-270-83110.jpg
    1,1.jpg,1035,906x222-10-1035.jpg
    2,2.jpg,20756,810x999-299-20756.jpg
    3,3.jpg,13286,781x717-272-13286.jpg
    4,4.jpg,13924,609x884-312-13924.jpg
    

    The columns:

    • id - A unique ID that allows linking across all three CSV files.
    • filename - The name of each map's image file.
    • distance - The total distance through the cities, this is the y/label.
    • key - The generator filename, provides the dimensions, city count, & distance.
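The `key` column packs several attributes into one filename, and a short parser can unpack them. This sketch assumes every key follows the pattern seen in the sample rows above (two dimensions joined by `x`, then city count, then distance, separated by hyphens). The description does not say which dimension is width and which is height, so they are returned as an unlabelled pair. The name `parse_key` is hypothetical.

```python
import re

# Pattern inferred from the sample rows, e.g. "906x222-10-1035.jpg".
KEY_PATTERN = re.compile(r"^(\d+)x(\d+)-(\d+)-(\d+)\.jpg$")

def parse_key(key):
    """Split a generator filename into dimensions, city count and distance."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"unexpected key format: {key!r}")
    first, second, cities, distance = map(int, match.groups())
    return {
        "dims": (first, second),  # order (width/height) is not documented
        "city_count": cities,
        "distance": distance,
    }

parsed = parse_key("906x222-10-1035.jpg")
print(parsed)
```

For this sample row the parsed distance matches the `distance` column (1035), which suggests the key can be used as a consistency check when loading tsp-all.csv.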
  12. Dataset for The effects of a number line intervention on calculation skills

    • figshare.mq.edu.au
    • researchdata.edu.au
    txt
    Updated May 12, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Carola Ruiz Hornblas; Saskia Kohnen; Rebecca Bull (2023). Dataset for The effects of a number line intervention on calculation skills [Dataset]. http://doi.org/10.25949/22799717.v1
    Available download formats: txt
    Dataset updated
    May 12, 2023
    Dataset provided by
    Macquarie University
    Authors
    Carola Ruiz Hornblas; Saskia Kohnen; Rebecca Bull
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Study information

    The sample included in this dataset represents five children who participated in a number line intervention study. Originally six children were included in the study, but one of them fulfilled the criterion for exclusion after missing several consecutive sessions. Thus, their data is not included in the dataset. All participants were attending Year 1 of primary school at an independent school in New South Wales, Australia. To be eligible to participate, children had to present with low mathematics achievement by performing at or below the 25th percentile in the Maths Problem Solving and/or Numerical Operations subtests from the Wechsler Individual Achievement Test III (WIAT III A & NZ, Wechsler, 2016). Participants were excluded if, as reported by their parents, they had any other diagnosed disorders such as attention deficit hyperactivity disorder, autism spectrum disorder, intellectual disability, developmental language disorder, cerebral palsy or uncorrected sensory disorders.

    The study followed a multiple baseline case series design, with a baseline phase, a treatment phase, and a post-treatment phase. The baseline phase varied between two and three measurement points, the treatment phase varied between four and seven measurement points, and all participants had 1 post-treatment measurement point. The measurement points were distributed across participants as follows:

    Participant 1: 3 baseline, 6 treatment, 1 post-treatment

    Participant 3: 2 baseline, 7 treatment, 1 post-treatment

    Participant 5: 2 baseline, 5 treatment, 1 post-treatment

    Participant 6: 3 baseline, 4 treatment, 1 post-treatment

    Participant 7: 2 baseline, 5 treatment, 1 post-treatment

    In each session across all three phases, children were assessed on a number line estimation task, a single-digit computation task, a multi-digit computation task, a dot comparison task and a number comparison task. Furthermore, during the treatment phase, all children completed the intervention task after these assessments. The order of the assessment tasks varied randomly between sessions.

    Measures

    Number Line Estimation. Children completed a computerised bounded number line task (0-100). The number line is presented in the middle of the screen, and the target number is presented above the start point of the number line to avoid signalling the midpoint (Dackermann et al., 2018). Target numbers included two non-overlapping sets (trained and untrained) of 30 items each. Untrained items were assessed on all phases of the study. Trained items were assessed independently of the intervention during baseline and post-treatment phases, and performance on the intervention is used to index performance on the trained set during the treatment phase. Within each set, numbers were equally distributed throughout the number range, with three items within each ten (0-10, 11-20, 21-30, etc.). Target numbers were presented in random order. Participants did not receive performance-based feedback. Accuracy is indexed by percent absolute error (PAE): PAE = (|number estimated - target number| / scale of number line) x 100.
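A worked instance of the PAE measure is sketched below. It assumes the absolute value of the difference is intended, as the name "percent absolute error" implies, and that the scale is 100 for the 0-100 number line described above. The function name `percent_absolute_error` and the example estimate are invented for illustration.

```python
def percent_absolute_error(estimate, target, scale=100):
    """PAE = |estimate - target| / scale * 100.

    scale is the length of the number line (100 for a 0-100 line).
    """
    return 100 * abs(estimate - target) / scale

# Invented example: a child marks 42 when the target number is 50.
pae = percent_absolute_error(42, 50)
print(pae)
```

Because the error is absolute, an estimate of 58 for the same target gives the same PAE as an estimate of 42.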

    Single-Digit Computation. The task included ten additions with single-digit addends (1-9) and single-digit results (2-9). The order was counterbalanced so that half of the additions present the lowest addend first (e.g., 3 + 5) and half of the additions present the highest addend first (e.g., 6 + 3). This task also included ten subtractions with single-digit minuends (3-9), subtrahends (1-6) and differences (1-6). The items were presented horizontally on the screen accompanied by a sound and participants were required to give a verbal response. Participants did not receive performance-based feedback. Performance on this task was indexed by item-based accuracy.

    Multi-digit computational estimation. The task included eight additions and eight subtractions presented with double-digit numbers and three response options. None of the response options represented the exact result. Participants were asked to select the option that was closest to the exact result. In half of the items the calculation involved two double-digit numbers, and in the other half one double-digit and one single-digit number. The distance between the correct response option and the exact result of the calculation was two for half of the trials and three for the other half. The calculation was presented vertically on the screen with the three options shown below. The calculation remained on the screen until participants responded by clicking on one of the options. Participants did not receive performance-based feedback. Performance on this task was measured by item-based accuracy.

    Dot Comparison and Number Comparison. Both tasks included the same 20 items, which were presented twice, counterbalancing left and right presentation. Magnitudes to be compared were between 5 and 99, with four items for each of the following ratios: .91, .83, .77, .71, .67. Both quantities were presented horizontally side by side, and participants were instructed to press one of two keys (F or J), as quickly as possible, to indicate the larger one. Items were presented in random order and participants did not receive performance-based feedback. In the non-symbolic comparison task (dot comparison), the two sets of dots remained on the screen for a maximum of two seconds (to prevent counting). Overall area and convex hull for both sets of dots were kept constant following Guillaume et al. (2020). In the symbolic comparison task (Arabic numbers), the numbers remained on the screen until a response was given. Performance on both tasks was indexed by accuracy.

    The Number Line Intervention

    During the intervention sessions, participants estimated the position of 30 Arabic numbers on a 0-100 bounded number line. As a form of feedback, within each item, the participant’s estimate remained visible, and the correct position of the target number appeared on the number line. When the estimate’s PAE was lower than 2.5, a message appeared on the screen that read “Excellent job”; when PAE was between 2.5 and 5, the message read “Well done, so close!”; and when PAE was higher than 5, the message read “Good try!” Numbers were presented in random order.
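    The feedback rule can be sketched as a small Python function. The function name is hypothetical, and treating the boundary values 2.5 and 5 as part of the middle band is an assumption, since the description does not say how exact boundary values were handled.

```python
def feedback_message(pae):
    """Map an item's PAE to the on-screen message described for the intervention.

    Assumption: PAE values of exactly 2.5 and exactly 5 fall in the middle band.
    """
    if pae < 2.5:
        return "Excellent job"
    if pae <= 5:
        return "Well done, so close!"
    return "Good try!"


for pae in (1.0, 4.0, 12.0):
    print(pae, feedback_message(pae))
```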

    Variables in the dataset

    • Age = age in ‘years, months’ at the start of the study
    • Sex = female/male/non-binary or third gender/prefer not to say (as reported by parents)
    • Math_Problem_Solving_raw = Raw score on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
    • Math_Problem_Solving_Percentile = Percentile equivalent on the Math Problem Solving subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
    • Num_Ops_Raw = Raw score on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).
    • Math_Problem_Solving_Percentile = Percentile equivalent on the Numerical Operations subtest from the WIAT III (WIAT III A & NZ, Wechsler, 2016).

    The remaining variables refer to participants’ performance on the study tasks. Each variable name is composed of three parts. The first part refers to the phase and session. For example, Base1 refers to the first measurement point of the baseline phase, Treat1 to the first measurement point of the treatment phase, and post1 to the first measurement point of the post-treatment phase.

    The second part of the variable name refers to the task, as follows:

    • DC = dot comparison
    • SDC = single-digit computation
    • NLE_UT = number line estimation (untrained set)
    • NLE_T = number line estimation (trained set)
    • CE = multidigit computational estimation
    • NC = number comparison

    The final part of the variable name refers to the type of measure being used (i.e., acc = total correct responses and pae = percent absolute error).

    Thus, variable Base2_NC_acc corresponds to accuracy on the number comparison task during the second measurement point of the baseline phase and Treat3_NLE_UT_pae refers to the percent absolute error on the untrained set of the number line task during the third session of the Treatment phase.
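    The naming scheme can be decoded mechanically. The following Python sketch splits a variable name into its three parts; the function and dictionary names are hypothetical, and the pattern assumes names follow exactly the Phase+number, task, measure layout described above.

```python
import re

# Lookup tables taken from the variable description above.
PHASES = {"base": "baseline", "treat": "treatment", "post": "post-treatment"}
TASKS = {
    "DC": "dot comparison",
    "SDC": "single-digit computation",
    "NLE_UT": "number line estimation (untrained set)",
    "NLE_T": "number line estimation (trained set)",
    "CE": "multidigit computational estimation",
    "NC": "number comparison",
}
MEASURES = {"acc": "total correct responses", "pae": "percent absolute error"}

# Case-insensitive because the description writes both "Base1" and "post1".
_PATTERN = re.compile(
    r"^(Base|Treat|Post)(\d+)_(NLE_UT|NLE_T|SDC|DC|CE|NC)_(acc|pae)$",
    re.IGNORECASE,
)


def parse_variable(name):
    """Split a task variable name into phase, session, task and measure."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a task variable: {name!r}")
    phase, session, task, measure = match.groups()
    return {
        "phase": PHASES[phase.lower()],
        "session": int(session),
        "task": TASKS[task.upper()],
        "measure": MEASURES[measure.lower()],
    }


print(parse_variable("Base2_NC_acc"))
print(parse_variable("Treat3_NLE_UT_pae"))
```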

  13. mmWave-based Fitness Activity Recognition Dataset

    • zenodo.org
    png, zip
    Updated Jul 12, 2024
    Yucheng Xie; Xiaonan Guo; Yan Wang; Jerry Cheng; Yingying Chen (2024). mmWave-based Fitness Activity Recognition Dataset [Dataset]. http://doi.org/10.5281/zenodo.7793613
    Dataset provided by
    Zenodo
    Authors
    Yucheng Xie; Xiaonan Guo; Yan Wang; Jerry Cheng; Yingying Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This mmWave dataset is used for fitness activity identification. The dataset (FA Dataset) contains 14 common daily fitness activities. The data are captured by the mmWave radar TI-AWR1642. The dataset can be used by fellow researchers to reproduce the original work or to further explore other machine-learning problems in the domain of mmWave signals.

    Format: .png format

    Section 1: Device Configuration

    Section 2: Data Format

    We provide the mmWave data for this dataset as heatmaps. The data files are in png format. Details are as follows:

    • 14 activities are included in the FA Dataset.
    • 2 participants are included in the FA Dataset.
    • FA_d_p_i_u_j.png:
      • d represents the date on which the fitness data were collected
      • p represents the environment in which the fitness data were collected
      • i represents the fitness activity type index
      • u represents the user id
      • j represents the sample index
    • Example:
      • FA_20220101_lab_1_2_3 represents the 3rd data sample of user 2 performing activity 1, collected in the lab
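    The naming convention above can be parsed with a short Python sketch. The names SampleName and parse_sample_name are hypothetical, and the pattern assumes an 8-digit date and an environment label without underscores, as in the example filename.

```python
import re
from typing import NamedTuple


class SampleName(NamedTuple):
    date: str          # d: collection date, e.g. "20220101"
    environment: str   # p: collection environment, e.g. "lab"
    activity: int      # i: fitness activity type index
    user: int          # u: user id
    sample: int        # j: sample index


# Assumes an 8-digit date and no underscores inside the environment label.
_NAME = re.compile(r"^FA_(\d{8})_([^_]+)_(\d+)_(\d+)_(\d+)(?:\.png)?$")


def parse_sample_name(filename):
    """Decode an FA_d_p_i_u_j heatmap filename into its five fields."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    date, env, activity, user, sample = match.groups()
    return SampleName(date, env, int(activity), int(user), int(sample))


print(parse_sample_name("FA_20220101_lab_1_2_3.png"))
```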

    Section 3: Experimental Setup

    • We place the mmWave device on a table with a height of 60 cm.
    • The participants are asked to perform fitness activities at a distance of 2 m in front of the mmWave device.
    • The data are collected in a lab with a size of 5.0 m × 3.0 m.

    Section 4: Data Description

    • We develop a spatial-temporal heatmap that integrates multiple activity features, including the range of movement, velocity, and time duration of each activity repetition.

    • We first derive the Doppler-range map of the user's activity by calculating the Range-FFT and Doppler-FFT. Then, we generate the spatial-temporal heatmap by accumulating the velocity at every distance in every Doppler-range map. Next, we normalize the derived velocity information and present the velocity-distance relationship along the time dimension. In this way, we transform the original instantaneous velocity-distance relationship into a more comprehensive spatial-temporal heatmap that describes the process of a whole activity.

    • As shown in the attached figure, in each spatial-temporal heatmap the horizontal axis represents the time duration of an activity repetition, while the vertical axis represents the range of movement. The velocity is represented by color.

    • The dataset is stored in 14 zip files whose names start with "FA"; each contains repetitions of the same fitness activity.

    14 common daily activities and their corresponding files

    File Name   Activity Type
    FA1         Crunches
    FA2         Elbow plank and reach
    FA3         Leg raise
    FA4         Lunges
    FA5         Mountain climber
    FA6         Punches
    FA7         Push ups
    FA8         Squats
    FA9         Burpees
    FA10        Chest squeezes
    FA11        High knees
    FA12        Side leg raise
    FA13        Side to side chops
    FA14        Turning kicks
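    The accumulation and normalization steps described in Section 4 can be sketched in plain Python. This is a minimal illustration and not the authors' heatmap_generation.py: it starts from already-computed Doppler-range maps, and it assumes that "accumulating the velocity" means a power-weighted mean velocity per range bin, which the description does not specify. All function names are hypothetical.

```python
def velocity_profile(rd_map, velocities):
    """Collapse one Doppler-range map to a single velocity per range bin.

    rd_map: rows are range bins, columns are Doppler bins (power values).
    velocities: the velocity each Doppler bin corresponds to.
    Assumption: a power-weighted mean velocity stands in for the accumulation.
    """
    profile = []
    for row in rd_map:
        total = sum(row)
        if total == 0:
            profile.append(0.0)
        else:
            profile.append(sum(p * v for p, v in zip(row, velocities)) / total)
    return profile


def spatial_temporal_heatmap(rd_maps, velocities):
    """Stack per-frame velocity profiles: rows are range bins, columns are time."""
    columns = [velocity_profile(m, velocities) for m in rd_maps]
    if not columns:
        return []
    # Normalize by the largest absolute velocity (1.0 if everything is zero).
    peak = max(abs(v) for col in columns for v in col) or 1.0
    return [[col[r] / peak for col in columns] for r in range(len(columns[0]))]


# Two toy frames with 2 range bins and 2 Doppler bins each (made-up values).
frames = [[[0, 1], [1, 0]], [[1, 1], [0, 2]]]
print(spatial_temporal_heatmap(frames, velocities=[-2.0, 2.0]))
```

    In the result, the horizontal index is time and the vertical index is range, matching the axes described for the published heatmaps; the normalized velocity is the value that would be mapped to color.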

    Section 5: Raw Data and Data Processing Algorithms

    • We also provide the mmWave raw data (.mat format), stored in the same zip files as the corresponding heatmaps. Each .mat file stores one set of activity repetitions (e.g., 4 repetitions) from the same user.
      • For example: FA_d_p_i_u_j.mat:
        • d represents the date on which the data were collected
        • p represents the environment in which the data were collected
        • i represents the activity type index
        • u represents the user id
        • j represents the set index
    • We plan to provide the data processing algorithms (heatmap_generation.py) to load the mmWave raw data and generate the corresponding heatmap data.

    Section 6: Citations

    If your paper is related to our work, please cite our paper as follows.

    https://ieeexplore.ieee.org/document/9868878/

    Xie, Yucheng, Ruizhe Jiang, Xiaonan Guo, Yan Wang, Jerry Cheng, and Yingying Chen. "mmFit: Low-Effort Personalized Fitness Monitoring Using Millimeter Wave." In 2022 International Conference on Computer Communications and Networks (ICCCN), pp. 1-10. IEEE, 2022.

    Bibtex:

    @inproceedings{xie2022mmfit,
      title={mmFit: Low-Effort Personalized Fitness Monitoring Using Millimeter Wave},
      author={Xie, Yucheng and Jiang, Ruizhe and Guo, Xiaonan and Wang, Yan and Cheng, Jerry and Chen, Yingying},
      booktitle={2022 International Conference on Computer Communications and Networks (ICCCN)},
      pages={1--10},
      year={2022},
      organization={IEEE}
    }

  14. South Range, MI Age Group Population Dataset: A Complete Breakdown of South...

    • neilsberg.com
    csv, json
    Updated Feb 22, 2025
    Neilsberg Research (2025). South Range, MI Age Group Population Dataset: A Complete Breakdown of South Range Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/45476f67-f122-11ef-8c1b-3860777c1fe6/
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Michigan, South Range
    Variables measured
    Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
    Measurement technique
    The data presented in this dataset are derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each of the age groups. Ages between 0 and 85 were divided into buckets of roughly 5 years, and all ages over 85 were aggregated into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the South Range population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of South Range. The dataset can be used to understand the population distribution of South Range by age. For example, using this dataset, we can identify the largest age group in South Range.

    Key observations

    The largest age group in South Range, MI was 20 to 24 years, with a population of 99 (16.87%), according to the ACS 2019-2023 5-Year Estimates. The smallest age group in South Range, MI was 80 to 84 years, with a population of 3 (0.51%). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: This column displays the age group in consideration
    • Population: This column shows the population of the specific age group in South Range.
    • % of Total Population: This column displays the population of each age group as a proportion of the total South Range population. Please note that the sum of all percentages may not equal one due to rounding of values.
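    As an illustration of the example use mentioned under Context, the following Python sketch finds the largest age group in a CSV with the three columns listed above. The function name is hypothetical, the inline sample contains only the two rows quoted under Key observations, and the exact formatting of the real file (for instance how the percentage column is written) is an assumption.

```python
import csv
import io


def largest_age_group(csv_text):
    """Return (age group, population) for the row with the highest population."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    top = max(rows, key=lambda row: int(row["Population"]))
    return top["Age Group"], int(top["Population"])


# Only the two rows quoted under "Key observations"; the real file has 18 rows.
sample = (
    "Age Group,Population,% of Total Population\n"
    "20 to 24 years,99,16.87\n"
    "80 to 84 years,3,0.51\n"
)
print(largest_age_group(sample))
```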

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for South Range Population by Age. You can refer to the same here

  15. Estimated stand-off distance between ADS-B equipped aircraft and obstacles

    • zenodo.org
    • data.niaid.nih.gov
    jpeg, zip
    Updated Jul 12, 2024
    Andrew Weinert (2024). Estimated stand-off distance between ADS-B equipped aircraft and obstacles [Dataset]. http://doi.org/10.5281/zenodo.7741273
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrew Weinert
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Summary:

    Estimated stand-off distance between ADS-B equipped aircraft and obstacles. Obstacle information was sourced from the FAA Digital Obstacle File and the FHWA National Bridge Inventory. Aircraft tracks were sourced from processed data curated from the OpenSky Network. Results are presented as histograms organized by aircraft type and distance away from runways.

    Description:

    For many aviation safety studies, aircraft behavior is represented using encounter models, which are statistical models of how aircraft behave during close encounters. They are used to provide a realistic representation of the range of encounter flight dynamics where an aircraft collision avoidance system would be likely to alert. These models are currently, and have historically been, limited to interactions between aircraft; they have not represented the specific interactions between obstacles and transponder-equipped aircraft. In response, we calculated the standoff distance between obstacles and ADS-B equipped manned aircraft.

    For robustness, MIT LL calculated the standoff distance using two different datasets of manned aircraft tracks and two datasets of obstacles. This approach aligned with the foundational research used to support the ASTM F3442/F3442M-20 well clear criteria of 2000 feet laterally and 250 feet AGL vertically.

    The two track datasets consisted of processed tracks of ADS-B equipped aircraft curated from the OpenSky Network. It is likely that rotorcraft were underrepresented in these datasets. Aircraft equipped only with Mode C, or not equipped with any transponder, were not considered. The first dataset, referred to as the “Monday” dataset, was used to train the v1.3 uncorrelated encounter models. The second dataset, referred to as the “aerodrome” dataset, was used to train the v2.0 and v3.x terminal encounter models. The Monday dataset consisted of 104 Mondays across North America. The aerodrome dataset was based on observations at least 8 nautical miles within Class B, C, D aerodromes in the United States for the first 14 days of each month from January 2019 through February 2020. Prior to any processing, the datasets required 714 and 847 gigabytes of storage. For more details on these datasets, please refer to “Correlated Bayesian Model of Aircraft Encounters in the Terminal Area Given a Straight Takeoff or Landing” and “Benchmarking the Processing of Aircraft Tracks with Triples Mode and Self-Scheduling.”

    Two different datasets of obstacles were also considered. The first consisted of point obstacles defined by the FAA digital obstacle file (DOF): antenna, lighthouse, meteorological tower (met), monument, sign, silo, spire (steeple), stack (chimney; industrial smokestack), transmission line tower (t-l tower), tank (water; fuel), tramway, utility pole (telephone pole, or pole of similar height, supporting wires), windmill (wind turbine), and windsock. Each obstacle was represented by a cylinder with the height reported by the DOF and a radius based on the reported horizontal accuracy. We did not consider the actual width and height of the structure itself. Additionally, we only considered obstacles at least 50 feet tall and marked as verified in the DOF.

    The other obstacle dataset, termed “bridges,” was based on the bridges identified in the FAA DOF and additional information provided by the National Bridge Inventory (NBI). Due to the potential size and extent of bridges, it would not be appropriate to model them as point obstacles; however, the FAA DOF only provides a point location and no information about the size of the bridge. In response, we correlated the FAA DOF with the NBI, which provides information about the length of many bridges. Instead of sizing the simulated bridge based on horizontal accuracy, as with the point obstacles, the bridges were represented as circles with a radius of the longest, nearest bridge from the NBI. A circle representation was required because neither the FAA DOF nor the NBI provided sufficient information about orientation to represent bridges as rectangular cuboids. As with the point obstacles, the height of the obstacle was based on the height reported by the FAA DOF. Accordingly, the analysis using the bridge dataset should be viewed as risk averse and conservative. It is possible that a manned aircraft was actually hundreds of feet away from an obstacle while the estimated standoff distance was significantly less. Additionally, because all obstacles are represented with a fixed height, the potentially flat and low-level entrances of a bridge are assumed to have the same height as the tall bridge towers. The attached figure illustrates an example simulated bridge.

    It would have been extremely computationally inefficient to calculate the standoff distance for all possible track points. Instead, we defined an encounter between an aircraft and an obstacle as an aircraft flying at 3069 feet AGL or less coming within 3000 feet laterally of any obstacle in a 60 second time interval. If the criteria were satisfied, then for that 60 second track segment we calculated the standoff distance to all nearby obstacles. Vertical separation was based on the MSL altitude of the track and the maximum MSL height of an obstacle.
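    The encounter criteria can be sketched in Python as follows. This is an illustration only, not the MIT LL implementation: the class and function names are hypothetical, positions are assumed to be given in a local planar frame in feet, each input segment is assumed to already cover one 60 second interval, and lateral distance is assumed to be measured to the surface of the obstacle cylinder rather than to its center.

```python
import math
from dataclasses import dataclass

MAX_AGL_FT = 3069.0      # altitude ceiling from the encounter definition
MAX_LATERAL_FT = 3000.0  # lateral threshold from the encounter definition


@dataclass
class TrackPoint:
    x_ft: float    # position in a local planar frame (assumed)
    y_ft: float
    agl_ft: float  # altitude above ground level
    msl_ft: float  # altitude above mean sea level


@dataclass
class Obstacle:
    x_ft: float
    y_ft: float
    radius_ft: float   # cylinder radius
    top_msl_ft: float  # maximum MSL height of the obstacle


def lateral_distance_ft(point, obstacle):
    """Horizontal distance from a track point to the cylinder surface."""
    center = math.hypot(point.x_ft - obstacle.x_ft, point.y_ft - obstacle.y_ft)
    return max(0.0, center - obstacle.radius_ft)


def is_encounter(segment, obstacles):
    """True if any point of the segment is low enough and close enough."""
    return any(
        p.agl_ft <= MAX_AGL_FT and lateral_distance_ft(p, o) <= MAX_LATERAL_FT
        for p in segment
        for o in obstacles
    )


def closest_approach(segment, obstacle):
    """(lateral_ft, vertical_ft) at the point of minimum lateral distance."""
    nearest = min(segment, key=lambda p: lateral_distance_ft(p, obstacle))
    return lateral_distance_ft(nearest, obstacle), nearest.msl_ft - obstacle.top_msl_ft


# Made-up geometry: a tower at the origin and a two-point track segment.
tower = Obstacle(0.0, 0.0, radius_ft=600.0, top_msl_ft=1200.0)
segment = [TrackPoint(5000.0, 0.0, 500.0, 1500.0), TrackPoint(3500.0, 0.0, 500.0, 1500.0)]
if is_encounter(segment, [tower]):
    print(closest_approach(segment, tower))  # (2900.0, 300.0)
```

    Screening segments with the cheap encounter test first means the standoff calculation only runs for the small share of segments that pass near an obstacle.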

    For each combination of aircraft track and obstacle datasets, the results were organized seven different ways. Filtering criteria were based on aircraft type and distance away from runways. Runway data was sourced from the FAA runways of the United States, Puerto Rico, and Virgin Islands open dataset. Aircraft type was identified as part of the em-processing-opensky workflow.

    • All: No filter, all observations that satisfied encounter conditions
    • nearRunway: Aircraft within or at 2 nautical miles of a runway
    • awayRunway: Observations more than 2 nautical miles from a runway
    • glider: Observations when aircraft type is a glider
    • fwme: Observations when aircraft type is a fixed-wing multi-engine
    • fwse: Observations when aircraft type is a fixed-wing single engine
    • rotorcraft: Observations when aircraft type is a rotorcraft
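    The seven groupings above amount to a simple classification of each observation. The Python sketch below is illustrative only: the function name and the aircraft-type strings are hypothetical stand-ins for whatever labels the em-processing-opensky workflow produces.

```python
NEAR_RUNWAY_NM = 2.0  # "within or at 2 nautical miles" counts as near
TYPE_FILTERS = {"glider", "fwme", "fwse", "rotorcraft"}


def filters_for(aircraft_type, runway_distance_nm):
    """List the result groupings that one observation contributes to."""
    names = ["All"]
    names.append("nearRunway" if runway_distance_nm <= NEAR_RUNWAY_NM else "awayRunway")
    if aircraft_type in TYPE_FILTERS:
        names.append(aircraft_type)
    return names


print(filters_for("rotorcraft", 2.0))
print(filters_for("fwse", 3.5))
```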

    License

    This dataset is licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0).

    This license requires that reusers give credit to the creator. It allows reusers to copy and distribute the material in any medium or format in unadapted form and for noncommercial purposes only. Only noncommercial use of your work is permitted. Noncommercial means not primarily intended for or directed towards commercial advantage or monetary compensation. Exceptions are given for the not for profit standards organizations of ASTM International and RTCA.

    MIT is releasing this dataset in good faith to promote open and transparent research of the low altitude airspace. Given the limitations of the dataset and a need for more research, a more restrictive license was warranted. Namely, it is based only on observations of ADS-B equipped aircraft, which not all aircraft in the airspace are required to employ; and observations were sourced from a crowdsourced network whose surveillance coverage has not been robustly characterized.

    As more research is conducted and the low altitude airspace is further characterized or regulated, it is expected that a future version of this dataset may have a more permissive license.

    Distribution Statement

    DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.

    © 2021 Massachusetts Institute of Technology.

    Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.

    This material is based upon work supported by the Federal Aviation Administration under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Federal Aviation Administration.

    This document is derived from work done for the FAA (and possibly others); it is not the direct product of work done for the FAA. The information provided herein may include content supplied by third parties. Although the data and information contained herein has been produced or processed from sources believed to be reliable, the Federal Aviation Administration makes no warranty, expressed or implied, regarding the accuracy, adequacy, completeness, legality, reliability or usefulness of any information, conclusions or recommendations provided herein. Distribution of the information contained herein does not constitute an endorsement or warranty of the data or information provided herein by the Federal Aviation Administration or the U.S. Department of Transportation. Neither the Federal Aviation Administration nor the U.S. Department of

  16. Data from: MIMIC-IV-Ext-Instr: A Dataset of 450K+ EHR-Grounded...

    • physionet.org
    Updated Sep 9, 2025
    Zhenbang Wu; Anant Dadu; Mike Nalls; Faraz Faghri; Jimeng Sun (2025). MIMIC-IV-Ext-Instr: A Dataset of 450K+ EHR-Grounded Instruction-Following Examples [Dataset]. http://doi.org/10.13026/e5bq-pr14
    Authors
    Zhenbang Wu; Anant Dadu; Mike Nalls; Faraz Faghri; Jimeng Sun
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    Large language models (LLMs) have shown impressive capabilities in solving a wide range of tasks based on human instructions. However, developing a conversational AI assistant for electronic health record (EHR) data remains challenging due to the lack of large-scale instruction-following datasets. To address this, we present MIMIC-IV-Ext-Instr, a dataset containing over 450K open-ended, instruction-following examples generated using GPT-3.5 on a HIPAA-compliant platform. Derived from the MIMIC-IV EHR database, MIMIC-IV-Ext-Instr spans a wide range of topics and is specifically designed to support instruction-tuning of general-purpose LLMs for diverse clinical applications.

  17. Data from: Native ranges of freshwater fishes of North America

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 27, 2025
    U.S. Geological Survey (2025). Native ranges of freshwater fishes of North America [Dataset]. https://catalog.data.gov/dataset/native-ranges-of-freshwater-fishes-of-north-america
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    North America
    Description

    Background: The Nonindigenous Aquatic Species (NAS) Database functions as a repository and clearinghouse for the occurrence of nonindigenous aquatic species information from across the United States. The Database contains locality information on more than 1,300 species introduced as early as 1800, including freshwater vertebrates and invertebrates, aquatic plants, and marine fishes. Taxa include both foreign species and North American native species that have been translocated outside of their natural range. Locality data are derived from many sources, including scientific literature; Federal, State, and local natural resource monitoring programs; museum collections; news agencies; and direct submission through online reporting forms. To effectively identify and record new introductions for North American native taxa, a robust estimate of their natural native ranges is required. Previously, the NAS Database has used native range information for fishes provided by NatureServe, which was collected from State natural heritage program inventory data and published State fish books. Although these range maps represent an essential first step in assembling native range data, the NatureServe data has varied for many species due to initial data assumptions (i.e., species presence = nativity). Additionally, NatureServe native ranges exhibit watershed gaps for many species. NAS program staff members have made thousands of corrections to these data internally and periodically communicate these changes back to NatureServe.

    Methods: Native ranges were developed from several data sources. Dr. Dana Infante, Michigan State University, provided the NAS program with occurrence (presence) data from 40-50 Federal, State, museum, and university data providers gathered during her work on the National Fish Habitat Partnership (NFHP). Although many data providers have offered datasets with no restrictions, some have restrictions on redistribution. In addition to the NFHP data, we utilized occurrence datasets for United States museum collections from Biodiversity Information Serving Our Nation (BISON), National Science Foundation's VertNet, FishNet 2 (fish collections in natural history museums, universities, and other institutions), Multistate Aquatic Resources Information System (MARIS) data and Global Biodiversity Information Facility (GBIF), along with a review of State fish books and other primary literature, to complete native range data maintained locally in the NAS Database. Occurrence datasets will be combined into larger, species-specific datasets for further processing at the hydrologic unit code (HUC) level. We will use GIS analyses to identify watershed occurrence at the eight-digit (HUC8) and twelve-digit (HUC12) level, using the 2015 version of the Watershed Boundary Dataset. HUCs containing known nonindigenous occurrences will be removed from the native range. Watershed gaps (i.e., a HUC that lies between two that are identified as part of the native range) will be investigated using historical literature to distinguish data gaps from actual range gaps. We will supply native range data by HUC8 (and HUC12 where possible) for the 320 species listed below. These data will be provided as a comma-separated values (CSV) file and be made available on the NAS website via a web services application programming interface (API).
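    Two of the steps in the Methods, rolling occurrences up to HUC8 and removing HUCs with known nonindigenous occurrences, can be sketched in Python. The sketch relies on the hierarchical structure of hydrologic unit codes, in which the first eight digits of a HUC12 identify its parent HUC8. The function names are hypothetical, the codes in the example are made up, and the actual workflow uses GIS analyses rather than string handling.

```python
def huc8_of(huc12):
    """Return the parent HUC8 of a twelve-digit hydrologic unit code."""
    if len(huc12) != 12 or not huc12.isdigit():
        raise ValueError(f"not a HUC12 code: {huc12!r}")
    return huc12[:8]


def native_range_huc8(occurrence_huc12s, nonindigenous_huc8s):
    """HUC8s with occurrences, minus those with known nonindigenous records."""
    observed = {huc8_of(code) for code in occurrence_huc12s}
    return sorted(observed - set(nonindigenous_huc8s))


# Made-up codes for illustration only.
occurrences = ["010100020101", "010100020305", "010100030102"]
print(native_range_huc8(occurrences, nonindigenous_huc8s={"01010003"}))
```

    Codes are kept as strings so that leading zeros, which are significant in HUCs, are not lost.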

  18. mmWave-based Activity Recognition Dataset

    • zenodo.org
    png, zip
    Updated Jul 12, 2024
    Yucheng Xie; Ruizhe Jiang; Xiaonan Guo; Yan Wang; Jerry Cheng; Yingying Chen (2024). mmWave-based Activity Recognition Dataset [Dataset]. http://doi.org/10.5281/zenodo.7678020
    Dataset provided by
    Zenodo
    Authors
    Yucheng Xie; Ruizhe Jiang; Xiaonan Guo; Yan Wang; Jerry Cheng; Yingying Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These mmWave datasets are used for activity verification. There are two datasets: the first (FA Dataset) contains 14 common daily activities, and the second (EA Dataset) contains 5 kinds of eating activities. The data are captured by the mmWave radar TI-AWR1642. The datasets can be used by fellow researchers to reproduce the original work or to further explore other machine-learning problems in the domain of mmWave signals.

    Format: .png format

    Section 1: Device Configuration

    Section 2: Data Format

    We provide the mmWave data for the two datasets as heatmaps. The data files are in png format. Details are as follows:

    FA Dataset

    • 2 participants are included in the FA Dataset.
    • 14 activities are included in the FA Dataset.
    • FA_d_p_i_u_j.png:
      • d represents the date on which the data were collected
      • p represents the environment in which the data were collected
      • i represents the activity type index
      • u represents the user id
      • j represents the sample index
    • Example:
      • FA_20220101_lab_1_2_3 represents the 3rd data sample of user 2 performing activity 1, collected in the lab

    EA Dataset

    • 2 participants are included in the EA Dataset.
    • 5 activities are included in the EA Dataset.
    • EA_d_p_i_u_j.png:
      • d represents the date on which the data were collected
      • p represents the environment in which the data were collected
      • i represents the activity type index
      • u represents the user id
      • j represents the sample index
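The naming scheme above can be unpacked programmatically. The following is a minimal sketch, not code from the dataset authors: the `SampleName` class and `parse_sample_name` function are hypothetical names, and the parser assumes that the environment field contains no underscore.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SampleName:
    dataset: str      # "FA" or "EA"
    date: str         # d, e.g. "20220101"
    environment: str  # p, e.g. "lab"
    activity: int     # i, activity type index
    user: int         # u, user id
    index: int        # j, sample index


def parse_sample_name(filename: str) -> SampleName:
    """Split a name such as FA_20220101_lab_1_2_3.png into its fields."""
    parts = Path(filename).stem.split("_")
    if len(parts) != 6 or parts[0] not in {"FA", "EA"}:
        raise ValueError(f"unexpected file name: {filename!r}")
    dataset, date, env, i, u, j = parts
    return SampleName(dataset, date, env, int(i), int(u), int(j))


print(parse_sample_name("FA_20220101_lab_1_2_3.png"))
```

Because only the stem is parsed, the same function should also accept names ending in other extensions, provided they follow the same six-field pattern.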

    Section 3: Experimental Setup

    FA Dataset

    • We place the mmWave device on a table with a height of 60cm.
    • The participants are asked to perform fitness activities in front of the mmWave device at a distance of 2 m.
    • The data are collected in a lab with a size of 5.0 m × 3.0 m.

    EA Dataset

    • We place the mmWave device on a table with a height of 60cm.
    • The participants are asked to eat with different utensils (i.e., fork, fork&knife, spoon, chopsticks, bare hand) in front of the mmWave device at a distance of 1 m.
    • The data are collected in a lab with a size of 5.0 m × 3.0 m.

    Section 4: Data Description

    • We develop a spatial-temporal heatmap that integrates multiple activity features, including the range of movement, velocity, and time duration of each activity repetition.

    • We first derive the Doppler-range map of the users’ activity by calculating the Range-FFT and Doppler-FFT. Then, we generate the spatial-temporal heatmap by accumulating the velocity at every distance in every Doppler-range map. Next, we normalize the derived velocity information and present the velocity-distance relationship along the time dimension. In this way, we transform the original instantaneous velocity-distance relationship into a more comprehensive spatial-temporal heatmap that describes the process of a whole activity.

    • As shown in the attached figure, in each spatial-temporal heatmap, the horizontal axis represents the time duration of an activity repetition while the vertical axis represents the range of movement. The velocity is represented by color.

    • We create 2 folders to store the two datasets respectively. In the FA folder, there are 14 subfolders, each containing repetitions of the same fitness activity. In the EA folder, there are 5 subfolders, each containing repetitions with a different utensil.
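The Range-FFT, Doppler-FFT, accumulation, and normalization steps described above might look roughly like the sketch below. It is not the authors' `heatmap_generation.py`: the input layout (frames × chirps × samples), the use of the signed Doppler bin index as an unscaled stand-in for velocity, and the magnitude-weighted accumulation are all assumptions made for illustration, and the input here is synthetic noise.

```python
import numpy as np


def range_doppler_map(frame: np.ndarray) -> np.ndarray:
    """Magnitude Doppler-range map of one frame shaped (chirps, samples)."""
    range_fft = np.fft.fft(frame, axis=1)        # fast time -> range bins
    doppler_fft = np.fft.fft(range_fft, axis=0)  # slow time -> Doppler bins
    # Shift so that zero Doppler sits in the middle of axis 0.
    return np.abs(np.fft.fftshift(doppler_fft, axes=0))


def spatial_temporal_heatmap(frames: np.ndarray) -> np.ndarray:
    """Build a (range_bins, n_frames) heatmap from (n_frames, chirps, samples)."""
    n_chirps = frames.shape[1]
    # Signed Doppler bin index, used as an unscaled velocity (assumption).
    velocity = np.arange(n_chirps) - n_chirps // 2
    columns = []
    for frame in frames:
        rd = range_doppler_map(frame)            # (chirps, range_bins)
        # Accumulate |velocity| weighted by magnitude for every range bin.
        columns.append((np.abs(velocity)[:, None] * rd).sum(axis=0))
    heatmap = np.stack(columns, axis=1)          # rows: range, columns: time
    peak = heatmap.max()
    return heatmap / peak if peak > 0 else heatmap  # normalize to [0, 1]


rng = np.random.default_rng(0)
demo = rng.standard_normal((5, 16, 32))          # synthetic stand-in data
print(spatial_temporal_heatmap(demo).shape)      # (32, 5)
```

The output orientation follows the description: rows correspond to range and columns to time, with the normalized velocity value available for color mapping.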
    14 common daily activities and their corresponding folders

    Folder Name   Activity Type           Folder Name   Activity Type
    FA1           Crunches                FA8           Squats
    FA2           Elbow plank and reach   FA9           Burpees
    FA3           Leg raise               FA10          Chest squeezes
    FA4           Lunges                  FA11          High knees
    FA5           Mountain climber        FA12          Side leg raise
    FA6           Punches                 FA13          Side to side chops
    FA7           Push ups                FA14          Turning kicks

    5 eating activities and their corresponding folders

    Folder Name   Activity Type
    EA1           Eating with chopsticks
    EA2           Eating with fork
    EA3           Eating with bare hand
    EA4           Eating with fork&knife
    EA5           Eating with spoon

    Section 5: Raw Data and Data Processing Algorithms

    • We also provide the mmWave raw data (.mat format), stored in the same folders as the corresponding heatmap datasets. Each .mat file can store one set of activity repetitions (e.g., 4 repetitions) from the same user.
      • For example: EA_d_p_i_u_j.mat:
        • d represents the date on which the data were collected
        • p represents the environment in which the data were collected
        • i represents the activity type index
        • u represents the user id
        • j represents the set index
    • We plan to provide the data processing algorithms (heatmap_generation.py) to load the mmWave raw data and generate the corresponding heatmap data.

    Section 6: Citations

    If your paper is related to our work, please cite our papers as follows.

    https://ieeexplore.ieee.org/document/9868878/

    Xie, Yucheng, Ruizhe Jiang, Xiaonan Guo, Yan Wang, Jerry Cheng, and Yingying Chen. "mmFit: Low-Effort Personalized Fitness Monitoring Using Millimeter Wave." In 2022 International Conference on Computer Communications and Networks (ICCCN), pp. 1-10. IEEE, 2022.

    Bibtex:

    @inproceedings{xie2022mmfit,
      title={mmFit: Low-Effort Personalized Fitness Monitoring Using Millimeter Wave},
      author={Xie, Yucheng and Jiang, Ruizhe and Guo, Xiaonan and Wang, Yan and Cheng, Jerry and Chen, Yingying},
      booktitle={2022 International Conference on Computer Communications and Networks (ICCCN)},
      pages={1--10},
      year={2022},
      organization={IEEE}
    }

    https://www.sciencedirect.com/science/article/abs/pii/S2352648321000532

    Xie, Yucheng, Ruizhe Jiang, Xiaonan Guo, Yan Wang, Jerry Cheng, and Yingying Chen. "mmEat: Millimeter wave-enabled environment-invariant eating behavior monitoring." Smart Health 23 (2022): 100236.

    Bibtex:

    @article{xie2022mmeat,
      title={mmEat: Millimeter wave-enabled environment-invariant eating behavior monitoring},
      author={Xie, Yucheng and Jiang, Ruizhe and Guo, Xiaonan and Wang, Yan and Cheng, Jerry and Chen, Yingying},
      journal={Smart Health},
      volume={23},
      pages={100236},
      year={2022},
      publisher={Elsevier}
    }

  19. Collection of example datasets used for the book - R Programming -...

    • figshare.com
    txt
    Updated Dec 4, 2023
    Cite
    Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kingsley Okoye; Samira Hosseini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software and object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike many existing statistical software packages, R has the added benefit of allowing users to write more efficient code by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allow users to define their own (customized) functions for how they expect the program to behave while handling the data, which can also be stored in the simple object system.

    For all intents and purposes, this book serves as both textbook and manual for R statistics, particularly in academic research, data analytics, and computer programming, and is intended to help inform and guide the work of R users and statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for the use of each in R. It gives a hands-on, step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained in this book.

    It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples. The coverage ranges from how to import and store datasets in R as objects, and how to code and call the methods or functions for manipulating the datasets or objects, through factorization and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. The result is a congruence of statistics and computer programming for research.

  20. Z+F Imager 5016 Distance Uncertainty

    • data.uni-hannover.de
    jpeg, pdf, ply, txt
    Updated Sep 3, 2025
    + more versions
    Cite
    Geodätisches Institut Hannover (2025). Z+F Imager 5016 Distance Uncertainty [Dataset]. https://data.uni-hannover.de/dataset/z-f-imager-5016-distance-uncertainty
    Explore at:
    ply, pdf, txt, jpeg (available download formats)
    Dataset updated
    Sep 3, 2025
    Dataset authored and provided by
    Geodätisches Institut Hannover
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset presents a comparative analysis between a highly accurate reference point cloud, acquired using the Leica ATR 960 (laser tracker) and Leica LAS XL (hand-held scanner), and a total of 51 laser scan point clouds acquired using the Z+F Imager 5016. The comparisons were carried out at the Hitec Laboratory of the Geodetic Institute Hannover, where controlled scanning conditions were maintained while capturing various objects.

    Throughout the entire measurement process, great care was taken to ensure constant temperature and air pressure. The deviations observed through backward modeling are reflected in the distance measurements. Additionally, to explore potential factors influencing TLS distance measurements, feature engineering was conducted. The dataset is exceptionally well-suited for understanding and potentially modeling the uncertainties associated with TLS distance measurements.

    Measurement process and backward modelling

    https://data.uni-hannover.de/dataset/e0dd7c6c-de06-4c44-8848-e1d7f9757a1a/resource/93a1a7a0-0704-406c-a58b-0d0181cbe6ec/download/measurement_process.jpg

    Feature engineering

    The formulas used for feature engineering are displayed in the following document: Feature engineering

    Object description & Viewpoints

    The definitions of individual objects can be extracted from the following figures. It can be observed that some objects exhibit similar characteristics. Figure (Objects inside the Hitec Laboratory): https://data.uni-hannover.de/dataset/e0dd7c6c-de06-4c44-8848-e1d7f9757a1a/resource/4a305c9d-00db-4107-82d6-e58dafb37ada/download/objects.jpg

    The TLS viewpoints were distributed throughout the entire space of the laboratory. The 3D coordinates of the viewpoints, as well as the corresponding standard deviations of the translation parameters derived from the georeferencing process, are given in the following document: Viewpoint overview

    Moreover, it should be mentioned that some TLS viewpoints have duplicate scans taken in the first and second phase.

    https://data.uni-hannover.de/dataset/e0dd7c6c-de06-4c44-8848-e1d7f9757a1a/resource/af7eb4e9-fb96-43c2-b3a6-00d0bcd3cbc6/download/environment.jpg

    Data set description

    Each object in the dataset has its own individual data stored as a PLY file. These PLY files contain not only the XYZ coordinates but also the features and residuals. A comprehensive description of the dataset can be found in the associated documentation. Data description
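To see which features and residuals a given PLY file carries, its header can be inspected before loading any points. The sketch below is a minimal header reader using only the standard library. The function name `read_ply_header` and the sample header are hypothetical, and the actual property names are defined in the dataset's own documentation.

```python
import io


def read_ply_header(stream) -> dict:
    """Parse a PLY header; return the format and per-element property names."""
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file")
    header = {"format": None, "elements": {}}
    current = None
    for raw in stream:
        tokens = raw.decode("ascii").split()
        if not tokens or tokens[0] == "comment":
            continue
        if tokens[0] == "format":
            header["format"] = tokens[1]
        elif tokens[0] == "element":
            current = tokens[1]
            header["elements"][current] = {"count": int(tokens[2]), "properties": []}
        elif tokens[0] == "property" and current is not None:
            # The property name is always the last token on the line.
            header["elements"][current]["properties"].append(tokens[-1])
        elif tokens[0] == "end_header":
            return header
    raise ValueError("missing end_header")


# Hypothetical header; real property names come from the dataset documentation.
sample = io.BytesIO(
    b"ply\nformat ascii 1.0\nelement vertex 2\n"
    b"property float x\nproperty float y\nproperty float z\n"
    b"property float residual\nend_header\n0 0 0 0.1\n1 1 1 0.2\n"
)
print(read_ply_header(sample)["elements"]["vertex"]["properties"])
```

For a file on disk, the same function can be called on a handle opened in binary mode, for example `open(path, "rb")`.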

Cite
ANSHIKA SHARMA (2025). 500 CITIES DISTANCE DATASET [Dataset]. https://www.kaggle.com/datasets/anshikasharmacseai/500-cities-distance-daatset
500 CITIES DISTANCE DATASET

City-to-City Distance Dataset (Undirected Weighted Graph) for TSP, MST PROBLEMS

Explore at:
zip (9653 bytes) (available download formats)
Dataset updated
Sep 29, 2025
Authors
ANSHIKA SHARMA
Description

This dataset contains pairwise distances between cities represented as an undirected weighted graph. Each row is an edge describing the travel distance between two cities. It is ideal for experiments in graph algorithms (shortest path, MST), combinatorial optimization (TSP), route planning, and educational demonstrations.

Columns:

From — source city (string)

To — destination city (string)

Distance — numerical distance (edge weight)

Quick stats (from provided data):

Number of distinct cities: 8 (City1 .. City8)

Number of edges (rows): 17

Graph type: undirected, weighted (assumed symmetric)

Use cases

Benchmarking shortest-path algorithms (Dijkstra, Bellman-Ford, Floyd–Warshall)

Minimum Spanning Tree (Kruskal/Prim) experiments

Traveling Salesman Problem (TSP) solvers and heuristics

Route planning and logistics toy problems

Teaching graph theory and visualization with networkx
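As an example of the MST use case, the sketch below reads rows in the From/To/Distance layout and runs Kruskal's algorithm using only the standard library. The sample rows are hypothetical and not taken from the dataset, and `kruskal_mst` is an illustrative name.

```python
import csv
import io


def kruskal_mst(edges):
    """Return (total_weight, chosen_edges) for an undirected weighted graph."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    total, chosen = 0.0, []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                     # edge joins two components
            parent[root_u] = root_v
            total += w
            chosen.append((u, v, w))
    return total, chosen


# Hypothetical rows in the From/To/Distance layout; not taken from the dataset.
sample_csv = (
    "From,To,Distance\n"
    "City1,City2,4\nCity2,City3,1\nCity1,City3,3\nCity3,City4,2\n"
)
rows = csv.DictReader(io.StringIO(sample_csv))
edges = [(r["From"], r["To"], float(r["Distance"])) for r in rows]
print(kruskal_mst(edges))
```

With the real file, `io.StringIO(sample_csv)` would be replaced by an open handle to the extracted CSV, assuming it uses the three column names listed above.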
