100+ datasets found
  1. Data from: Correcting for missing and irregular data in home-range...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1 more
    zip
    Updated Jan 9, 2018
    Cite
    Christen H. Fleming; Daniel Sheldon; William F. Fagan; Peter Leimgruber; Thomas Mueller; Dejid Nandintsetseg; Michael J. Noonan; Kirk A. Olson; Edy Setyawan; Abraham Sianipar; Justin M. Calabrese (2018). Correcting for missing and irregular data in home-range estimation [Dataset]. http://doi.org/10.5061/dryad.n42h0
    Available download formats: zip
    Dataset updated
    Jan 9, 2018
    Dataset provided by
    Smithsonian Conservation Biology Institute
    Goethe University Frankfurt
    University of Massachusetts Amherst
    University of Maryland, College Park
    University of Tasmania
    Conservation International Indonesia; Marine Program; Jalan Pejaten Barat 16A, Kemang Jakarta DKI Jakarta 12550 Indonesia
    Authors
    Christen H. Fleming; Daniel Sheldon; William F. Fagan; Peter Leimgruber; Thomas Mueller; Dejid Nandintsetseg; Michael J. Noonan; Kirk A. Olson; Edy Setyawan; Abraham Sianipar; Justin M. Calabrese
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Mongolia
    Description

    Home-range estimation is an important application of animal tracking data that is frequently complicated by autocorrelation, sampling irregularity, and small effective sample sizes. We introduce a novel, optimal weighting method that accounts for temporal sampling bias in autocorrelated tracking data. This method corrects for irregular and missing data, such that oversampled times are downweighted and undersampled times are upweighted to minimize error in the home-range estimate. We also introduce computationally efficient algorithms that make this method feasible with large datasets. Generally speaking, there are three situations where weight optimization improves the accuracy of home-range estimates: with marine data, where the sampling schedule is highly irregular, with duty cycled data, where the sampling schedule changes during the observation period, and when a small number of home-range crossings are observed, making the beginning and end times more independent and informative than the intermediate times. Using both simulated data and empirical examples including reef manta ray, Mongolian gazelle, and African buffalo, optimal weighting is shown to reduce the error and increase the spatial resolution of home-range estimates. With a conveniently packaged and computationally efficient software implementation, this method broadens the array of datasets with which accurate space-use assessments can be made.

  2. Credit Card Eligibility Data: Determining Factors

    • kaggle.com
    zip
    Updated May 18, 2024
    Cite
    Rohit Sharma (2024). Credit Card Eligibility Data: Determining Factors [Dataset]. https://www.kaggle.com/datasets/rohit265/credit-card-eligibility-data-determining-factors
    Available download formats: zip (303227 bytes)
    Dataset updated
    May 18, 2024
    Authors
    Rohit Sharma
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description of the Credit Card Eligibility Data: Determining Factors

    The Credit Card Eligibility Dataset: Determining Factors is a comprehensive collection of variables aimed at understanding the factors that influence an individual's eligibility for a credit card. This dataset encompasses a wide range of demographic, financial, and personal attributes that are commonly considered by financial institutions when assessing an individual's suitability for credit.

    Each row in the dataset represents a unique individual, identified by a unique ID, with associated attributes ranging from basic demographic information such as gender and age, to financial indicators like total income and employment status. Additionally, the dataset includes variables related to familial status, housing, education, and occupation, providing a holistic view of the individual's background and circumstances.

    Variable: Description
    • ID: An identifier for each individual (customer).
    • Gender: The gender of the individual.
    • Own_car: A binary feature indicating whether the individual owns a car.
    • Own_property: A binary feature indicating whether the individual owns a property.
    • Work_phone: A binary feature indicating whether the individual has a work phone.
    • Phone: A binary feature indicating whether the individual has a phone.
    • Email: A binary feature indicating whether the individual has provided an email address.
    • Unemployed: A binary feature indicating whether the individual is unemployed.
    • Num_children: The number of children the individual has.
    • Num_family: The total number of family members.
    • Account_length: The length of the individual's account with a bank or financial institution.
    • Total_income: The total income of the individual.
    • Age: The age of the individual.
    • Years_employed: The number of years the individual has been employed.
    • Income_type: The type of income (e.g., employed, self-employed, etc.).
    • Education_type: The education level of the individual.
    • Family_status: The family status of the individual.
    • Housing_type: The type of housing the individual lives in.
    • Occupation_type: The type of occupation the individual is engaged in.
    • Target: The target variable for the classification task, indicating whether the individual is eligible for a credit card or not (e.g., Yes/No, 1/0).

    Researchers, analysts, and financial institutions can leverage this dataset to gain insights into the key factors influencing credit card eligibility and to develop predictive models that assist in automating the credit assessment process. By understanding the relationship between various attributes and credit card eligibility, stakeholders can make more informed decisions, improve risk assessment strategies, and enhance customer targeting and segmentation efforts.

    This dataset is valuable for a wide range of applications within the financial industry, including credit risk management, customer relationship management, and marketing analytics. Furthermore, it provides a valuable resource for academic research and educational purposes, enabling students and researchers to explore the intricate dynamics of credit card eligibility determination.

  3. Data from: Current and projected research data storage needs of Agricultural...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2 more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service https://www.ars.usda.gov/
    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling.

    The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.

    From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey.

    Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond.

    We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

    Resources in this dataset:

    • Resource Title: Appendix A: ARS data storage survey questions.
      • File Name: Appendix A.pdf
      • Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here.
      • Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
    • Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
      • File Name: Machine-readable survey response data.csv
      • Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).
    • Resource Title: Responses from ARS Researcher Data Storage Survey.
      • File Name: Data Storage Survey Data for public release.xlsx
      • Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
      • Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
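    The per-person storage rule stated in the description (high end of the reported range, divided by 1 for an individual response or by G for a group response) can be sketched as follows. This is only an illustration: the function name, the units, and the sample values are assumptions and are not taken from the survey data. It also does not cover Big Data users, for whom the description says actual reported or estimated values were used instead.

    ```python
    def per_person_storage_tb(range_high_tb: float, group_size: int = 1) -> float:
        """Per-person storage: high end of the reported range divided by the
        number of people the response covers (1 for an individual response,
        G for a group response)."""
        if group_size < 1:
            raise ValueError("group_size must be at least 1")
        return range_high_tb / group_size


    # Hypothetical responses, both reporting a range whose high end is 100 TB.
    individual = per_person_storage_tb(100.0)            # individual response
    group = per_person_storage_tb(100.0, group_size=4)   # group response, G = 4
    ```

    Using the high end of each range gives an upper-bound style estimate, which is why the description treats the widest ranges separately.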

  4. Large and Long-Range Graph Dataset

    • ieee-dataport.org
    Updated Sep 18, 2025
    Cite
    shuo wang (2025). Large and Long-Range Graph Dataset [Dataset]. https://ieee-dataport.org/documents/large-and-long-range-graph-dataset
    Dataset updated
    Sep 18, 2025
    Authors
    shuo wang
    Description

    PCQM-Contact (CC BY 4.0)

  5. Fused Image dataset for convolutional neural Network-based crack Detection...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 20, 2023
    Cite
    Shanglian Zhou; Carlos Canchila; Wei Song (2023). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6383043
    Dataset updated
    Apr 20, 2023
    Authors
    Shanglian Zhou; Carlos Canchila; Wei Song
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.

    The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration feature). The filtered range data were generated by applying frequency domain filtering to eliminate image disturbances (e.g., surface variations, and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact from different types of image data on deep convolutional neural network (DCNN) performance.

    If you share or use this dataset, please cite [4] and [5] in any relevant documentation.

    In addition, an image dataset for crack classification has also been published at [6].

    References:

    [1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873

    [2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605

    [3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434

    [4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678

    [5] Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044

    [6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78

  6. 🛒 Supermarket Data

    • kaggle.com
    zip
    Updated Jul 19, 2024
    Cite
    mexwell (2024). 🛒 Supermarket Data [Dataset]. https://www.kaggle.com/datasets/mexwell/supermarket-data/versions/1
    Available download formats: zip (78427538 bytes)
    Dataset updated
    Jul 19, 2024
    Authors
    mexwell
    Description

    This is the dataset released as a companion to the paper “Explaining the Product Range Effect in Purchase Data”, presented at the BigData 2013 conference.

    • supermarket_distances: three columns. The first column is the customer id, the second is the shop id and the third is the distance between the customer’s house and the shop location. The distance is calculated in meters as a straight line, so it does not take into account the road graph.
    • supermarket_prices: two columns. The first column is the product id and the second column is its unit price. The price is in Euro and it is calculated as the average unit price for the time span of the dataset.
    • supermarket_purchases: four columns. The first column is the customer id, the second is the product id, the third is the shop id and the fourth is the total number of items of that product that the customer bought in that particular shop. The data were recorded from January 2007 to December 2011.
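
    As a rough illustration of how the column layouts above fit together, the sketch below joins purchases to prices and estimates each customer's total spend. The sample rows are hypothetical, and the sketch assumes comma-separated files without a header row, which the description does not confirm. Because the prices are averages over the whole time span, the result is an estimate rather than the amount actually paid.

    ```python
    import csv
    import io
    from collections import defaultdict

    # Hypothetical sample rows standing in for the real files.
    prices_csv = "101,2.50\n102,0.80\n"                      # product id, unit price (EUR)
    purchases_csv = "1,101,7,4\n1,102,7,10\n2,101,8,1\n"     # customer, product, shop, quantity

    # Map each product id to its average unit price.
    unit_price = {
        int(product_id): float(price)
        for product_id, price in csv.reader(io.StringIO(prices_csv))
    }

    # Sum price * quantity per customer across all shops.
    totals = defaultdict(float)
    for customer_id, product_id, shop_id, quantity in csv.reader(io.StringIO(purchases_csv)):
        totals[int(customer_id)] += unit_price[int(product_id)] * int(quantity)

    spend_by_customer = dict(totals)
    ```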

    Citation

    Pennacchioli, D., Coscia, M., Rinzivillo, S., Pedreschi, D. and Giannotti, F., Explaining the Product Range Effect in Purchase Data. In BigData, 2013.

    Acknowledgement

    Photo by Eduardo Soares on Unsplash

  7. Data from: GALILEO VENUS RANGE FIX RAW DATA V1.0

    • catalog.data.gov
    • datasets.ai
    • +1 more
    Updated Aug 22, 2025
    Cite
    National Aeronautics and Space Administration (2025). GALILEO VENUS RANGE FIX RAW DATA V1.0 [Dataset]. https://catalog.data.gov/dataset/galileo-venus-range-fix-raw-data-v1-0-0943a
    Dataset updated
    Aug 22, 2025
    Dataset provided by
    NASA http://nasa.gov/
    Description

    Raw radio tracking data used to determine the precise distance to Venus (and improve knowledge of the Astronomical Unit) from the Galileo flyby on 10 February 1990.

  8. housing

    • kaggle.com
    zip
    Updated Sep 22, 2023
    Cite
    HappyRautela (2023). housing [Dataset]. https://www.kaggle.com/datasets/happyrautela/housing
    Available download formats: zip (809785 bytes)
    Dataset updated
    Sep 22, 2023
    Authors
    HappyRautela
    Description

    The exercise after this contains questions that are based on the housing dataset.

    1. How many houses have a waterfront? a. 21000 b. 21450 c. 163 d. 173

    2. How many houses have 2 floors? a. 2692 b. 8241 c. 10680 d. 161

    3. How many houses built before 1960 have a waterfront? a. 80 b. 7309 c. 90 d. 92

    4. What is the price of the most expensive house having more than 4 bathrooms? a. 7700000 b. 187000 c. 290000 d. 399000

    5. For instance, if the ‘price’ column consists of outliers, how can you make the data clean and remove the redundancies? a. Calculate the IQR range and drop the values outside the range. b. Calculate the p-value and remove the values less than 0.05. c. Calculate the correlation coefficient of the price column and remove the values less than the correlation coefficient. d. Calculate the Z-score of the price column and remove the values less than the z-score.

    6. What are the various parameters that can be used to determine the dependent variables in the housing data to determine the price of the house? a. Correlation coefficients b. Z-score c. IQR Range d. Range of the Features

    7. If we get the r2 score as 0.38, what inferences can we make about the model and its efficiency? a. The model is 38% accurate, and shows poor efficiency. b. The model is showing 0.38% discrepancies in the outcomes. c. Low difference between observed and fitted values. d. High difference between observed and fitted values.

    8. If the metrics show that the p-value for the grade column is 0.092, what inferences can we make about the grade column? a. Significant in the presence of other variables. b. Highly significant in the presence of other variables. c. Insignificant in the presence of other variables. d. None of the above

    9. If the Variance Inflation Factor value for a feature is considerably higher than the other features, what can we say about that column/feature? a. High multicollinearity b. Low multicollinearity c. Both A and B d. None of the above
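
    Option (a) of question 5 refers to an IQR-based outlier filter. A minimal sketch of that idea follows, assuming the conventional 1.5 × IQR fences (the multiplier is not stated in the exercise) and using hypothetical prices rather than values from the housing dataset.

    ```python
    import statistics

    # Hypothetical prices; the last value is an obvious outlier.
    prices = [100, 120, 130, 150, 160, 180, 200, 5000]

    # Quartiles: statistics.quantiles with n=4 returns [Q1, Q2, Q3].
    q1, _, q3 = statistics.quantiles(prices, n=4)
    iqr = q3 - q1

    # Conventional fences at 1.5 * IQR beyond the quartiles (an assumption here).
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr

    # Keep only the values that fall inside the fences.
    cleaned = [p for p in prices if lower <= p <= upper]
    ```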

  9. GLAS/ICESat L1B Global Waveform-based Range Corrections Data (HDF5) V034 -...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). GLAS/ICESat L1B Global Waveform-based Range Corrections Data (HDF5) V034 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/glas-icesat-l1b-global-waveform-based-range-corrections-data-hdf5-v034
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA http://nasa.gov/
    Description

    GLAH05 Level-1B waveform parameterization data include output parameters from the waveform characterization procedure and other parameters required to calculate surface slope and relief characteristics. GLAH05 contains parameterizations of both the transmitted and received pulses and other characteristics from which elevation and footprint-scale roughness and slope are calculated. The received pulse characterization uses two implementations of the retracking algorithms: one tuned for ice sheets, called the standard parameterization, used to calculate surface elevation for ice sheets, oceans, and sea ice; and another for land (the alternative parameterization). Each data granule has an associated browse product.

  10. Data from: A comprehensive analysis of autocorrelation and bias in home...

    • datasetcatalog.nlm.nih.gov
    • borealisdata.ca
    • +1 more
    Updated May 19, 2021
    Cite
    Schabo, Dana G.; Ullmann, Wiebke; de Paula Cunha, Rogerio; Markham, A. Catherine; Alberts, Susan C.; Selva, Nuria; Koch, Flávia; Ali, Abdullahi H.; Zwijacz-Kozica, Tomasz; Thompson, Peter; Sergiel, Agnieszka; Mueller, Thomas; Dekker, Jasja; Ramalho, Emiliano E.; Patterson, Bruce D.; Morato, Ronaldo G.; Farwig, Nina; da Silva, Marina X.; LaPoint, Scott; Beyer, Dean; Medici, Emilia Patricia; Goheen, Jacob R.; Noonan, Michael J.; Olson, Kirk A.; Jeltsch, Florian; Belant, Jerrold L.; Fichtel, Claudia; Fleming, Christen H.; Akre, Tom S.; Ford, Adam T.; Nathan, Ran; Böhning-Gaese, Katrin; Fagan, William F.; Blaum, Niels; Tucker, Marlee A.; Antunes, Pamela C.; Drescher-Lehman, Jonathan; Rosner, Sascha; Calabrese, Justin M.; Paviolo, Agustin; Cullen Jr. , Laury; Fischer, Christina; Spiegel, Orr; Altmann, Jeanne; Zięba, Filip; Oliveira-Santos, Luiz Gustavo R.; Kappeler, Peter M.; Kauffman, Matthew; Janssen, René (2021). Data from: A comprehensive analysis of autocorrelation and bias in home range estimation [Dataset]. http://doi.org/10.5683/SP2/OAJTAO
    Dataset updated
    May 19, 2021
    Authors
    Schabo, Dana G.; Ullmann, Wiebke; de Paula Cunha, Rogerio; Markham, A. Catherine; Alberts, Susan C.; Selva, Nuria; Koch, Flávia; Ali, Abdullahi H.; Zwijacz-Kozica, Tomasz; Thompson, Peter; Sergiel, Agnieszka; Mueller, Thomas; Dekker, Jasja; Ramalho, Emiliano E.; Patterson, Bruce D.; Morato, Ronaldo G.; Farwig, Nina; da Silva, Marina X.; LaPoint, Scott; Beyer, Dean; Medici, Emilia Patricia; Goheen, Jacob R.; Noonan, Michael J.; Olson, Kirk A.; Jeltsch, Florian; Belant, Jerrold L.; Fichtel, Claudia; Fleming, Christen H.; Akre, Tom S.; Ford, Adam T.; Nathan, Ran; Böhning-Gaese, Katrin; Fagan, William F.; Blaum, Niels; Tucker, Marlee A.; Antunes, Pamela C.; Drescher-Lehman, Jonathan; Rosner, Sascha; Calabrese, Justin M.; Paviolo, Agustin; Cullen Jr. , Laury; Fischer, Christina; Spiegel, Orr; Altmann, Jeanne; Zięba, Filip; Oliveira-Santos, Luiz Gustavo R.; Kappeler, Peter M.; Kauffman, Matthew; Janssen, René
    Description

    Abstract: Home range estimation is routine practice in ecological research. While advances in animal tracking technology have increased our capacity to collect data to support home range analysis, these same advances have also resulted in increasingly autocorrelated data. Consequently, the question of which home range estimator to use on modern, highly autocorrelated tracking data remains open. This question is particularly relevant given that most estimators assume independently sampled data. Here, we provide a comprehensive evaluation of the effects of autocorrelation on home range estimation. We base our study on an extensive dataset of GPS locations from 369 individuals representing 27 species distributed across 5 continents. We first assemble a broad array of home range estimators, including Kernel Density Estimation (KDE) with four bandwidth optimizers (Gaussian reference function, autocorrelated-Gaussian reference function (AKDE), Silverman's rule of thumb, and least squares cross-validation), Minimum Convex Polygon, and Local Convex Hull methods. Notably, all of these estimators except AKDE assume independent and identically distributed (IID) data. We then employ half-sample cross-validation to objectively quantify estimator performance, and the recently introduced effective sample size for home range area estimation ($\hat{N}_\mathrm{area}$) to quantify the information content of each dataset. We found that AKDE 95% area estimates were larger than conventional IID-based estimates by a mean factor of 2. The median number of cross-validated locations included in the holdout sets by AKDE 95% (or 50%) estimates was 95.3% (or 50.1%), confirming the larger AKDE ranges were appropriately selective at the specified quantile. Conversely, conventional estimates exhibited negative bias that increased with decreasing $\hat{N}_\mathrm{area}$.

    To contextualize our empirical results, we performed a detailed simulation study to tease apart how sampling frequency, sampling duration, and the focal animal's movement conspire to affect range estimates. Paralleling our empirical results, the simulation study demonstrated that AKDE was generally more accurate than conventional methods, particularly for small $\hat{N}_\mathrm{area}$. While 72% of the 369 empirical datasets had >1000 total observations, only 4% had an $\hat{N}_\mathrm{area}$ >1000, where 30% had an $\hat{N}_\mathrm{area}$ <30. In this frequently encountered scenario of small $\hat{N}_\mathrm{area}$, AKDE was the only estimator capable of producing an accurate home range estimate on autocorrelated data.

  11. mmWave-based Fitness Activity Recognition Dataset

    • zenodo.org
    png, zip
    Updated Jul 12, 2024
    Cite
    Yucheng Xie; Xiaonan Guo; Yan Wang; Jerry Cheng; Yingying Chen (2024). mmWave-based Fitness Activity Recognition Dataset [Dataset]. http://doi.org/10.5281/zenodo.7793613
    Available download formats: zip, png
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo
    Authors
    Yucheng Xie; Xiaonan Guo; Yan Wang; Jerry Cheng; Yingying Chen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description:

    This mmWave dataset is used for fitness activity identification. The dataset (FA Dataset) contains 14 common daily fitness activities. The data are captured by the mmWave radar TI-AWR1642. The dataset can be used by fellow researchers to reproduce the original work or to further explore other machine-learning problems in the domain of mmWave signals.

    Format: .png format

    Section 1: Device Configuration

    Section 2: Data Format

    We provide our mmWave data in heatmaps for this dataset. The data file is in the png format. The details are shown in the following:

    • 14 activities are included in the FA Dataset.
    • 2 participants are included in the FA Dataset.
    • FA_d_p_i_u_j.png:
      • d represents the date on which the fitness data were collected.
      • p represents the environment in which the fitness data were collected.
      • i represents fitness activity type index
      • u represents user id
      • j represents sample index
    • Example:
      • FA_20220101_lab_1_2_3 represents the 3rd data sample of user 2 of activity 1 collected in the lab
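
    The naming scheme above can be unpacked programmatically. The following is a minimal sketch, not code shipped with the dataset: the class and function names are hypothetical, and it assumes the environment label itself contains no underscores.

    ```python
    from dataclasses import dataclass
    from pathlib import Path


    @dataclass(frozen=True)
    class FASample:
        date: str          # d: collection date, e.g. "20220101"
        environment: str   # p: collection environment, e.g. "lab"
        activity: int      # i: fitness activity type index
        user: int          # u: user id
        index: int         # j: sample index


    def parse_fa_name(filename: str) -> FASample:
        """Split a name of the form FA_d_p_i_u_j (with or without extension)."""
        stem = Path(filename).stem
        parts = stem.split("_")
        if len(parts) != 6 or parts[0] != "FA":
            raise ValueError(f"unexpected file name: {filename!r}")
        _, date, environment, activity, user, index = parts
        return FASample(date, environment, int(activity), int(user), int(index))


    sample = parse_fa_name("FA_20220101_lab_1_2_3.png")
    # sample.activity is 1, sample.user is 2, sample.index is 3
    ```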

    Section 3: Experimental Setup

    • We place the mmWave device on a table with a height of 60 cm.
    • The participants are asked to perform the fitness activity in front of the mmWave device at a distance of 2 m.
    • The data are collected in a lab with a size of 5.0 m × 3.0 m.

    Section 4: Data Description

    • We develop a spatial-temporal heatmap that integrates multiple activity features, including the range of movement, velocity, and time duration of each activity repetition.

    • We first derive the Doppler-range map of the users' activity by calculating Range-FFT and Doppler-FFT. Then, we generate the spatial-temporal heatmap by accumulating the velocity of every distance in every Doppler-range map together. Next, we normalize the derived velocity information and present the velocity-distance relationship in the time dimension. In this way, we transfer the original instantaneous velocity-distance relationship to a more comprehensive spatial-temporal heatmap which describes the process of a whole activity.

    • As shown in the attached figure, in each spatial-temporal heatmap, the horizontal axis represents the time duration of an activity repetition while the vertical axis represents the range of movement. The velocity is represented by color.

    • We create 14 zip files to store the dataset. The names of the 14 zip files start with "FA", and each file contains repetitions of the same fitness activity.

    14 common daily activities and their corresponding files

    File Name   Activity Type           File Name   Activity Type
    FA1         Crunches                FA8         Squats
    FA2         Elbow plank and reach   FA9         Burpees
    FA3         Leg raise               FA10        Chest squeezes
    FA4         Lunges                  FA11        High knees
    FA5         Mountain climber        FA12        Side leg raise
    FA6         Punches                 FA13        Side to side chops
    FA7         Push ups                FA14        Turning kicks

    Section 5: Raw Data and Data Processing Algorithms

    • We also provide the mmWave raw data (.mat format), stored in the same zip file as the corresponding heatmap data. Each .mat file stores one set of activity repetitions (e.g., 4 repetitions) from the same user.
      • For example: FA_d_p_i_u_j.mat:
        • d represents the date on which the data were collected.
        • p represents the environment to collect the data.
        • i represents the activity type index
        • u represents the user id
        • j represents the set index
    • We plan to provide the data processing algorithms (heatmap_generation.py) to load the mmWave raw data and generate the corresponding heatmap data.

    Section 6: Citations

    If your paper is related to our works, please cite our papers as follows.

    https://ieeexplore.ieee.org/document/9868878/

    Xie, Yucheng, Ruizhe Jiang, Xiaonan Guo, Yan Wang, Jerry Cheng, and Yingying Chen. "mmFit: Low-Effort Personalized Fitness Monitoring Using Millimeter Wave." In 2022 International Conference on Computer Communications and Networks (ICCCN), pp. 1-10. IEEE, 2022.

    BibTeX:

    @inproceedings{xie2022mmfit,
      title={mmFit: Low-Effort Personalized Fitness Monitoring Using Millimeter Wave},
      author={Xie, Yucheng and Jiang, Ruizhe and Guo, Xiaonan and Wang, Yan and Cheng, Jerry and Chen, Yingying},
      booktitle={2022 International Conference on Computer Communications and Networks (ICCCN)},
      pages={1--10},
      year={2022},
      organization={IEEE}
    }

  12. ANN development + final testing datasets

    • data.niaid.nih.gov
    • resodate.org
    • +1more
    Updated Jan 24, 2020
    Cite
    Authors (2020). ANN development + final testing datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1445865
    Dataset updated
    Jan 24, 2020
    Authors
    Authors
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    File name definitions:

    '...v_50_175_250_300...' - dataset for velocity ranges [50, 175] + [250, 300] m/s

    '...v_175_250...' - dataset for velocity range [175, 250] m/s

    'ANNdevelop...' - used to perform 9 parametric sub-analyses where, in each one, many ANNs are developed (trained, validated and tested) and the one yielding the best results is selected

    'ANNtest...' - used to test the best ANN from each aforementioned parametric sub-analysis, aiming to find the best ANN model; this dataset includes the 'ANNdevelop...' counterpart

    Where to find the input (independent) and target (dependent) variable values for each dataset (Excel file)?

    Input values are in the 'IN' sheet.

    Target values are in the 'TARGET' sheet.

    Where to find the results from the best ANN model (for each target/output variable and each velocity range)?

    Open the corresponding Excel file; the expected (target) vs. ANN (output) results are written in the 'TARGET vs OUTPUT' sheet.

    Check reference below (to be added when the paper is published)

    https://www.researchgate.net/publication/328849817_11_Neural_Networks_-_Max_Disp_-_Railway_Beams

  13. Head Hunting

    • kaggle.com
    zip
    Updated Nov 8, 2023
    Cite
    Mariyam Al Shatta (2023). Head Hunting [Dataset]. https://www.kaggle.com/datasets/mariyamalshatta/head-hunting/code
    Available download formats: zip (1515 bytes)
    Dataset updated
    Nov 8, 2023
    Authors
    Mariyam Al Shatta
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Business Context

    A research institute conducts a Talent Hunt Examination every year to hire people who can work on various research projects in the field of Mathematics and Computer Science. A2Z institute provides a preparatory program to help the aspirants prepare for the Talent Hunt Exam. The institute has a good record of helping many students clear the exam. Before the application for the next batch starts, the institute wants to attract more aspirants to their program. For this, the institute wants to assure the aspiring students of the quality of results obtained by students enrolled in their program in recent years.

    However, one challenge in estimating an average score is that the exam's difficulty level varies a little every year, and the distribution of scores changes accordingly. The institute keeps track of the final scores of its alumni who attempted the exam previously. The institute has prepared a dataset consisting of a simple random sample of the final scores of 600 aspirants from the last three years.

    Objective

    The institute wants to provide an estimate of the average score obtained by aspirants who enroll in their program. Keeping in mind the variation in scores every year, the institute wants to provide a more reliable estimate of the average score using a range of scores instead of a single estimate. It is known from previous records that the standard deviation of the scores is 10 and the cut-off score in the most recent year was 84.

    A recent social media post from A2Z institute received feedback from a reputed critic, mentioning that the students from A2Z institute score less than last year's cut-off on average. The institute wants to test if the claim by the critic is valid.

    Solution Approach

    To provide a more reliable estimate of the average score using a range of scores instead of a single estimate, we will construct a 95% confidence interval for the mean score of aspirants enrolled in the institute's program. To test the validity of the critic's claim (that the mean score of the students from A2Z institute is less than last year's cut-off score of 84), we will perform a one-sided hypothesis test with alpha = 5%. Because the standard deviation of the scores is given as known (10), both the interval and the test can be based on the normal (z) distribution.

    Data

    The dataset provided (Talent_hunt.csv) contains the final scores of 600 aspirants enrolled in the institute’s program in the last three years.
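
The solution approach above could be sketched as follows, using only the Python standard library. The function names are illustrative, and the sketch takes a plain list of scores rather than reading Talent_hunt.csv, because the column layout of that file is not given here. It assumes the known-sigma (z) setting with sigma = 10 and a left-tailed test of H0: mean >= 84 against Ha: mean < 84.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_confidence_interval(scores, sigma, confidence=0.95):
    """Two-sided z interval for the mean when sigma is known."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z_crit * sigma / sqrt(len(scores))
    centre = mean(scores)
    return centre - half_width, centre + half_width


def left_tailed_z_test(scores, mu0, sigma):
    """Test H0: mean >= mu0 against Ha: mean < mu0; returns (z, p_value)."""
    z = (mean(scores) - mu0) / (sigma / sqrt(len(scores)))
    p_value = NormalDist().cdf(z)  # probability of a z at least this low
    return z, p_value
```

With the 600 scores loaded into a list, the critic's claim would be supported at the 5% level only if the p-value returned by left_tailed_z_test(scores, 84, 10) were below 0.05.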

  14. Data from: Exact finite range DWBA calculations for heavy-ion induced...

    • elsevier.digitalcommonsdata.com
    Updated Jan 1, 1974
    + more versions
    Cite
    T. Tamura (1974). Exact finite range DWBA calculations for heavy-ion induced nuclear reactions [Dataset]. http://doi.org/10.17632/xthy9b534c.1
    Dataset updated
    Jan 1, 1974
    Authors
    T. Tamura
    License

    https://www.elsevier.com/about/policies/open-access-licenses/elsevier-user-license/cpc-license/

    Description

    Title of program: MARS-1-FOR-EFR-DWBA
    Catalogue Id: ABPB_v1_0

    Nature of problem: The package SATURN-MARS-1 consists of two programs, SATURN and MARS, for calculating cross sections of reactions transferring nucleon(s) primarily between two heavy ions. The calculations are made within the framework of the finite-range distorted wave Born approximation (DWBA). The first part, SATURN, prepares the form factor(s) either for the exact finite range (EFR) or for the no-recoil (NR) approach. The prepared form factor is then used by the second part, MARS, to calculate either EFR-DWBA or NR-DWBA cross-s ...

    Versions of this program held in the CPC repository in Mendeley Data: abpb_v1_0; MARS-1-FOR-EFR-DWBA; 10.1016/0010-4655(74)90012-5

    This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)

  15. Data from: Overcoming the challenge of small effective sample sizes in...

    • data.niaid.nih.gov
    • dataone.org
    • +2more
    zip
    Updated Sep 8, 2019
    Cite
    Christen H. Fleming; Michael J. Noonan; Emilia Patricia Medici; Justin M. Calabrese (2019). Overcoming the challenge of small effective sample sizes in home-range estimation [Dataset]. http://doi.org/10.5061/dryad.16bc7f2
    Available download formats: zip
    Dataset updated
    Sep 8, 2019
    Authors
    Christen H. Fleming; Michael J. Noonan; Emilia Patricia Medici; Justin M. Calabrese
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Pantanal, Brazil
    Description

    Technological advances have steadily increased the detail of animal tracking datasets, yet fundamental data limitations exist for many species that cause substantial biases in home‐range estimation. Specifically, the effective sample size of a range estimate is proportional to the number of observed range crossings, not the number of sampled locations. Currently, the most accurate home‐range estimators condition on an autocorrelation model, for which the standard estimation frameworks are based on likelihood functions, even though these methods are known to underestimate variance, and therefore ranging area, when effective sample sizes are small. Residual maximum likelihood (REML) is a widely used method for reducing bias in maximum‐likelihood (ML) variance estimation at small sample sizes. Unfortunately, we find that REML is too unstable for practical application to continuous‐time movement models. When the effective sample size N is decreased to N ≤ O(10), which is common in tracking applications, REML undergoes a sudden divergence in variance estimation. To avoid this issue, while retaining REML's first‐order bias correction, we derive a family of estimators that leverage REML to make a perturbative correction to ML. We also derive AIC values for REML and our estimators, including cases where model structures differ, which is not generally understood to be possible. Using both simulated data and GPS data from lowland tapir (Tapirus terrestris), we show how our perturbative estimators are more accurate than traditional ML and REML methods. Specifically, when O(5) home‐range crossings are observed, REML is unreliable by orders of magnitude, ML home ranges are ~30% underestimated, and our perturbative estimators yield home ranges that are only ~10% underestimated. A parametric bootstrap can then reduce the ML and perturbative home‐range underestimation to ~10% and ~3%, respectively. Home‐range estimation is one of the primary reasons for collecting animal tracking data, and small effective sample sizes are a more common problem than is currently realized. The methods introduced here allow for more accurate movement‐model and home‐range estimation at small effective sample sizes, and thus fill an important role for animal movement analysis. Given REML's widespread use, our methods may also be useful in other contexts where effective sample sizes are small.

  16. Electric Two-Wheeler Synthetic Dataset

    • kaggle.com
    zip
    Updated Jul 14, 2025
    Cite
    Prajwal M Poojary (2025). Electric Two-Wheeler Synthetic Dataset [Dataset]. https://www.kaggle.com/datasets/prajwalmpoojary/electric-two-wheeler-synthetic-dataset
    Available download formats: zip (199123 bytes)
    Dataset updated
    Jul 14, 2025
    Authors
    Prajwal M Poojary
    License

    https://cdla.io/sharing-1-0/

    Description

    This dataset is synthetically generated to simulate realistic operational data from electric two-wheelers. It includes common parameters such as vehicle speed, battery voltage, current, state of charge, motor and ambient temperatures, regenerative braking status, and trip duration.

    The feature distributions are designed to mimic real-world driving behavior and environmental conditions typically observed in urban commuting scenarios for electric scooters and bikes.

    The dataset is primarily intended for machine learning tasks such as EV range prediction, regression modeling, exploratory data analysis (EDA), and educational purposes.

    This can serve as a baseline dataset for practicing predictive modeling workflows, building dashboards, or testing data visualization techniques.

    Dataset Columns Explained:

    • vehicle_speed_kmph: Vehicle speed in km/h.
    • battery_voltage_V: Battery voltage in volts.
    • battery_current_A: Battery current draw in amperes.
    • state_of_charge_%: Battery charge percentage.
    • motor_temperature_C: Motor temperature in Celsius.
    • ambient_temperature_C: Outside temperature during trip.
    • regen_braking_active: Whether regenerative braking was active (1 = yes, 0 = no).
    • trip_duration_min: Duration of the trip in minutes.
    • estimated_range_km: Calculated estimated range of vehicle in km.
  17. Data from: Evaluating range-expansion models for calculating nonnative...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Apr 28, 2015
    + more versions
    Cite
    Sonja Preuss; Matthew Low; Anna Cassel-Lundhagen; Åsa Berggren (2015). Evaluating range-expansion models for calculating nonnative species’ expansion rate [Dataset]. http://doi.org/10.5061/dryad.ns624
    Available download formats: zip
    Dataset updated
    Apr 28, 2015
    Dataset provided by
    Swedish University of Agricultural Sciences
    Authors
    Sonja Preuss; Matthew Low; Anna Cassel-Lundhagen; Åsa Berggren
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Sweden
    Description

    1. Species range shifts associated with environmental change or biological invasions are increasingly important study areas. However, quantifying range expansion rates may be heavily influenced by methodology and/or sampling bias.
    2. We compared expansion rate estimates of Roesel's bush-cricket (Metrioptera roeselii, Hagenbach 1822), a non-native species currently expanding its range in south-central Sweden, from range statistic models based on distance measures (mean, median, 95th gamma quantile, marginal mean, maximum and conditional maximum) and an area-based method (grid occupancy). We used sampling simulations to determine the sensitivity of the different methods to incomplete sampling across the species' range.
    3. For periods when we had comprehensive survey data, range expansion estimates clustered into two groups: (i) those calculated from range margin statistics (gamma, marginal mean, maximum and conditional maximum: ~3 km/yr), and (ii) those calculated from the central tendency (mean and median) and the area-based method of grid occupancy (~1.5 km/yr).
    4. Range statistic measures differed greatly in their sensitivity to sampling effort; the proportion of sampling required to achieve an estimate within 10% of the true value ranged from 0.17 to 0.9. Grid occupancy and median were most sensitive to sampling effort, and the maximum and gamma quantile the least.
    5. If periods with incomplete sampling were included in the range expansion calculations, this generally lowered the estimates (range 16-72%), with the exception of the gamma quantile, which was slightly higher (6%).
    6. Care should be taken when interpreting rate expansion estimates from data sampled from only a fraction of the full distribution. Methods based on the central tendency will give rates approximately half that of methods based on the range margin. The gamma quantile method appears to be the most robust to incomplete sampling bias and should be considered as the method of choice when sampling the entire distribution is not possible.
  18. Amazon AWS Recon Data For Finding Origin IP - 93M

    • kaggle.com
    zip
    Updated Sep 17, 2023
    Cite
    Chirag Artani (2023). Amazon AWS Recon Data For Finding Origin IP - 93M [Dataset]. https://www.kaggle.com/datasets/chiragartani/amazon-aws-asn-cidr-ip-to-hostname-recon-data
    Available download formats: zip (225734462 bytes)
    Dataset updated
    Sep 17, 2023
    Authors
    Chirag Artani
    Description

    Our mission with this project is to provide an always up-to-date and freely accessible map of the cloud landscape for every major cloud service provider.

    We've decided to kick things off with collecting SSL certificate data of AWS EC2 machines, considering the value of this data to security researchers. However, we plan to expand the project to include more data and providers in the near future. Your input and suggestions are incredibly valuable to us, so please don't hesitate to reach out on Twitter or Discord and let us know what areas you think we should prioritize next!

    How to find origin IP of any domain or subdomain inside this database?

    To find the origin IP of a domain, for example instacart.com, search the dataset for instacart.com.

    You can also use the command line if you are on Linux. Download the dataset with curl or wget, cd into the extracted folder, and run: find . -type f -iname "*.csv" -print0 | xargs -0 grep "word"

    For example: find . -type f -iname "*.csv" -print0 | xargs -0 grep "instacart.com"

    The matching lines are printed as output.
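
For readers not on Linux, roughly the same search can be sketched in Python with the standard library. The function name is illustrative. Unlike grep, which treats the pattern as a regular expression (so "." matches any character), this sketch does a literal substring match.

```python
from pathlib import Path


def search_csv_files(root, needle):
    """Return (path, line_number, line) for every CSV line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Match the extension case-insensitively, like find's -iname "*.csv".
        if path.is_file() and path.suffix.lower() == ".csv":
            with path.open(encoding="utf-8", errors="replace") as handle:
                for line_number, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append((str(path), line_number, line.rstrip("\n")))
    return hits
```

Calling search_csv_files(".", "instacart.com") from the extracted dataset folder would list every matching row together with the file it came from.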

    How can SSL certificate data benefit you?

    The SSL data is organized into CSV files, with the following properties collected for every found certificate:

    • IP Address
    • Common Name
    • Organization
    • Country
    • Locality
    • Province
    • Subject Alternative DNS Name
    • Subject Alternative IP address
    • Self-signed (boolean)

    Example records:

    IP Address | Common Name | Organization | Country | Locality | Province | Subject Alternative DNS Name | Subject Alternative IP address | Self-signed
    1.2.3.4 | example.com | Example, Inc. | US | San Francisco | California | example.com | 1.2.3.4 | false
    5.6.7.8 | acme.net | Acme, Inc. | US | Seattle | Washington | *.acme.net | 5.6.7.8 | false

    So what can you do with this data?

    Enumerate subdomains of your target domains: Search for your target's domain names (e.g. example.com) and find hits in the Common Name and Subject Alternative Name fields of the collected certificates. All IP ranges are scanned daily and the dataset is updated accordingly, so you are very likely to find ephemeral hosts before they are taken down.

    Enumerate domains of your target companies: Search for your target's company name (e.g. Example, Inc.), find hits in the Organization field, and explore the associated Common Name and Subject Alternative Name fields. The results will probably include subdomains of the domains you're familiar with, and if you're in luck you might find new root domains that expand the scope.

    Find candidates for sub-subdomain enumeration: If a certificate is issued for a wildcard (e.g. *.foo.example.com), chances are there are other subdomains you can find by brute-forcing there, and you know how effective this technique can be. Here are some wordlists to help you with that!

    💡 Note: Remember to monitor the dataset for daily updates to get notified whenever a new asset comes up!

    Perform IP lookups: Search for an IP address (e.g. 3.122.37.147) to find host names associated with it, and explore the Common Name, Subject Alternative Name, and Organization fields to find more information about that address.

    Discover origin IP addresses to bypass proxy services: When a website is hidden behind security proxy services like Cloudflare, Akamai, Incapsula, and others, it is possible to search for the host name (e.g., example.com) in the dataset. This search may uncover the origin IP address, allowing you to bypass the proxy. We've discussed a similar technique on our blog which you can find here!

    Get a fresh dataset of live web servers: Each IP address in the dataset corresponds to an HTTPS server running on port 443. You can use this data for large-scale research without needing to spend time collecting it yourself.

    Whatever else you can think of: If you use this data for a cool project or research, we would love to hear about it!

    Additionally, below you will find a detailed explanation of our data collection process and how you can implement the same technique to gather information from your own IP ranges.

    TB; DZ (Too big; didn't zoom):

    We kick off the workflow with a simple bash script that retrieves AWS's IP ranges. Using a JQ query, we extract the IP ranges of EC2 machines by filtering for .prefixes[] | select(.service=="EC2") | .ip_prefix. Other services are excluded from this workflow since they don't support custom SSL certificates, making their data irrelevant for our dataset.
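
The jq filter quoted above has a direct Python counterpart. The sketch below assumes the ranges document has already been downloaded and parsed into a dict (AWS publishes it as ip-ranges.json). The function name is illustrative, and the sample entries use documentation-only address blocks rather than real AWS prefixes.

```python
def ec2_prefixes(ip_ranges: dict) -> list[str]:
    """Equivalent of: .prefixes[] | select(.service=="EC2") | .ip_prefix"""
    return [
        entry["ip_prefix"]
        for entry in ip_ranges.get("prefixes", [])
        if entry.get("service") == "EC2"
    ]


# Made-up sample in the same shape as the published document.
sample = {
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "AMAZON"},
        {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "EC2"},
    ]
}
selected = ec2_prefixes(sample)  # keeps only the entry whose service is "EC2"
```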

    Then, we use mapcidr to divide the IP ranges obtained in step 1 into smaller ranges, each containing up to 100k hosts (Thanks, ProjectDiscovery team!). This step will be handy in the next step when we run the parallel scanning process.
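
The splitting step can be approximated without mapcidr by using Python's ipaddress module. This is a sketch, not a reimplementation of mapcidr: it only produces power-of-two blocks, so with a limit of 100k addresses each IPv4 piece is at most a /16 (65,536 addresses). The function name and default are illustrative.

```python
import ipaddress


def split_cidr(cidr: str, max_hosts: int = 100_000) -> list[str]:
    """Split a CIDR block into equal subnets of at most max_hosts addresses."""
    network = ipaddress.ip_network(cidr, strict=False)
    # Largest power-of-two block size that does not exceed max_hosts.
    host_bits = max_hosts.bit_length() - 1
    new_prefix = max(network.prefixlen, network.max_prefixlen - host_bits)
    return [str(subnet) for subnet in network.subnets(new_prefix=new_prefix)]
```

Each returned block could then be handed to a separate scanning job, which is the role the file-splitter node plays in the workflow described here.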

    At the time of writing, the EC2 IP ranges include over 57 million IP addresses, so scanning them all on a single machine would be impractical, which is where our file-splitter node comes into play.

    This node iterates through the input from mapcidr and triggers individual jobs for each range. When executing this w...

  19. Data from: Contrasting effects of host or local specialization: widespread...

    • data-staging.niaid.nih.gov
    • ourarchive.otago.ac.nz
    • +3more
    zip
    Updated Mar 13, 2024
    Cite
    Daniela de Angeli Dutra; Gabriel Moreira Félix; Robert Poulin (2024). Contrasting effects of host or local specialization: widespread haemosporidians are host generalist whereas local specialists are locally abundant [Dataset]. http://doi.org/10.5061/dryad.j3tx95xfb
    Available download formats: zip
    Dataset updated
    Mar 13, 2024
    Dataset provided by
    University of Otago
    Universidade Estadual de Campinas (UNICAMP)
    Authors
    Daniela de Angeli Dutra; Gabriel Moreira Félix; Robert Poulin
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Aim: Despite the wide distribution of many parasites around the globe, the range of individual species varies significantly even among phylogenetically related taxa. Since parasites need suitable hosts to complete their development, parasite geographical and environmental ranges should be limited to communities where their hosts are found. Parasites may also suffer from a trade-off between being locally abundant or widely dispersed. We hypothesize that the geographical and environmental ranges of parasites are negatively associated with their host specificity and their local abundance.

    Location: Worldwide

    Time period: 2009 to 2021

    Major taxa studied: Avian haemosporidian parasites

    Methods: We tested these hypotheses using a global database which comprises data on avian haemosporidian parasites from across the world. For each parasite lineage, we computed five metrics: phylogenetic host-range, environmental range, geographical range, and their mean local and total number of observations in the database. Phylogenetic generalized least squares models were run to evaluate the influence of phylogenetic host-range and total and local abundances on geographical and environmental range. In addition, we analysed separately the two regions with the largest amount of available data: Europe and South America.

    Results: We evaluated 401 lineages from 757 localities and observed that generalism (i.e. phylogenetic host range) associates positively with both the parasites' geographical and environmental ranges at global and Europe scales. For South America, generalism only associates with geographical range. Finally, mean local abundance (mean local number of parasite occurrences) was negatively related to geographical and environmental range. This pattern was detected worldwide and in South America, but not in Europe.

    Main Conclusions: We demonstrate that parasite specificity is linked to both their geographical and environmental ranges.
    The fact that locally abundant parasites present restricted ranges indicates a trade-off between these two traits. This trade-off, however, only becomes evident when sufficient heterogeneous host communities are considered.

    Methods

    We compiled data on haemosporidian lineages from the MalAvi database (http://130.235.244.92/Malavi/, Bensch et al. 2009) including all the data available from the "Grand Lineage Summary" representing Plasmodium and Haemoproteus genera from wild birds and that contained information regarding location. After checking for duplicated sequences, this dataset comprised a total of ~6200 sequenced parasites representing 1602 distinct lineages (775 Plasmodium and 827 Haemoproteus) collected from 1139 different host species and 757 localities from all continents except Antarctica (Supplementary figure 1, Supplementary Table 1). The parasite lineages deposited in MalAvi are based on a cyt b fragment of 478 bp. This dataset was used to calculate the parasites' geographical, environmental and phylogenetic ranges.

    Geographical range

    All analyses in this study were performed using R version 4.02. In order to estimate the geographical range of each parasite lineage, we applied the R package "GeoRange" (Boyle, 2017) and chose the variable minimum spanning tree distance (i.e., shortest total distance of all lines connecting each locality where a particular lineage has been found). Using the function "create.matrix" from the "fossil" package, we created a matrix of lineages and coordinates and employed the function "GeoRange_MultiTaxa" to calculate the minimum spanning tree distance for each parasite lineage (i.e. shortest total distance in kilometers of all lines connecting each locality). Therefore, as at least two distinct sites are necessary to calculate this distance, parasites observed in a single locality could not have their geographical range estimated. For this reason, only parasites observed in two or more localities were considered in our phylogenetically controlled least squares (PGLS) models.

    Host and Environmental diversity

    Traditionally, ecologists use Shannon entropy to measure diversity in ecological assemblages (Pielou, 1966). The Shannon entropy of a set of elements is related to the degree of uncertainty someone would have about the identity of a randomly selected element of that set (Jost, 2006). Thus, Shannon entropy matches our intuitive notion of biodiversity, as the more diverse an assemblage is, the more uncertainty there is regarding which species a randomly selected individual belongs to. Shannon diversity increases with both the assemblage richness (e.g., the number of species) and evenness (e.g., uniformity in abundance among species). To compare the diversity of assemblages that vary in richness and evenness in a more intuitive manner, we can normalize diversities by Hill numbers (Chao et al., 2014b). The Hill number of an assemblage represents the effective number of species in the assemblage, i.e., the number of equally abundant species that are needed to give the same value of the diversity metric in that assemblage. Hill numbers can be extended to incorporate phylogenetic information. In such case, instead of species, we are measuring the effective number of phylogenetic entities in the assemblage. Here, we computed phylogenetic host-range as the phylogenetic Hill number associated with the assemblage of hosts found infected by a given parasite. Analyses were performed using the function "hill_phylo" from the "hillr" package (Chao et al., 2014a). Hill numbers are parameterized by a parameter "q" that determines the sensitivity of the metric to relative species abundance. Different "q" values produce Hill numbers associated with different diversity metrics. We set q = 1 to compute the Hill number associated with Shannon diversity. Here, low Hill numbers indicate specialization on a narrow phylogenetic range of hosts, whereas a higher Hill number indicates generalism across a broader phylogenetic spectrum of hosts.

    We also used Hill numbers to compute the environmental range of sites occupied by each parasite lineage. Firstly, we collected the 19 bioclimatic variables from WorldClim version 2 (http://www.worldclim.com/version2) for all sites used in this study (N = 713). Then, we standardized the 19 variables by centering and scaling them by their respective mean and standard deviation. Thereafter, we computed the pairwise Euclidean environmental distance among all sites and used this distance to compute a dissimilarity cluster. Finally, as for the phylogenetic Hill number, we used this dissimilarity cluster to compute the environmental Hill number of the assemblage of sites occupied by each parasite lineage. The environmental Hill number for each parasite can be interpreted as the effective number of environmental conditions in which a parasite lineage occurs. Thus, the higher the environmental Hill number, the more generalist the parasite is regarding the environmental conditions in which it can occur.

    Parasite phylogenetic tree

    A Bayesian phylogenetic reconstruction was performed. We built a tree for all parasite sequences for which we were able to estimate the parasite's geographical, environmental and phylogenetic ranges (see above); this represented 401 distinct parasite lineages. This inference was produced using MrBayes 3.2.2 (Ronquist & Huelsenbeck, 2003) with the GTR + I + G model of nucleotide evolution, as recommended by ModelTest (Posada & Crandall, 1998), which selects the best-fit nucleotide substitution model for a set of genetic sequences. We ran four Markov chains simultaneously for a total of 7.5 million generations that were sampled every 1000 generations. The first 1250 million trees (25%) were discarded as a burn-in step and the remaining trees were used to calculate the posterior probabilities of each estimated node in the final consensus tree. Our final tree obtained a cumulative posterior probability of 0.999. Leucocytozoon caulleryi was used as the outgroup to root the phylogenetic tree as Leucocytozoon spp. represents a basal group within avian haemosporidians (Pacheco et al., 2020).
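
To make the Hill-number idea concrete, the following sketch computes the ordinary (non-phylogenetic) Hill number of order q from raw abundance counts. It is not the "hill_phylo" function of the "hillr" package used by the authors, which additionally weights by phylogenetic branch lengths. The function name is illustrative. For q = 1 it returns the exponential of the Shannon entropy, the case the description says was used.

```python
from math import exp, log


def hill_number(abundances, q=1.0):
    """Effective number of species of order q for a list of abundance counts."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1: exponential of the Shannon entropy.
        return exp(-sum(p * log(p) for p in proportions))
    return sum(p ** q for p in proportions) ** (1.0 / (1.0 - q))
```

A parasite found equally often on four hosts has an effective host number of 4, whereas one found nine times on one host and once on another has an effective number between 1 and 2, which reflects its stronger specialization.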

  20. Amphibian metamorphosis assays- biological & histopathological data and...

    • catalog.data.gov
    Updated Jun 15, 2025
    Cite
    U.S. Environmental Protection Agency (2025). Amphibian metamorphosis assays- biological & histopathological data and range finding studies [Dataset]. https://catalog.data.gov/dataset/amphibian-metamorphosis-assays-biological-histopathological-data-and-range-finding-studies
    Dataset updated
    Jun 15, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Five chemicals [2-ethylhexyl 4-hydroxybenzoate (2-EHHB), 4-nonylphenol-branched (4-NP), 4-tert-octylphenol (4-OP), benzyl butyl phthalate (BBP) and dibutyl phthalate (DBP)] were subjected to a 21-day Amphibian Metamorphosis Assay (AMA) following OCSPP 890.1100 test guidelines. The selected chemicals exhibited estrogenic or androgenic bioactivity in high throughput screening data obtained from US EPA ToxCast models. Xenopus laevis larvae were exposed nominally to each chemical at 3.6, 10.9, 33.0 and 100 µg/L, except 4-NP, for which concentrations were 1.8, 5.5, 16.5 and 50 µg/L. Endpoint data (collected daily or on a given study day (SD)) included: mortality (daily), developmental stage (SD 7 and 21), hind limb length (HLL) (SD 7 and 21), snout-vent length (SVL) (SD 7 and 21), wet body weight (BW) (SD 7 and 21), and thyroid histopathology (SD 21). 4-OP and BBP caused accelerated development compared to controls at the mean measured concentrations of 39.8 and 3.5 µg/L, respectively. Normalized HLL was increased on SD 21 for all chemicals except 4-NP. Histopathology revealed mild thyroid follicular cell hypertrophy at all BBP concentrations, while moderate thyroid follicular cell hypertrophy occurred at the 105 µg/L BBP concentration. Evidence of accelerated metamorphic development was also observed histopathologically in BBP-treated frogs at concentrations as low as 3.5 µg/L. Increased BW relative to control occurred for all chemicals except 4-OP. An increase in SVL was observed in larvae exposed to 4-NP, BBP and DBP on SD 21. With the exception of 4-NP, four of the chemicals tested appeared to alter thyroid axis-driven metamorphosis, albeit through different lines of evidence, with BBP and DBP providing the strongest evidence of effects on the thyroid axis. Citation information for this dataset can be found in Data.gov's References section.
