100+ datasets found
  1. Path loss at 5G high frequency range in South Asia

    • kaggle.com
    Updated Apr 25, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    S M MEHEDI ZAMAN (2023). Path loss at 5G high frequency range in South Asia [Dataset]. https://www.kaggle.com/datasets/smmehedizaman/path-loss-at-5g-high-frequency-range-in-south-asia
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 25, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    S M MEHEDI ZAMAN
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    South Asia, Asia
    Description

    This dataset has been generated using NYUSIM 3.0 mm-Wave channel simulator software, which takes into account atmospheric data such as rain rate, humidity, barometric pressure, and temperature. The input data was collected over the course of a year in South Asia. As a result, the dataset provides an accurate representation of the seasonal variations in mm-wave channel characteristics in these areas. The dataset includes a total of 2835 records, each of which contains T-R Separation Distance (m), Time Delay (ns), Received Power (dBm), Phase (rad), Azimuth AoD (degree), Elevation AoD (degree), Azimuth AoA (degree), Elevation, AoA (degree), RMS Delay Spread (ns), Season, Frequency and Path Loss (dB). Four main seasons have been considered in this dataset: Spring, Summer, Fall, and Winter. Each season is subdivided into three parts (i.e., low, medium, and high), to accurately include the atmospheric variations in a season. To simulate the path loss, realistic Tx and Rx height, NLoS environment, and mean human blockage attenuation effects have been taken into consideration. The data has been preprocessed and normalized to ensure consistency and ease of use. Researchers in the field of mm-wave communications and networking can use this dataset to study the impact of atmospheric conditions on mm-wave channel characteristics and develop more accurate models for predicting channel behavior. The dataset can also be used to evaluate the performance of different communication protocols and signal processing techniques under varying weather conditions. Note that while the data was collected specifically in South Asia region, the high correlation between the weather patterns in this region and other areas means that the dataset may also be applicable to other regions with similar atmospheric conditions.

    Acknowledgements The paper in which the dataset was proposed is available on: https://ieeexplore.ieee.org/abstract/document/10307972

    Citation

    If you use this dataset, please cite the following paper:

    Rashed Hasan Ratul, S. M. Mehedi Zaman, Hasib Arman Chowdhury, Md. Zayed Hassan Sagor, Mohammad Tawhid Kawser, and Mirza Muntasir Nishat, “Atmospheric Influence on the Path Loss at High Frequencies for Deployment of 5G Cellular Communication Networks,” 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), 2023, pp. 1–6. https://doi.org/10.1109/ICCCNT56998.2023.10307972

    BibTeX ```bibtex @inproceedings{Ratul2023Atmospheric, author = {Ratul, Rashed Hasan and Zaman, S. M. Mehedi and Chowdhury, Hasib Arman and Sagor, Md. Zayed Hassan and Kawser, Mohammad Tawhid and Nishat, Mirza Muntasir}, title = {Atmospheric Influence on the Path Loss at High Frequencies for Deployment of {5G} Cellular Communication Networks}, booktitle = {2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT)}, year = {2023}, pages = {1--6}, doi = {10.1109/ICCCNT56998.2023.10307972}, keywords = {Wireless communication; Fluctuations; Rain; 5G mobile communication; Atmospheric modeling; Simulation; Predictive models; 5G-NR; mm-wave propagation; path loss; atmospheric influence; NYUSIM; ML} }

  2. Point Cloud Mnist 2D

    • kaggle.com
    zip
    Updated Feb 12, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Cristian Garcia (2020). Point Cloud Mnist 2D [Dataset]. https://www.kaggle.com/datasets/cristiangarcia/pointcloudmnist2d/discussion
    Explore at:
    zip(34176926 bytes)Available download formats
    Dataset updated
    Feb 12, 2020
    Authors
    Cristian Garcia
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Point Cloud MNIST 2D

    This is a simple dataset for getting started with Machine Learning for point cloud data. It take the original MNIST and converts each of the non-zero pixels into points in a 2D space. The idea is to classify each collection of point (rather than images) to the same label as in the MNIST. The source for generating this dataset can be found in this repository: cgarciae/point-cloud-mnist-2D

    Format

    There are 2 files: train.csv and test.csv. Each file has the columns

    label,x0,y0,v0,x1,y1,v1,...,x350,y350,v350

    where

    • label contains the target label in the range [0, 9]
    • x{i} contain the x position of the pixel/point as viewed in a Cartesian plane in the range [-1, 27].
    • y{i} contain the y position of the pixel/point as viewed in a Cartesian plane in the range [-1, 27].
    • v{i} contain the value of the pixel in the range [-1, 255].

    Padding

    The maximum number of point found on a image was 351, images with less points where padded to this length using the following values:

    • x{i} = -1
    • y{i} = -1
    • v{i} = -1

    Subsamples

    To make the challenge more interesting you can also try to solve the problem using a subset of points, e.g. the first N. Here are some visualizations of the dataset using different amounts of points:

    50

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2Fbbf5393884480e3d24772344e079c898%2F50.png?generation=1579911143877077&alt=media" alt="50">

    100

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F5a83f6f5f7c5791e3c1c8e9eba2d052b%2F100.png?generation=1579911238988368&alt=media" alt="100">

    200

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F202098ed0da35c41ae45dfc32e865972%2F200.png?generation=1579911264286372&alt=media" alt="200">

    351

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F5c733566f8d689c5e0fd300440d04da2%2Fmax.png?generation=1579911289750248&alt=media" alt="">

    Distribution

    This histogram of the distribution the number of points per image in the dataset can give you a general idea of how difficult each variation can be.

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F9eb3b463f77a887dae83a7af0eb08c7d%2Flengths.png?generation=1579911380397412&alt=media" alt="">

  3. Data from: Current and projected research data storage needs of Agricultural...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2more
    Updated Apr 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Servicehttps://www.ars.usda.gov/
    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel

  4. Fused Image dataset for convolutional neural Network-based crack Detection...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 20, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shanglian Zhou; Shanglian Zhou; Carlos Canchila; Carlos Canchila; Wei Song; Wei Song (2023). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Dataset]. http://doi.org/10.5281/zenodo.6383044
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 20, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Shanglian Zhou; Shanglian Zhou; Carlos Canchila; Carlos Canchila; Wei Song; Wei Song
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.

    The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration feature). The filtered range data were generated by applying frequency domain filtering to eliminate image disturbances (e.g., surface variations, and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact from different types of image data on deep convolutional neural network (DCNN) performance.

    If you share or use this dataset, please cite [4] and [5] in any relevant documentation.

    In addition, an image dataset for crack classification has also been published at [6].

    References:

    [1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873

    [2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605

    [3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434

    [4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678

    [5] (This dataset) Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044

    [6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78

  5. housing

    • kaggle.com
    zip
    Updated Sep 22, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    HappyRautela (2023). housing [Dataset]. https://www.kaggle.com/datasets/happyrautela/housing
    Explore at:
    zip(809785 bytes)Available download formats
    Dataset updated
    Sep 22, 2023
    Authors
    HappyRautela
    Description

    The exercise after this contains questions that are based on the housing dataset.

    1. How many houses have a waterfront? a. 21000 b. 21450 c. 163 d. 173

    2. How many houses have 2 floors? a. 2692 b. 8241 c. 10680 d. 161

    3. How many houses built before 1960 have a waterfront? a. 80 b. 7309 c. 90 d. 92

    4. What is the price of the most expensive house having more than 4 bathrooms? a. 7700000 b. 187000 c. 290000 d. 399000

    5. For instance, if the ‘price’ column consists of outliers, how can you make the data clean and remove the redundancies? a. Calculate the IQR range and drop the values outside the range. b. Calculate the p-value and remove the values less than 0.05. c. Calculate the correlation coefficient of the price column and remove the values less than the correlation coefficient. d. Calculate the Z-score of the price column and remove the values less than the z-score.

    6. What are the various parameters that can be used to determine the dependent variables in the housing data to determine the price of the house? a. Correlation coefficients b. Z-score c. IQR Range d. Range of the Features

    7. If we get the r2 score as 0.38, what inferences can we make about the model and its efficiency? a. The model is 38% accurate, and shows poor efficiency. b. The model is showing 0.38% discrepancies in the outcomes. c. Low difference between observed and fitted values. d. High difference between observed and fitted values.

    8. If the metrics show that the p-value for the grade column is 0.092, what all inferences can we make about the grade column? a. Significant in presence of other variables. b. Highly significant in presence of other variables c. insignificance in presence of other variables d. None of the above

    9. If the Variance Inflation Factor value for a feature is considerably higher than the other features, what can we say about that column/feature? a. High multicollinearity b. Low multicollinearity c. Both A and B d. None of the above

  6. Simulation Data Set

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Description

    These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: File format: R workspace file; “Simulated_Dataset.RData”. Metadata (including data dictionary) • y: Vector of binary responses (1: adverse outcome, 0: control) • x: Matrix of covariates; one row for each simulated individual • z: Matrix of standardized pollution exposures • n: Number of simulated individuals • m: Number of exposure time periods (e.g., weeks of pregnancy) • p: Number of columns in the covariate design matrix • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate) Code Abstract We provide R statistical software code (“CWVS_LMC.txt”) to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code (“Results_Summary.txt”) to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities. Description “CWVS_LMC.txt”: This code is delivered to the user in the form of a .txt file that contains R statistical software code. Once the “Simulated_Dataset.RData” workspace has been loaded into R, the text in the file can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities. “Results_Summary.txt”: This code is also delivered to the user in the form of a .txt file that contains R statistical software code. Once the “CWVS_LMC.txt” code is applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript). Optional Information (complete as necessary) Required R packages: • For running “CWVS_LMC.txt”: • msm: Sampling from the truncated normal distribution • mnormt: Sampling from the multivariate normal distribution • BayesLogit: Sampling from the Polya-Gamma distribution • For running “Results_Summary.txt”: • plotrix: Plotting the posterior means and credible intervals Instructions for Use Reproducibility (Mandatory) What can be reproduced: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study. How to use the information: • Load the “Simulated_Dataset.RData” workspace • Run the code contained in “CWVS_LMC.txt” • Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”. Format: Below is the replication procedure for the attached data set for the portion of the analyses using a simulated data set: Data The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women. Availability Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publically available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics, and requires an appropriate data use agreement. Description Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis. This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).

  7. SPORTS_DATA_ANALYSIS_ON_EXCEL

    • kaggle.com
    zip
    Updated Dec 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nil kamal Saha (2024). SPORTS_DATA_ANALYSIS_ON_EXCEL [Dataset]. https://www.kaggle.com/datasets/nilkamalsaha/sports-data-analysis-on-excel
    Explore at:
    zip(1203633 bytes)Available download formats
    Dataset updated
    Dec 12, 2024
    Authors
    Nil kamal Saha
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    PROJECT OBJECTIVE

    We are a part of XYZ Co Pvt Ltd company who is in the business of organizing the sports events at international level. Countries nominate sportsmen from different departments and our team has been given the responsibility to systematize the membership roster and generate different reports as per business requirements.

    Questions (KPIs)

    TASK 1: STANDARDIZING THE DATASET

    • Populate the FULLNAME consisting of the following fields ONLY, in the prescribed format: PREFIX FIRSTNAME LASTNAME.{Note: All UPPERCASE)
    • Get the COUNTRY NAME to which these sportsmen belong to. Make use of LOCATION sheet to get the required data
    • Populate the LANGUAGE_!poken by the sportsmen. Make use of LOCTION sheet to get the required data
    • Generate the EMAIL ADDRESS for those members, who speak English, in the prescribed format :lastname.firstnamel@xyz .org {Note: All lowercase) and for all other members, format should be lastname.firstname@xyz.com (Note: All lowercase)
    • Populate the SPORT LOCATION of the sport played by each player. Make use of SPORT sheet to get the required data

    TASK 2: DATA FORMATING

    • Display MEMBER IDas always 3 digit number {Note: 001,002 ...,D2D,..etc)
    • Format the BIRTHDATE as dd mmm'yyyy (Prescribed format example: 09 May' 1986)
    • Display the units for the WEIGHT column (Prescribed format example: 80 kg)
    • Format the SALARY to show the data In thousands. If SALARY is less than 100,000 then display data with 2 decimal places else display data with one decimal place. In both cases units should be thousands (k) e.g. 87670 -> 87.67 k and 12 250 -> 123.2 k

    TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1) • Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3,with the following details:

    • In COLUMNS; Group : GENDER.
    • In ROWS; Group : COUNTRY (Note: use COUNTRY NAMES).
    • In VALUES; calculate the count of candidates from each COUNTRY and GENDER type, Remove GRAND TOTALs.

    TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a SUMMARY table in the worksheet ANALYSIS,starting at cell G4, with the following details:

    • Starting from range RANGE H4; get the distinct GENDER. Use remove duplicates option and transpose the data.
    • Starting from range RANGE GS; get the distinct COUNTRY (Note: use COUNTRY NAMES).
    • In the cross table,get the count of candidates from each COUNTRY and GENDER type.

    TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:

    • Change the report layout to TABULAR form.
    • Remove expand and collapse buttons.
    • Remove GRAND TOTALs.
    • Allow user to filter the data by SPORT LOCATION.

    Process

    • Verify data for any missing values and anomalies, and sort out the same.
    • Made sure data is consistent and clean with respect to data type, data format and values used.
    • Created pivot tables according to the questions asked.
  8. Z

    Data from: FISBe: A real-world benchmark dataset for instance segmentation...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated Apr 2, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mais, Lisa; Hirsch, Peter; Managan, Claire; Kandarpa, Ramya; Rumberger, Josef Lorenz; Reinke, Annika; Maier-Hein, Lena; Ihrke, Gudrun; Kainmueller, Dagmar (2024). FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10875062
    Explore at:
    Dataset updated
    Apr 2, 2024
    Dataset provided by
    German Cancer Research Center
    Max Delbrück Center for Molecular Medicine
    Howard Hughes Medical Institute - Janelia Research Campus
    Max Delbrück Center
    Authors
    Mais, Lisa; Hirsch, Peter; Managan, Claire; Kandarpa, Ramya; Rumberger, Josef Lorenz; Reinke, Annika; Maier-Hein, Lena; Ihrke, Gudrun; Kainmueller, Dagmar
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General

    For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.

    Summary

    A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains

    30 completely labeled (segmented) images

    71 partly labeled images

    altogether comprising ∼600 expert-labeled neuron instances (labeling a single neuron takes between 30-60 min on average, yet a difficult one can take up to 4 hours)

    To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects

    A set of metrics and a novel ranking score for respective meaningful method benchmarking

    An evaluation of three baseline methods in terms of the above metrics and score

    Abstract

    Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

    Dataset documentation:

    We provide a detailed documentation of our dataset, following the Datasheet for Datasets questionnaire:

    FISBe Datasheet

    Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.

    Files

    fisbe_v1.0_{completely,partly}.zip

    contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.

    fisbe_v1.0_mips.zip

    maximum intensity projections of all samples, for convenience.

    sample_list_per_split.txt

    a simple list of all samples and the subset they are in, for convenience.

    view_data.py

    a simple python script to visualize samples, see below for more information on how to use it.

    dim_neurons_val_and_test_sets.json

    a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.

    Readme.md

    general information

    How to work with the image files

    Each sample consists of a single 3d MCFO image of neurons of the fruit fly.For each image, we provide a pixel-wise instance segmentation for all separable neurons.Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification.").The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file.The segmentation mask for each neuron is stored in a separate channel.The order of dimensions is CZYX.

    We recommend to work in a virtual environment, e.g., by using conda:

    conda create -y -n flylight-env -c conda-forge python=3.9conda activate flylight-env

    How to open zarr files

    Install the python zarr package:

    pip install zarr

    Opened a zarr file with:

    import zarrraw = zarr.open(, mode='r', path="volumes/raw")seg = zarr.open(, mode='r', path="volumes/gt_instances")

    optional:import numpy as npraw_np = np.array(raw)

    Zarr arrays are read lazily on-demand.Many functions that expect numpy arrays also work with zarr arrays.Optionally, the arrays can also explicitly be converted to numpy arrays.

    How to view zarr image files

    We recommend to use napari to view the image data.

    Install napari:

    pip install "napari[all]"

    Save the following Python script:

    import zarr, sys, napari

    raw = zarr.load(sys.argv[1], mode='r', path="volumes/raw")gts = zarr.load(sys.argv[1], mode='r', path="volumes/gt_instances")

    viewer = napari.Viewer(ndisplay=3)for idx, gt in enumerate(gts): viewer.add_labels( gt, rendering='translucent', blending='additive', name=f'gt_{idx}')viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')napari.run()

    Execute:

    python view_data.py /R9F03-20181030_62_B5.zarr

    Metrics

    S: Average of avF1 and C

    avF1: Average F1 Score

    C: Average ground truth coverage

    clDice_TP: Average true positives clDice

    FS: Number of false splits

    FM: Number of false merges

    tp: Relative number of true positives

    For more information on our selected metrics and formal definitions please see our paper.

    Baseline

    To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt application-specific color clustering from Duan et al..For detailed information on the methods and the quantitative results please see our paper.

    License

    The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Citation

    If you use FISBe in your research, please use the following BibTeX entry:

    @misc{mais2024fisbe, title = {FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures}, author = {Lisa Mais and Peter Hirsch and Claire Managan and Ramya Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller}, year = 2024, eprint = {2404.00130}, archivePrefix ={arXiv}, primaryClass = {cs.CV} }

    Acknowledgments

    We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuablediscussions.P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program.This work was co-funded by Helmholtz Imaging.

    Changelog

    There have been no changes to the dataset so far.All future change will be listed on the changelog page.

    Contributing

    If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository.

    All contributions are welcome!

  9. BLM ID Range Improvements Poly

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Nov 11, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bureau of Land Management (2025). BLM ID Range Improvements Poly [Dataset]. https://catalog.data.gov/dataset/blm-id-range-improvements-poly-4bd25
    Explore at:
    Dataset updated
    Nov 11, 2025
    Dataset provided by
    Bureau of Land Managementhttp://www.blm.gov/
    Description

    This geodatabase of point, line and polygon features is an effort to consolidate all of the range improvement locations on BLM-managed land in Idaho into one database. Currently, the polygon feature class has some data for all of the BLM field offices except the Coeur d'Alene and Cottonwood field offices. Range improvements are structures intended to enhance rangeland resources, including wildlife, watershed, and livestock management. Examples of range improvements include water troughs, spring headboxes, culverts, fences, water pipelines, gates, wildlife guzzlers, artificial nest structures, reservoirs, developed springs, corrals, exclosures, etc. These structures were first tracked by the Bureau of Land Management (BLM) in the Job Documentation Report (JDR) System in the early 1960s, which was predominately a paper-based tracking system. In 1988 the JDRs were migrated into and replaced by the automated Range Improvement Project System (RIPS), and version 2.0 is currently being used today. It tracks inventory, status, objectives, treatment, maintenance cycle, maintenance inspection, monetary contributions and reporting. Not all range improvements are documented in the RIPS database; there may be some older range improvements that were built before the JDR tracking system was established. There also may be unauthorized projects that are not in RIPS. Official project files of paper maps, reports, NEPA documents, checklists, etc., document the status of each project and are physically kept in the office with management authority for that project area. In addition, project data is entered into the RIPS system to enable managers to access the data to track progress, run reports, analyze the data, etc. Before Geographic Information System technology most offices kept paper atlases or overlay systems that mapped the locations of the range improvements. The objective of this geodatabase is to migrate the location of historic range improvement projects into a GIS for geospatial use with other data and to centralize the range improvement data for the state. This data set is a work in progress and does not have all range improvement projects that are on BLM lands. Some field offices have not migrated their data into this database, and others are partially completed. New projects may have been built but have not been entered into the system. Historic or unauthorized projects may not have case files and are being mapped and documented as they are found. Many field offices are trying to verify the locations and status of range improvements with GPS, and locations may change or projects that have been abandoned or removed on the ground may be deleted. Attributes may be incomplete or inaccurate. This data was created using the standard for range improvements set forth in Idaho IM 2009-044, dated 6/30/2009. However, it does not have all of the fields the standard requires. Fields that are missing from the polygon feature class that are in the standard are: ALLOT_NO, POLY_TYPE, MGMT_AGCY, ADMIN_ST, and ADMIN_OFF. The polygon feature class also does not have a coincident line feature class, so some of the fields from the polygon arc feature class are included in the polygon feature class: COORD_SRC, COORD_SRC2, DEF_FET, DEF_FEAT2, ACCURACY, CREATE_DT, CREATE_BY, MODIFY_DT, MODIFY_BY, GPS_DATE, and DATAFILE. There is no National BLM standard for GIS range improvement data at this time. For more information contact us at blm_id_stateoffice@blm.gov.

  10. d

    TIGER/Line Shapefile, 2016, Series Information for the Address Range-Feature...

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated Dec 2, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2020). TIGER/Line Shapefile, 2016, Series Information for the Address Range-Feature County-based Shapefile [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-2016-series-information-for-the-address-range-feature-county-based-shapefi
    Explore at:
    Dataset updated
    Dec 2, 2020
    Description

    The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Address Ranges Feature Shapefile (ADDRFEAT.dbf) contains the geospatial edge geometry and attributes of all unsuppressed address ranges for a county or county equivalent area. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number and all numbers of a specified parity in between along an edge side relative to the direction in which the edge is coded. Single-address address ranges have been suppressed to maintain the confidentiality of the addresses they describe. Multiple coincident address range feature edge records are represented in the shapefile if more than one left or right address ranges are associated to the edge. The ADDRFEAT shapefile contains a record for each address range to street name combination. Address range associated to more than one street name are also represented by multiple coincident address range feature edge records. Note that the ADDRFEAT shapefile includes all unsuppressed address ranges compared to the All Lines Shapefile (EDGES.shp) which only includes the most inclusive address range associated with each side of a street edge. The TIGER/Line shapefile contain potential address ranges, not individual addresses. The address ranges in the TIGER/Line Files are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.

  11. M

    Minnesota Pheasant Range

    • gisdata.mn.gov
    • data.wu.ac.at
    fgdb, gpkg, html +2
    Updated Sep 1, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Natural Resources Department (2022). Minnesota Pheasant Range [Dataset]. https://gisdata.mn.gov/dataset/env-pheasant-range-minnesota
    Explore at:
    fgdb, jpeg, gpkg, html, shpAvailable download formats
    Dataset updated
    Sep 1, 2022
    Dataset provided by
    Natural Resources Department
    Area covered
    Minnesota
    Description

    This dataset delineates the spatial range of wild pheasant populations in Minnesota as of 2002 by dividing the MN state boundary into 2 units: pheasant range and non-range.

  12. ECMWF Reanalysis v5

    • ecmwf.int
    application/x-grib
    Updated Dec 31, 1969
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    European Centre for Medium-Range Weather Forecasts (1969). ECMWF Reanalysis v5 [Dataset]. https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5
    Explore at:
    application/x-grib(1 datasets)Available download formats
    Dataset updated
    Dec 31, 1969
    Dataset authored and provided by
    European Centre for Medium-Range Weather Forecastshttp://ecmwf.int/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    land and oceanic climate variables. The data cover the Earth on a 31km grid and resolve the atmosphere using 137 levels from the surface up to a height of 80km. ERA5 includes information about uncertainties for all variables at reduced spatial and temporal resolutions.

  13. Virginia Opossum Range - CWHR M001 [ds1799]

    • data-cdfw.opendata.arcgis.com
    • data.cnra.ca.gov
    • +5more
    Updated Mar 4, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    California Department of Fish and Wildlife (2020). Virginia Opossum Range - CWHR M001 [ds1799] [Dataset]. https://data-cdfw.opendata.arcgis.com/datasets/CDFW::virginia-opossum-range-cwhr-m001-ds1799
    Explore at:
    Dataset updated
    Mar 4, 2020
    Dataset authored and provided by
    California Department of Fish and Wildlifehttps://wildlife.ca.gov/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for Californias wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.

  14. m

    USA POI & Foot Traffic Enriched Geospatial Dataset by Predik Data-Driven

    • app.mobito.io
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    USA POI & Foot Traffic Enriched Geospatial Dataset by Predik Data-Driven [Dataset]. https://app.mobito.io/data-product/usa-enriched-geospatial-framework-dataset
    Explore at:
    Area covered
    United States
    Description

    Our dataset provides detailed and precise insights into the business, commercial, and industrial aspects of any given area in the USA (Including Point of Interest (POI) Data and Foot Traffic. The dataset is divided into 150x150 sqm areas (geohash 7) and has over 50 variables. - Use it for different applications: Our combined dataset, which includes POI and foot traffic data, can be employed for various purposes. Different data teams use it to guide retailers and FMCG brands in site selection, fuel marketing intelligence, analyze trade areas, and assess company risk. Our dataset has also proven to be useful for real estate investment.- Get reliable data: Our datasets have been processed, enriched, and tested so your data team can use them more quickly and accurately.- Ideal for trainning ML models. The high quality of our geographic information layers results from more than seven years of work dedicated to the deep understanding and modeling of geospatial Big Data. Among the features that distinguished this dataset is the use of anonymized and user-compliant mobile device GPS location, enriched with other alternative and public data.- Easy to use: Our dataset is user-friendly and can be easily integrated to your current models. Also, we can deliver your data in different formats, like .csv, according to your analysis requirements. - Get personalized guidance: In addition to providing reliable datasets, we advise your analysts on their correct implementation.Our data scientists can guide your internal team on the optimal algorithms and models to get the most out of the information we provide (without compromising the security of your internal data).Answer questions like: - What places does my target user visit in a particular area? Which are the best areas to place a new POS?- What is the average yearly income of users in a particular area?- What is the influx of visits that my competition receives?- What is the volume of traffic surrounding my current POS?This dataset is useful for getting insights from industries like:- Retail & FMCG- Banking, Finance, and Investment- Car Dealerships- Real Estate- Convenience Stores- Pharma and medical laboratories- Restaurant chains and franchises- Clothing chains and franchisesOur dataset includes more than 50 variables, such as:- Number of pedestrians seen in the area.- Number of vehicles seen in the area.- Average speed of movement of the vehicles seen in the area.- Point of Interest (POIs) (in number and type) seen in the area (supermarkets, pharmacies, recreational locations, restaurants, offices, hotels, parking lots, wholesalers, financial services, pet services, shopping malls, among others). - Average yearly income range (anonymized and aggregated) of the devices seen in the area.Notes to better understand this dataset:- POI confidence means the average confidence of POIs in the area. In this case, POIs are any kind of location, such as a restaurant, a hotel, or a library. - Category confidences, for example"food_drinks_tobacco_retail_confidence" indicates how confident we are in the existence of food/drink/tobacco retail locations in the area. - We added predictions for The Home Depot and Lowe's Home Improvement stores in the dataset sample. These predictions were the result of a machine-learning model that was trained with the data. Knowing where the current stores are, we can find the most similar areas for new stores to open.How efficient is a Geohash?Geohash is a faster, cost-effective geofencing option that reduces input data load and provides actionable information. Its benefits include faster querying, reduced cost, minimal configuration, and ease of use.Geohash ranges from 1 to 12 characters. The dataset can be split into variable-size geohashes, with the default being geohash7 (150m x 150m).

  15. Job Dataset

    • kaggle.com
    zip
    Updated Sep 17, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ravender Singh Rana (2023). Job Dataset [Dataset]. https://www.kaggle.com/datasets/ravindrasinghrana/job-description-dataset
    Explore at:
    zip(479575920 bytes)Available download formats
    Dataset updated
    Sep 17, 2023
    Authors
    Ravender Singh Rana
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Job Dataset

    This dataset provides a comprehensive collection of synthetic job postings to facilitate research and analysis in the field of job market trends, natural language processing (NLP), and machine learning. Created for educational and research purposes, this dataset offers a diverse set of job listings across various industries and job types.

    Descriptions for each of the columns in the dataset:

    1. Job Id: A unique identifier for each job posting.
    2. Experience: The required or preferred years of experience for the job.
    3. Qualifications: The educational qualifications needed for the job.
    4. Salary Range: The range of salaries or compensation offered for the position.
    5. Location: The city or area where the job is located.
    6. Country: The country where the job is located.
    7. Latitude: The latitude coordinate of the job location.
    8. Longitude: The longitude coordinate of the job location.
    9. Work Type: The type of employment (e.g., full-time, part-time, contract).
    10. Company Size: The approximate size or scale of the hiring company.
    11. Job Posting Date: The date when the job posting was made public.
    12. Preference: Special preferences or requirements for applicants (e.g., Only Male or Only Female, or Both)
    13. Contact Person: The name of the contact person or recruiter for the job.
    14. Contact: Contact information for job inquiries.
    15. Job Title: The job title or position being advertised.
    16. Role: The role or category of the job (e.g., software developer, marketing manager).
    17. Job Portal: The platform or website where the job was posted.
    18. Job Description: A detailed description of the job responsibilities and requirements.
    19. Benefits: Information about benefits offered with the job (e.g., health insurance, retirement plans).
    20. Skills: The skills or qualifications required for the job.
    21. Responsibilities: Specific responsibilities and duties associated with the job.
    22. Company Name: The name of the hiring company.
    23. Company Profile: A brief overview of the company's background and mission.

    Potential Use Cases:

    • Building predictive models to forecast job market trends.
    • Enhancing job recommendation systems for job seekers.
    • Developing NLP models for resume parsing and job matching.
    • Analyzing regional job market disparities and opportunities.
    • Exploring salary prediction models for various job roles.

    Acknowledgements:

    We would like to express our gratitude to the Python Faker library for its invaluable contribution to the dataset generation process. Additionally, we appreciate the guidance provided by ChatGPT in fine-tuning the dataset, ensuring its quality, and adhering to ethical standards.

    Note:

    Please note that the examples provided are fictional and for illustrative purposes. You can tailor the descriptions and examples to match the specifics of your dataset. It is not suitable for real-world applications and should only be used within the scope of research and experimentation. You can also reach me via email at: rrana157@gmail.com

  16. m

    THVD (Talking Head Video Dataset)

    • data.mendeley.com
    Updated Apr 29, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mario Peedor (2025). THVD (Talking Head Video Dataset) [Dataset]. http://doi.org/10.17632/ykhw8r7bfx.2
    Explore at:
    Dataset updated
    Apr 29, 2025
    Authors
    Mario Peedor
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    About

    We provide a comprehensive talking-head video dataset with over 50,000 videos, totaling more than 500 hours of footage and featuring 20,841 unique identities from around the world.

    Distribution

    Detailing the format, size, and structure of the dataset: Data Volume: -Total Size: 2.7TB

    -Total Videos: 47,547

    -Identities Covered: 20,841

    -Resolution: 60% 4k(1980), 33% fullHD(1080)

    -Formats: MP4

    -Full-length videos with visible mouth movements in every frame.

    -Minimum face size of 400 pixels.

    -Video durations range from 20 seconds to 5 minutes.

    -Faces have not been cut out, full screen videos including backgrounds.

    Usage

    This dataset is ideal for a variety of applications:

    Face Recognition & Verification: Training and benchmarking facial recognition models.

    Action Recognition: Identifying human activities and behaviors.

    Re-Identification (Re-ID): Tracking identities across different videos and environments.

    Deepfake Detection: Developing methods to detect manipulated videos.

    Generative AI: Training high-resolution video generation models.

    Lip Syncing Applications: Enhancing AI-driven lip-syncing models for dubbing and virtual avatars.

    Background AI Applications: Developing AI models for automated background replacement, segmentation, and enhancement.

    Coverage

    Explaining the scope and coverage of the dataset:

    Geographic Coverage: Worldwide

    Time Range: Time range and size of the videos have been noted in the CSV file.

    Demographics: Includes information about age, gender, ethnicity, format, resolution, and file size.

    Languages Covered (Videos):

    English: 23,038 videos

    Portuguese: 1,346 videos

    Spanish: 677 videos

    Norwegian: 1,266 videos

    Swedish: 1,056 videos

    Korean: 848 videos

    Polish: 1,807 videos

    Indonesian: 1,163 videos

    French: 1,102 videos

    German: 1,276 videos

    Japanese: 1,433 videos

    Dutch: 1,666 videos

    Indian: 1,163 videos

    Czech: 590 videos

    Chinese: 685 videos

    Italian: 975 videos

    Philipeans: 920 videos

    Bulgaria: 340 videos

    Romanian: 1144 videos

    Arabic: 1691 videos

    Who Can Use It

    List examples of intended users and their use cases:

    Data Scientists: Training machine learning models for video-based AI applications.

    Researchers: Studying human behavior, facial analysis, or video AI advancements.

    Businesses: Developing facial recognition systems, video analytics, or AI-driven media applications.

    Additional Notes

    Ensure ethical usage and compliance with privacy regulations. The dataset’s quality and scale make it valuable for high-performance AI training. Potential preprocessing (cropping, down sampling) may be needed for different use cases. Dataset has not been completed yet and expands daily, please contact for most up to date CSV file. The dataset has been divided into 100GB zipped files and is hosted on a private server (with the option to upload to the cloud if needed). To verify the dataset's quality, please contact me for the full CSV file.

  17. Brown Creeper Range - CWHR B364 [ds1593]

    • data.cnra.ca.gov
    • data.ca.gov
    • +5more
    Updated Feb 15, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    California Department of Fish and Wildlife (2020). Brown Creeper Range - CWHR B364 [ds1593] [Dataset]. https://data.cnra.ca.gov/dataset/brown-creeper-range-cwhr-b364-ds1593
    Explore at:
    arcgis geoservices rest api, kml, zip, csv, geojson, htmlAvailable download formats
    Dataset updated
    Feb 15, 2020
    Dataset authored and provided by
    California Department of Fish and Wildlifehttps://wildlife.ca.gov/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for Californias wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.

  18. Street Video Dataset

    • kaggle.com
    zip
    Updated Oct 18, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Macgence (2025). Street Video Dataset [Dataset]. https://www.kaggle.com/datasets/macgence/street-video-dataset
    Explore at:
    zip(80128 bytes)Available download formats
    Dataset updated
    Oct 18, 2025
    Authors
    Macgence
    Description

    Improve your Computer Vision models using our extensive collection of Street Video Dataset along the street from individuals. This dataset covers a broad range of demographics and scenarios, which will enhance the accuracy of facial recognition, Video Recognition features in your models. This specialized collection of Video data is meticulously curated to support research and development in the construction industry. This dataset provides a rich resource for training and evaluation purposes.

    Metadata Availability: Insights into Participant Details

    Each participant is accompanied by comprehensive metadata, which includes detailed information about their age, gender, location. Furthermore, this metadata encompasses details such as domain, topic, type, and outcome, providing valuable insights for both model development and evaluation purposes.

    Specifications:

    Type: Video Volume: 3000 Industry: Video Recognition File Format: MP4 Gender Distribution: 50/50 Age Range: 18 – 65

    These technical specifications ensure compatibility and optimal performance for a wide range of AI development applications.

    Insights into Image Data:

    The dataset comprises 3000 high-quality Video. Created through collaboration with a network of experts, it captures realistic, ensuring a balanced representation age, gender and demographics.

    License:

    Exclusively curated by Macgence, this Video dataset is available for commercial use, empowering AI developers.

    Updates and Customization:

    Consistent updates with fresh Video recorded in varied real-world scenarios guarantee ongoing relevance and precision. We offer customization options such as adjusting samples and providing datasets tailored to your specific criteria and needs.

    Looking for high-quality datasets to train your AI model? Contact us today to get the dataset you need—fast, reliable, and ready for deployment!

  19. d

    Street Network Database SND

    • catalog.data.gov
    • data.seattle.gov
    • +2more
    Updated Oct 4, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    City of Seattle ArcGIS Online (2025). Street Network Database SND [Dataset]. https://catalog.data.gov/dataset/street-network-database-snd-1712b
    Explore at:
    Dataset updated
    Oct 4, 2025
    Dataset provided by
    City of Seattle ArcGIS Online
    Description

    The pathway representation consists of segments and intersection elements. A segment is a linear graphic element that represents a continuous physical travel path terminated by path end (dead end) or physical intersection with other travel paths. Segments have one street name, one address range and one set of segment characteristics. A segment may have none or multiple alias street names. Segment types included are Freeways, Highways, Streets, Alleys (named only), Railroads, Walkways, and Bike lanes. SNDSEG_PV is a linear feature class representing the SND Segment Feature, with attributes for Street name, Address Range, Alias Street name and segment Characteristics objects. Part of the Address Range and all of Street name objects are logically shared with the Discrete Address Point-Master Address File layer. Appropriate uses include: Cartography - Used to depict the City's transportation network location and connections, typically on smaller scaled maps or images where a single line representation is appropriate. Used to depict specific classifications of roadway use, also typically at smaller scales. Used to label transportation network feature names typically on larger scaled maps. Used to label address ranges with associated transportation network features typically on larger scaled maps. Geocode reference - Used as a source for derived reference data for address validation and theoretical address location Address Range data repository - This data store is the City's address range repository defining address ranges in association with transportation network features. Polygon boundary reference - Used to define various area boundaries is other feature classes where coincident with the transportation network. Does not contain polygon features. Address based extracts - Used to create flat-file extracts typically indexed by address with reference to business data typically associated with transportation network features. Thematic linear location reference - By providing unique, stable identifiers for each linear feature, thematic data is associated to specific transportation network features via these identifiers. Thematic intersection location reference - By providing unique, stable identifiers for each intersection feature, thematic data is associated to specific transportation network features via these identifiers. Network route tracing - Used as source for derived reference data used to determine point to point travel paths or determine optimal stop allocation along a travel path. Topological connections with segments - Used to provide a specific definition of location for each transportation network feature. Also provides a specific definition of connection between each transportation network feature. (defines where the streets are and the relationship between them ie. 4th Ave is west of 5th Ave and 4th Ave does intersect with Cherry St) Event location reference - Used as source for derived reference data used to locate event and linear referencing.Data source is TRANSPO.SNDSEG_PV. Updated weekly.

  20. Hoary Bat Range - CWHR M034 [ds1830]

    • data.ca.gov
    • data.cnra.ca.gov
    • +5more
    Updated Nov 5, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    California Department of Fish and Wildlife (2025). Hoary Bat Range - CWHR M034 [ds1830] [Dataset]. https://data.ca.gov/dataset/hoary-bat-range-cwhr-m034-ds1830
    Explore at:
    zip, html, kml, csv, geojson, arcgis geoservices rest apiAvailable download formats
    Dataset updated
    Nov 5, 2025
    Dataset authored and provided by
    California Department of Fish and Wildlifehttps://wildlife.ca.gov/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
S M MEHEDI ZAMAN (2023). Path loss at 5G high frequency range in South Asia [Dataset]. https://www.kaggle.com/datasets/smmehedizaman/path-loss-at-5g-high-frequency-range-in-south-asia
Organization logo

Path loss at 5G high frequency range in South Asia

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 25, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
S M MEHEDI ZAMAN
License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Area covered
South Asia, Asia
Description

This dataset has been generated using NYUSIM 3.0 mm-Wave channel simulator software, which takes into account atmospheric data such as rain rate, humidity, barometric pressure, and temperature. The input data was collected over the course of a year in South Asia. As a result, the dataset provides an accurate representation of the seasonal variations in mm-wave channel characteristics in these areas. The dataset includes a total of 2835 records, each of which contains T-R Separation Distance (m), Time Delay (ns), Received Power (dBm), Phase (rad), Azimuth AoD (degree), Elevation AoD (degree), Azimuth AoA (degree), Elevation, AoA (degree), RMS Delay Spread (ns), Season, Frequency and Path Loss (dB). Four main seasons have been considered in this dataset: Spring, Summer, Fall, and Winter. Each season is subdivided into three parts (i.e., low, medium, and high), to accurately include the atmospheric variations in a season. To simulate the path loss, realistic Tx and Rx height, NLoS environment, and mean human blockage attenuation effects have been taken into consideration. The data has been preprocessed and normalized to ensure consistency and ease of use. Researchers in the field of mm-wave communications and networking can use this dataset to study the impact of atmospheric conditions on mm-wave channel characteristics and develop more accurate models for predicting channel behavior. The dataset can also be used to evaluate the performance of different communication protocols and signal processing techniques under varying weather conditions. Note that while the data was collected specifically in South Asia region, the high correlation between the weather patterns in this region and other areas means that the dataset may also be applicable to other regions with similar atmospheric conditions.

Acknowledgements The paper in which the dataset was proposed is available on: https://ieeexplore.ieee.org/abstract/document/10307972

Citation

If you use this dataset, please cite the following paper:

Rashed Hasan Ratul, S. M. Mehedi Zaman, Hasib Arman Chowdhury, Md. Zayed Hassan Sagor, Mohammad Tawhid Kawser, and Mirza Muntasir Nishat, “Atmospheric Influence on the Path Loss at High Frequencies for Deployment of 5G Cellular Communication Networks,” 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), 2023, pp. 1–6. https://doi.org/10.1109/ICCCNT56998.2023.10307972

BibTeX ```bibtex @inproceedings{Ratul2023Atmospheric, author = {Ratul, Rashed Hasan and Zaman, S. M. Mehedi and Chowdhury, Hasib Arman and Sagor, Md. Zayed Hassan and Kawser, Mohammad Tawhid and Nishat, Mirza Muntasir}, title = {Atmospheric Influence on the Path Loss at High Frequencies for Deployment of {5G} Cellular Communication Networks}, booktitle = {2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT)}, year = {2023}, pages = {1--6}, doi = {10.1109/ICCCNT56998.2023.10307972}, keywords = {Wireless communication; Fluctuations; Rain; 5G mobile communication; Atmospheric modeling; Simulation; Predictive models; 5G-NR; mm-wave propagation; path loss; atmospheric influence; NYUSIM; ML} }

Search
Clear search
Close search
Google apps
Main menu