Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.
The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration feature). The filtered range data were generated by applying frequency domain filtering to eliminate image disturbances (e.g., surface variations, and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact from different types of image data on deep convolutional neural network (DCNN) performance.
If you share or use this dataset, please cite [4] and [5] in any relevant documentation.
In addition, an image dataset for crack classification has also been published at [6].
References:
[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873
[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605
[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434
[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678
[5] (This dataset) Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044
[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78
Facebook
TwitterImprove your Computer Vision models using our extensive collection of Street Video Dataset along the street from individuals. This dataset covers a broad range of demographics and scenarios, which will enhance the accuracy of facial recognition, Video Recognition features in your models. This specialized collection of Video data is meticulously curated to support research and development in the construction industry. This dataset provides a rich resource for training and evaluation purposes.
Each participant is accompanied by comprehensive metadata, which includes detailed information about their age, gender, location. Furthermore, this metadata encompasses details such as domain, topic, type, and outcome, providing valuable insights for both model development and evaluation purposes.
Type: Video Volume: 3000 Industry: Video Recognition File Format: MP4 Gender Distribution: 50/50 Age Range: 18 – 65
These technical specifications ensure compatibility and optimal performance for a wide range of AI development applications.
The dataset comprises 3000 high-quality Video. Created through collaboration with a network of experts, it captures realistic, ensuring a balanced representation age, gender and demographics.
Exclusively curated by Macgence, this Video dataset is available for commercial use, empowering AI developers.
Consistent updates with fresh Video recorded in varied real-world scenarios guarantee ongoing relevance and precision. We offer customization options such as adjusting samples and providing datasets tailored to your specific criteria and needs.
Looking for high-quality datasets to train your AI model? Contact us today to get the dataset you need—fast, reliable, and ready for deployment!
Facebook
TwitterDescription: This dataset (Version 10) contains a collection of research papers along with various attributes and metadata. It is a comprehensive and diverse dataset that can be used for a wide range of research and analysis tasks. The dataset encompasses papers from different fields of study, including computer science, mathematics, physics, and more.
Fields in the Dataset: - id: A unique identifier for each paper. - title: The title of the research paper. - authors: The list of authors involved in the paper. - venue: The journal or venue where the paper was published. - year: The year when the paper was published. - n_citation: The number of citations received by the paper. - references: A list of paper IDs that are cited by the current paper. - abstract: The abstract of the paper.
Example: - "id": "013ea675-bb58-42f8-a423-f5534546b2b1", - "title": "Prediction of consensus binding mode geometries for related chemical series of positive allosteric modulators of adenosine and muscarinic acetylcholine receptors", - "authors": ["Leon A. Sakkal", "Kyle Z. Rajkowski", "Roger S. Armen"], - "venue": "Journal of Computational Chemistry", - "year": 2017, - "n_citation": 0, - "references": ["4f4f200c-0764-4fef-9718-b8bccf303dba", "aa699fbf-fabe-40e4-bd68-46eaf333f7b1"], - "abstract": "This paper studies ..."
Facebook
TwitterThis point feature class contains 81,481 points arranged in a 270-meter spaced grid that covers the Spring Mountains and Sheep Range in Clark County, Nevada. Points are attributed with hydroclimate variables and ancillary data compiled to support efforts to characterize ecological zones.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of South Range by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for South Range. The dataset can be utilized to understand the population distribution of South Range by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in South Range. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for South Range.
Key observations
Largest age group (population): Male # 20-24 years (49) | Female # 20-24 years (50). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender :
Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for South Range Population by Gender. You can refer the same here
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
File name definitions:
'...v_50_175_250_300...' - dataset for velocity ranges [50, 175] + [250, 300] m/s
'...v_175_250...' - dataset for velocity range [175, 250] m/s
'ANNdevelop...' - used to perform 9 parametric sub-analyses where, in each one, many ANNs are developed (trained, validated and tested) and the one yielding the best results is selected
'ANNtest...' - used to test the best ANN from each aforementioned parametric sub-analysis, aiming to find the best ANN model; this dataset includes the 'ANNdevelop...' counterpart
Where to find the input (independent) and target (dependent) variable values for each dataset/excel ?
input values in 'IN' sheet
target values in 'TARGET' sheet
Where to find the results from the best ANN model (for each target/output variable and each velocity range)?
open the corresponding excel file and the expected (target) vs ANN (output) results are written in 'TARGET vs OUTPUT' sheet
Check reference below (to be added when the paper is published)
https://www.researchgate.net/publication/328849817_11_Neural_Networks_-_Max_Disp_-_Railway_Beams
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This file contains the ABO Product Images Dataset, consisting of 398,212 product images organized into 256 folders. Each folder represents a unique product with multiple images capturing different angles and views. The dataset covers a wide range of product categories and is suitable for research in product recognition, visual similarity analysis, and large-scale e-commerce inventory management.
Categories
The dataset spans a wide variety of product types commonly found in online retail, such as:
Apparel and accessories
Home and kitchen items
Electronics and gadgets
Personal care and beauty products
Tools and hardware
Sports and outdoor equipment
Toys and hobby items
Attributes
Products and their images exhibit a range of real-world visual attributes, including:
Color variations (neutral, vibrant, patterned)
Material types (plastic, metal, textile, glass, wood)
Shape and structural differences (cylindrical, box-shaped, curved, flat)
Angles of capture (front, back, side, top, perspective)
Scale differences between images of the same product
Varied lighting conditions
Background types (studio-style, neutral backgrounds, real-world capture)
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 2,000 rows of data from coffee shops, offering detailed insights into factors that influence daily revenue. It includes key operational and environmental variables that provide a comprehensive view of how business activities and external conditions affect sales performance. Designed for use in predictive analytics and business optimization, this dataset is a valuable resource for anyone looking to understand the relationship between customer behavior, operational decisions, and revenue generation in the food and beverage industry.
The dataset features a variety of columns that capture the operational details of coffee shops, including customer activity, store operations, and external factors such as marketing spend and location foot traffic.
Number of Customers Per Day
Average Order Value ($)
Operating Hours Per Day
Number of Employees
Marketing Spend Per Day ($)
Location Foot Traffic (people/hour)
The dataset spans a wide variety of operational scenarios, from small neighborhood coffee shops with limited traffic to larger, high-traffic locations with extensive marketing budgets. This variety allows for exploring different predictive modeling strategies. Key insights that can be derived from the data include:
The dataset offers a wide range of applications, especially in predictive analytics, business optimization, and forecasting:
For coffee shop owners, managers, and analysts in the food and beverage industry, this dataset provides an essential tool for refining daily operations and boosting profitability. Insights gained from this data can help:
This dataset is also ideal for aspiring data scientists and machine learning practitioners looking to apply their skills to real-world business problems in the food and beverage sector.
The Coffee Shop Revenue Prediction Dataset is a versatile and comprehensive resource for understanding the dynamics of daily sales performance in coffee shops. With a focus on key operational factors, it is perfect for building predictive models, ...
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description of the Credit Card Eligibility Data: Determining Factors
The Credit Card Eligibility Dataset: Determining Factors is a comprehensive collection of variables aimed at understanding the factors that influence an individual's eligibility for a credit card. This dataset encompasses a wide range of demographic, financial, and personal attributes that are commonly considered by financial institutions when assessing an individual's suitability for credit.
Each row in the dataset represents a unique individual, identified by a unique ID, with associated attributes ranging from basic demographic information such as gender and age, to financial indicators like total income and employment status. Additionally, the dataset includes variables related to familial status, housing, education, and occupation, providing a holistic view of the individual's background and circumstances.
| Variable | Description |
|---|---|
| ID | An identifier for each individual (customer). |
| Gender | The gender of the individual. |
| Own_car | A binary feature indicating whether the individual owns a car. |
| Own_property | A binary feature indicating whether the individual owns a property. |
| Work_phone | A binary feature indicating whether the individual has a work phone. |
| Phone | A binary feature indicating whether the individual has a phone. |
| A binary feature indicating whether the individual has provided an email address. | |
| Unemployed | A binary feature indicating whether the individual is unemployed. |
| Num_children | The number of children the individual has. |
| Num_family | The total number of family members. |
| Account_length | The length of the individual's account with a bank or financial institution. |
| Total_income | The total income of the individual. |
| Age | The age of the individual. |
| Years_employed | The number of years the individual has been employed. |
| Income_type | The type of income (e.g., employed, self-employed, etc.). |
| Education_type | The education level of the individual. |
| Family_status | The family status of the individual. |
| Housing_type | The type of housing the individual lives in. |
| Occupation_type | The type of occupation the individual is engaged in. |
| Target | The target variable for the classification task, indicating whether the individual is eligible for a credit card or not (e.g., Yes/No, 1/0). |
Researchers, analysts, and financial institutions can leverage this dataset to gain insights into the key factors influencing credit card eligibility and to develop predictive models that assist in automating the credit assessment process. By understanding the relationship between various attributes and credit card eligibility, stakeholders can make more informed decisions, improve risk assessment strategies, and enhance customer targeting and segmentation efforts.
This dataset is valuable for a wide range of applications within the financial industry, including credit risk management, customer relationship management, and marketing analytics. Furthermore, it provides a valuable resource for academic research and educational purposes, enabling students and researchers to explore the intricate dynamics of credit card eligibility determination.
Facebook
TwitterThe USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel
Facebook
TwitterThis dataset provides information about the number of properties, residents, and average property values for Range View Road cross streets in Valier, MT.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General
For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.
Summary
A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains
30 completely labeled (segmented) images
71 partly labeled images
altogether comprising ∼600 expert-labeled neuron instances (labeling a single neuron takes between 30-60 min on average, yet a difficult one can take up to 4 hours)
To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects
A set of metrics and a novel ranking score for respective meaningful method benchmarking
An evaluation of three baseline methods in terms of the above metrics and score
Abstract
Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.
Dataset documentation:
We provide a detailed documentation of our dataset, following the Datasheet for Datasets questionnaire:
FISBe Datasheet
Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.
Files
fisbe_v1.0_{completely,partly}.zip
contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.
fisbe_v1.0_mips.zip
maximum intensity projections of all samples, for convenience.
sample_list_per_split.txt
a simple list of all samples and the subset they are in, for convenience.
view_data.py
a simple python script to visualize samples, see below for more information on how to use it.
dim_neurons_val_and_test_sets.json
a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.
Readme.md
general information
How to work with the image files
Each sample consists of a single 3d MCFO image of neurons of the fruit fly.For each image, we provide a pixel-wise instance segmentation for all separable neurons.Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification.").The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file.The segmentation mask for each neuron is stored in a separate channel.The order of dimensions is CZYX.
We recommend to work in a virtual environment, e.g., by using conda:
conda create -y -n flylight-env -c conda-forge python=3.9conda activate flylight-env
How to open zarr files
Install the python zarr package:
pip install zarr
Opened a zarr file with:
import zarrraw = zarr.open(, mode='r', path="volumes/raw")seg = zarr.open(, mode='r', path="volumes/gt_instances")
Zarr arrays are read lazily on-demand.Many functions that expect numpy arrays also work with zarr arrays.Optionally, the arrays can also explicitly be converted to numpy arrays.
How to view zarr image files
We recommend to use napari to view the image data.
Install napari:
pip install "napari[all]"
Save the following Python script:
import zarr, sys, napari
raw = zarr.load(sys.argv[1], mode='r', path="volumes/raw")gts = zarr.load(sys.argv[1], mode='r', path="volumes/gt_instances")
viewer = napari.Viewer(ndisplay=3)for idx, gt in enumerate(gts): viewer.add_labels( gt, rendering='translucent', blending='additive', name=f'gt_{idx}')viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')napari.run()
Execute:
python view_data.py /R9F03-20181030_62_B5.zarr
Metrics
S: Average of avF1 and C
avF1: Average F1 Score
C: Average ground truth coverage
clDice_TP: Average true positives clDice
FS: Number of false splits
FM: Number of false merges
tp: Relative number of true positives
For more information on our selected metrics and formal definitions please see our paper.
Baseline
To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt application-specific color clustering from Duan et al..For detailed information on the methods and the quantitative results please see our paper.
License
The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Citation
If you use FISBe in your research, please use the following BibTeX entry:
@misc{mais2024fisbe, title = {FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures}, author = {Lisa Mais and Peter Hirsch and Claire Managan and Ramya Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller}, year = 2024, eprint = {2404.00130}, archivePrefix ={arXiv}, primaryClass = {cs.CV} }
Acknowledgments
We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuablediscussions.P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program.This work was co-funded by Helmholtz Imaging.
Changelog
There have been no changes to the dataset so far.All future change will be listed on the changelog page.
Contributing
If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository.
All contributions are welcome!
Facebook
TwitterGAP species range data are coarse representations of the total areal extent a species occupies, in other words the geographic limits within which a species can be found (Morrison and Hall 2002). These data provide the geographic extent within which the USGS Gap Analysis Project delineates areas of suitable habitat for terrestrial vertebrate species in their species habitat maps. The range maps are created by attributing a vector file derived from the 12-digit Hydrologic Unit Dataset (USDA NRCS 2009). Modifications to that dataset are described here < https://www.sciencebase.gov/catalog/item/56d496eee4b015c306f17a42>. Attribution of the season range for each species was based on the literature and online sources (See Cross Reference section of the metadata). Attribution for each hydrologic unit within the range included values for origin (native, introduced, reintroduced, vagrant), occurrence (extant, possibly present, potentially present, extirpated), reproductive use (breeding, non-breeding, both) and season (year-round, summer, winter, migratory, vagrant). These species range data provide the biological context within which to build our species distribution models. Versioning, Naming Conventions and Codes: A composite version code is employed to allow the user to track the spatial extent, the date of the ground conditions, and the iteration of the data set for that extent/date. For example, CONUS_2001v1 represents the spatial extent of the conterminous US (CONUS), the ground condition year of 2001, and the first iteration (v1) for that extent/date. In many cases, a GAP species code is used in conjunction with the version code to identify specific data sets or files (i.e. Cooper’s Hawk Habitat Map named bCOHAx_CONUS_2001v1_HabMap).
Facebook
TwitterDiscover the breadth of data collected by the state which is local in nature. Search by county and municipality and discover, explore, and download local data. With a click, find local data across a broad range of categories from health to transportation, from recreation to economic development; find local farmer’s markets, child care regulated facilities, craft beverages, solar installations, food service establishment inspections, and much more.
Facebook
Twitterhttps://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Explore the "Luxury Fashion Products Dataset from Mytheresa," a comprehensive collection of data on luxury fashion items from one of the world’s leading online luxury fashion retailers.
This dataset includes detailed information on a wide range of products, such as clothing, shoes, bags, accessories, and more from top luxury brands. Each product entry provides essential details, including product names, brands, categories, prices, sizes, colors, descriptions, and availability, offering valuable insights for researchers, data analysts, and fashion professionals.
Key Features:
Whether you're analyzing luxury fashion trends, researching consumer behavior, or developing new market strategies, the "Luxury Fashion Products Dataset from Mytheresa" is an invaluable resource that provides detailed insights and extensive coverage of luxury fashion items.
Facebook
TwitterTechsalerator’s News Event Data in Asia offers a detailed and expansive dataset designed to provide businesses, analysts, journalists, and researchers with comprehensive insights into significant news events across the Asian continent. This dataset captures and categorizes major events reported from a diverse range of news sources, including press releases, industry news sites, blogs, and PR platforms, offering valuable perspectives on regional developments, economic shifts, political changes, and cultural occurrences.
Key Features of the Dataset: Extensive Coverage:
The dataset aggregates news events from a wide range of sources such as company press releases, industry-specific news outlets, blogs, PR sites, and traditional media. This broad coverage ensures a diverse array of information from multiple reporting channels. Categorization of Events:
News events are categorized into various types including business and economic updates, political developments, technological advancements, legal and regulatory changes, and cultural events. This categorization helps users quickly find and analyze information relevant to their interests or sectors. Real-Time Updates:
The dataset is updated regularly to include the most current events, ensuring users have access to the latest news and can stay informed about recent developments as they happen. Geographic Segmentation:
Events are tagged with their respective countries and regions within Asia. This geographic segmentation allows users to filter and analyze news events based on specific locations, facilitating targeted research and analysis. Event Details:
Each event entry includes comprehensive details such as the date of occurrence, source of the news, a description of the event, and relevant keywords. This thorough detailing helps users understand the context and significance of each event. Historical Data:
The dataset includes historical news event data, enabling users to track trends and perform comparative analysis over time. This feature supports longitudinal studies and provides insights into the evolution of news events. Advanced Search and Filter Options:
Users can search and filter news events based on criteria such as date range, event type, location, and keywords. This functionality allows for precise and efficient retrieval of relevant information. Asian Countries and Territories Covered: Central Asia: Kazakhstan Kyrgyzstan Tajikistan Turkmenistan Uzbekistan East Asia: China Hong Kong (Special Administrative Region of China) Japan Mongolia North Korea South Korea Taiwan South Asia: Afghanistan Bangladesh Bhutan India Maldives Nepal Pakistan Sri Lanka Southeast Asia: Brunei Cambodia East Timor (Timor-Leste) Indonesia Laos Malaysia Myanmar (Burma) Philippines Singapore Thailand Vietnam Western Asia (Middle East): Armenia Azerbaijan Bahrain Cyprus Georgia Iraq Israel Jordan Kuwait Lebanon Oman Palestine Qatar Saudi Arabia Syria Turkey (partly in Europe, but often included in Asia contextually) United Arab Emirates Yemen Benefits of the Dataset: Strategic Insights: Businesses and analysts can use the dataset to gain insights into significant regional developments, economic conditions, and political changes, aiding in strategic decision-making and market analysis. Market and Industry Trends: The dataset provides valuable information on industry-specific trends and events, helping users understand market dynamics and identify emerging opportunities. Media and PR Monitoring: Journalists and PR professionals can track relevant news across Asia, enabling them to monitor media coverage, identify emerging stories, and manage public relations efforts effectively. Academic and Research Use: Researchers can utilize the dataset for longitudinal studies, trend analysis, and academic research on various topics related to Asian news and events. Techsalerator’s News Event Data in Asia is a crucial resource for accessing and analyzing significant news events across the continent. By offering detailed, categorized, and up-to-date information, it supports effective decision-making, research, and media monitoring across diverse sectors.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Long-range Pedestrian Dataset is curated for the visual entertainment sector, featuring a collection of outdoor-collected images with a high resolution of 3840 x 2160 pixels. This dataset is focused on long-distance pedestrian imagery, with each target pedestrian precisely labeled with a bounding box that closely fits the boundary of the pedestrian target, providing detailed data for scene composition and character placement in visual content.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Login Data Set for Risk-Based Authentication
Synthesized login feature data of >33M login attempts and >3.3M users on a large-scale online service in Norway. Original data collected between February 2020 and February 2021.
This data sets aims to foster research and development for Risk-Based Authentication (RBA) systems. The data was synthesized from the real-world login behavior of more than 3.3M users at a large-scale single sign-on (SSO) online service in Norway.
The users used this SSO to access sensitive data provided by the online service, e.g., a cloud storage and billing information. We used this data set to study how the Freeman et al. (2016) RBA model behaves on a large-scale online service in the real world (see Publication). The synthesized data set can reproduce these results made on the original data set (see Study Reproduction). Beyond that, you can use this data set to evaluate and improve RBA algorithms under real-world conditions.
WARNING: The feature values are plausible, but still totally artificial. Therefore, you should NOT use this data set in productive systems, e.g., intrusion detection systems.
Overview
The data set contains the following features related to each login attempt on the SSO:
| Feature | Data Type | Description | Range or Example |
|---|---|---|---|
| IP Address | String | IP address belonging to the login attempt | 0.0.0.0 - 255.255.255.255 |
| Country | String | Country derived from the IP address | US |
| Region | String | Region derived from the IP address | New York |
| City | String | City derived from the IP address | Rochester |
| ASN | Integer | Autonomous system number derived from the IP address | 0 - 600000 |
| User Agent String | String | User agent string submitted by the client | Mozilla/5.0 (Windows NT 10.0; Win64; ... |
| OS Name and Version | String | Operating system name and version derived from the user agent string | Windows 10 |
| Browser Name and Version | String | Browser name and version derived from the user agent string | Chrome 70.0.3538 |
| Device Type | String | Device type derived from the user agent string | (mobile, desktop, tablet, bot, unknown)1 |
| User ID | Integer | Idenfication number related to the affected user account | [Random pseudonym] |
| Login Timestamp | Integer | Timestamp related to the login attempt | [64 Bit timestamp] |
| Round-Trip Time (RTT) [ms] | Integer | Server-side measured latency between client and server | 1 - 8600000 |
| Login Successful | Boolean | True: Login was successful, False: Login failed | (true, false) |
| Is Attack IP | Boolean | IP address was found in known attacker data set | (true, false) |
| Is Account Takeover | Boolean | Login attempt was identified as account takeover by incident response team of the online service | (true, false) |
Data Creation
As the data set targets RBA systems, especially the Freeman et al. (2016) model, the statistical feature probabilities between all users, globally and locally, are identical for the categorical data. All the other data was randomly generated while maintaining logical relations and timely order between the features.
The timestamps, however, are not identical and contain randomness. The feature values related to IP address and user agent string were randomly generated by publicly available data, so they were very likely not present in the real data set. The RTTs resemble real values but were randomly assigned among users per geolocation. Therefore, the RTT entries were probably in other positions in the original data set.
The country was randomly assigned per unique feature value. Based on that, we randomly assigned an ASN related to the country, and generated the IP addresses for this ASN. The cities and regions were derived from the generated IP addresses for privacy reasons and do not reflect the real logical relations from the original data set.
The device types are identical to the real data set. Based on that, we randomly assigned the OS, and based on the OS the browser information. From this information, we randomly generated the user agent string. Therefore, all the logical relations regarding the user agent are identical as in the real data set.
The RTT was randomly drawn from the login success status and synthesized geolocation data. We did this to ensure that the RTTs are realistic ones.
Regarding the Data Values
Due to unresolvable conflicts during the data creation, we had to assign some unrealistic IP addresses and ASNs that are not present in the real world. Nevertheless, these do not have any effects on the risk scores generated by the Freeman et al. (2016) model.
You can recognize them by the following values:
ASNs with values >= 500.000
IP addresses in the range 10.0.0.0 - 10.255.255.255 (10.0.0.0/8 CIDR range)
Study Reproduction
Based on our evaluation, this data set can reproduce our study results regarding the RBA behavior of an RBA model using the IP address (IP address, country, and ASN) and user agent string (Full string, OS name and version, browser name and version, device type) as features.
The calculated RTT significances for countries and regions inside Norway are not identical using this data set, but have similar tendencies. The same is true for the Median RTTs per country. This is due to the fact that the available number of entries per country, region, and city changed with the data creation procedure. However, the RTTs still reflect the real-world distributions of different geolocations by city.
See RESULTS.md for more details.
Ethics
By using the SSO service, the users agreed in the data collection and evaluation for research purposes. For study reproduction and fostering RBA research, we agreed with the data owner to create a synthesized data set that does not allow re-identification of customers.
The synthesized data set does not contain any sensitive data values, as the IP addresses, browser identifiers, login timestamps, and RTTs were randomly generated and assigned.
Publication
You can find more details on our conducted study in the following journal article:
Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service (2022)
Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono.
ACM Transactions on Privacy and Security
Bibtex
@article{Wiefling_Pump_2022,
author = {Wiefling, Stephan and Jørgensen, Paul René and Thunem, Sigurd and Lo Iacono, Luigi},
title = {Pump {Up} {Password} {Security}! {Evaluating} and {Enhancing} {Risk}-{Based} {Authentication} on a {Real}-{World} {Large}-{Scale} {Online} {Service}},
journal = {{ACM} {Transactions} on {Privacy} and {Security}},
doi = {10.1145/3546069},
publisher = {ACM},
year = {2022}
}
License
This data set and the contents of this repository are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. See the LICENSE file for details. If the data set is used within a publication, the following journal article has to be cited as the source of the data set:
Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono: Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service. In: ACM Transactions on Privacy and Security (2022). doi: 10.1145/3546069
Few (invalid) user agents strings from the original data set could not be parsed, so their device type is empty. Perhaps this parse error is useful information for your studies, so we kept these 1526 entries.↩︎
Facebook
TwitterMammalian home range papers were compiled via an extensive literature search. All home range values were extracted from the literature including individual, group and population-level home range values. Associated values were also compiled including species names, methodological information on data collection, home-range estimation method, period of data collection, study coordinates and name of location, as well as species traits derived from the studies, such as body mass, life stage, reproductive status and locomotor habit.
We also provide an R package, which can be installed from https://github.com/SHoeks/HomeRange. The HomeRange R package provides functions for downloading the latest version of the HomeRange database and loading it as a standard dataframe into R, plotting several statistics of the database and finally attaching species traits (e.g. species average body mass, trophic level). from the CO...
Facebook
TwitterTechsalerator’s News Event Data in North America offers a comprehensive and detailed dataset designed to provide businesses, analysts, journalists, and researchers with a thorough view of significant news events across North America. This dataset captures and categorizes major events reported from a diverse range of news sources, including press releases, industry news sites, blogs, and PR platforms, providing valuable insights into regional developments, economic shifts, political changes, and cultural events.
Key Features of the Dataset: Extensive Coverage:
The dataset aggregates news events from a wide array of sources, including company press releases, industry-specific news outlets, blogs, PR sites, and traditional media. This broad coverage ensures a diverse range of information from multiple reporting channels. Categorization of Events:
News events are categorized into various types such as business and economic updates, political developments, technological advancements, legal and regulatory changes, and cultural events. This categorization helps users quickly find and analyze information relevant to their interests or sectors. Real-Time Updates:
The dataset is updated regularly to include the most current events, ensuring that users have access to up-to-date news and can stay informed about recent developments as they happen. Geographic Segmentation:
Events are tagged with their respective countries and territories within North America. This geographic segmentation allows users to filter and analyze news events based on specific locations, facilitating targeted research and analysis. Event Details:
Each event entry includes comprehensive details such as the date of occurrence, source of the news, a description of the event, and relevant keywords. This thorough detailing helps users understand the context and significance of each event. Historical Data:
The dataset includes historical news event data, enabling users to track trends and conduct comparative analysis over time. This feature supports longitudinal studies and provides insights into how news events evolve. Advanced Search and Filter Options:
Users can search and filter news events based on criteria such as date range, event type, location, and keywords. This functionality allows for precise and efficient retrieval of relevant information. North American Countries and Territories Covered: Countries: Canada Mexico United States Territories: American Samoa (U.S. territory) French Polynesia (French overseas collectivity; included for regional relevance) Guam (U.S. territory) New Caledonia (French special collectivity; included for regional relevance) Northern Mariana Islands (U.S. territory) Puerto Rico (U.S. territory) Saint Pierre and Miquelon (French overseas territory; geographically close to North America and included for regional comprehensiveness) Wallis and Futuna (French overseas collectivity; included for regional relevance) Benefits of the Dataset: Strategic Insights: Businesses and analysts can use the dataset to gain insights into significant regional developments, economic conditions, and political changes, aiding in strategic decision-making and market analysis. Market and Industry Trends: The dataset provides valuable information on industry-specific trends and events, helping users understand market dynamics and identify emerging opportunities. Media and PR Monitoring: Journalists and PR professionals can track relevant news across North America, enabling them to monitor media coverage, identify emerging stories, and manage public relations efforts effectively. Academic and Research Use: Researchers can utilize the dataset for longitudinal studies, trend analysis, and academic research on various topics related to North American news and events. Techsalerator’s News Event Data in North America is a crucial resource for accessing and analyzing significant news events across the continent. By providing detailed, categorized, and up-to-date information, it supports effective decision-making, research, and media monitoring across diverse sectors.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.
The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration feature). The filtered range data were generated by applying frequency domain filtering to eliminate image disturbances (e.g., surface variations, and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact from different types of image data on deep convolutional neural network (DCNN) performance.
If you share or use this dataset, please cite [4] and [5] in any relevant documentation.
In addition, an image dataset for crack classification has also been published at [6].
References:
[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873
[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605
[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434
[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678
[5] (This dataset) Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044
[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78