100+ datasets found

Fused Image dataset for convolutional neural Network-based crack Detection...
zenodo.org
data.niaid.nih.gov
zip
Updated Apr 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shanglian Zhou; Shanglian Zhou; Carlos Canchila; Carlos Canchila; Wei Song; Wei Song (2023). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Dataset]. http://doi.org/10.5281/zenodo.6383044
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6383044
Dataset updated
Apr 20, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Shanglian Zhou; Shanglian Zhou; Carlos Canchila; Carlos Canchila; Wei Song; Wei Song
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.

The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration feature). The filtered range data were generated by applying frequency domain filtering to eliminate image disturbances (e.g., surface variations, and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact from different types of image data on deep convolutional neural network (DCNN) performance.

If you share or use this dataset, please cite [4] and [5] in any relevant documentation.

In addition, an image dataset for crack classification has also been published at [6].

References:

[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873

[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605

[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434

[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678

[5] (This dataset) Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044

[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78
Street Video Dataset
kaggle.com
zip
Updated Oct 18, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Macgence (2025). Street Video Dataset [Dataset]. https://www.kaggle.com/datasets/macgence/street-video-dataset
Explore at:
zip(80128 bytes)Available download formats
Dataset updated
Oct 18, 2025
Authors
Macgence
Description
Improve your Computer Vision models using our extensive collection of Street Video Dataset along the street from individuals. This dataset covers a broad range of demographics and scenarios, which will enhance the accuracy of facial recognition, Video Recognition features in your models. This specialized collection of Video data is meticulously curated to support research and development in the construction industry. This dataset provides a rich resource for training and evaluation purposes.

Metadata Availability: Insights into Participant Details

Each participant is accompanied by comprehensive metadata, which includes detailed information about their age, gender, location. Furthermore, this metadata encompasses details such as domain, topic, type, and outcome, providing valuable insights for both model development and evaluation purposes.

Specifications:

Type: Video Volume: 3000 Industry: Video Recognition File Format: MP4 Gender Distribution: 50/50 Age Range: 18 – 65

These technical specifications ensure compatibility and optimal performance for a wide range of AI development applications.

Insights into Image Data:

The dataset comprises 3000 high-quality Video. Created through collaboration with a network of experts, it captures realistic, ensuring a balanced representation age, gender and demographics.

License:

Exclusively curated by Macgence, this Video dataset is available for commercial use, empowering AI developers.

Updates and Customization:

Consistent updates with fresh Video recorded in varied real-world scenarios guarantee ongoing relevance and precision. We offer customization options such as adjusting samples and providing datasets tailored to your specific criteria and needs.

Looking for high-quality datasets to train your AI model? Contact us today to get the dataset you need—fast, reliable, and ready for deployment!
Research Papers Dataset
kaggle.com
zip
Updated May 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
NECHBA MOHAMMED (2023). Research Papers Dataset [Dataset]. https://www.kaggle.com/datasets/nechbamohammed/research-papers-dataset
Explore at:
zip(619131172 bytes)Available download formats
Dataset updated
May 8, 2023
Authors
NECHBA MOHAMMED
Description
Description: This dataset (Version 10) contains a collection of research papers along with various attributes and metadata. It is a comprehensive and diverse dataset that can be used for a wide range of research and analysis tasks. The dataset encompasses papers from different fields of study, including computer science, mathematics, physics, and more.

Fields in the Dataset: - id: A unique identifier for each paper. - title: The title of the research paper. - authors: The list of authors involved in the paper. - venue: The journal or venue where the paper was published. - year: The year when the paper was published. - n_citation: The number of citations received by the paper. - references: A list of paper IDs that are cited by the current paper. - abstract: The abstract of the paper.

Example: - "id": "013ea675-bb58-42f8-a423-f5534546b2b1", - "title": "Prediction of consensus binding mode geometries for related chemical series of positive allosteric modulators of adenosine and muscarinic acetylcholine receptors", - "authors": ["Leon A. Sakkal", "Kyle Z. Rajkowski", "Roger S. Armen"], - "venue": "Journal of Computational Chemistry", - "year": 2017, - "n_citation": 0, - "references": ["4f4f200c-0764-4fef-9718-b8bccf303dba", "aa699fbf-fabe-40e4-bd68-46eaf333f7b1"], - "abstract": "This paper studies ..."

Cite: https://www.aminer.cn/citation
d
Geospatial Database of Hydroclimate Variables, Spring Mountains and Sheep...
catalog.data.gov
data.usgs.gov
+2more
Updated Oct 22, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2025). Geospatial Database of Hydroclimate Variables, Spring Mountains and Sheep Range, Clark County, Nevada [Dataset]. https://catalog.data.gov/dataset/geospatial-database-of-hydroclimate-variables-spring-mountains-and-sheep-range-clark-count
Explore at:
Dataset updated
Oct 22, 2025
Dataset provided by
U.S. Geological Survey
Area covered
Spring Mountains, Sheep Range, Nevada, Clark County
Description
This point feature class contains 81,481 points arranged in a 270-meter spaced grid that covers the Spring Mountains and Sheep Range in Clark County, Nevada. Points are attributed with hydroclimate variables and ancillary data compiled to support efforts to characterize ecological zones.
N
South Range, MI Population Breakdown by Gender and Age Dataset: Male and...
neilsberg.com
csv, json
Updated Feb 24, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2025). South Range, MI Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e200fba9-f25d-11ef-8c1b-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
South Range, Michigan
Variables measured
Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of South Range by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for South Range. The dataset can be utilized to understand the population distribution of South Range by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in South Range. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for South Range.

Key observations

Largest age group (population): Male # 20-24 years (49) | Female # 20-24 years (50). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Scope of gender :

Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis.

Variables / Data Columns

Age Group: This column displays the age group for the South Range population analysis. Total expected values are 18 and are define above in the age groups section.

Population (Male): The male population in the South Range is shown in the following column.

Population (Female): The female population in the South Range is shown in the following column.

Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in South Range for each age group.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for South Range Population by Gender. You can refer the same here
Z
ANN development + final testing datasets
data.niaid.nih.gov
resodate.org
+1more
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Authors (2020). ANN development + final testing datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1445865
Explore at:
Dataset updated
Jan 24, 2020
Authors
Authors
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
File name definitions:

'...v_50_175_250_300...' - dataset for velocity ranges [50, 175] + [250, 300] m/s

'...v_175_250...' - dataset for velocity range [175, 250] m/s

'ANNdevelop...' - used to perform 9 parametric sub-analyses where, in each one, many ANNs are developed (trained, validated and tested) and the one yielding the best results is selected

'ANNtest...' - used to test the best ANN from each aforementioned parametric sub-analysis, aiming to find the best ANN model; this dataset includes the 'ANNdevelop...' counterpart

Where to find the input (independent) and target (dependent) variable values for each dataset/excel ?

input values in 'IN' sheet

target values in 'TARGET' sheet

Where to find the results from the best ANN model (for each target/output variable and each velocity range)?

open the corresponding excel file and the expected (target) vs ANN (output) results are written in 'TARGET vs OUTPUT' sheet

Check reference below (to be added when the paper is published)

https://www.researchgate.net/publication/328849817_11_Neural_Networks_-_Max_Disp_-_Railway_Beams
Data from: Product Images Dataset
kaggle.com
zip
Updated Nov 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Devops (2025). Product Images Dataset [Dataset]. https://www.kaggle.com/datasets/freshersstaff/product-images-dataset
Explore at:
zip(2920601904 bytes)Available download formats
Dataset updated
Nov 26, 2025
Authors
Devops
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This file contains the ABO Product Images Dataset, consisting of 398,212 product images organized into 256 folders. Each folder represents a unique product with multiple images capturing different angles and views. The dataset covers a wide range of product categories and is suitable for research in product recognition, visual similarity analysis, and large-scale e-commerce inventory management.

Categories

The dataset spans a wide variety of product types commonly found in online retail, such as:

Apparel and accessories

Home and kitchen items

Electronics and gadgets

Personal care and beauty products

Tools and hardware

Sports and outdoor equipment

Toys and hobby items

Attributes

Products and their images exhibit a range of real-world visual attributes, including:

Color variations (neutral, vibrant, patterned)

Material types (plastic, metal, textile, glass, wood)

Shape and structural differences (cylindrical, box-shaped, curved, flat)

Angles of capture (front, back, side, top, perspective)

Scale differences between images of the same product

Varied lighting conditions

Background types (studio-style, neutral backgrounds, real-world capture)
Coffee Shop Daily Revenue Prediction Dataset
kaggle.com
zip
Updated Feb 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Himel Sarder (2025). Coffee Shop Daily Revenue Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/himelsarder/coffee-shop-daily-revenue-prediction-dataset
Explore at:
zip(30259 bytes)Available download formats
Dataset updated
Feb 7, 2025
Authors
Himel Sarder
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset Overview

This dataset contains 2,000 rows of data from coffee shops, offering detailed insights into factors that influence daily revenue. It includes key operational and environmental variables that provide a comprehensive view of how business activities and external conditions affect sales performance. Designed for use in predictive analytics and business optimization, this dataset is a valuable resource for anyone looking to understand the relationship between customer behavior, operational decisions, and revenue generation in the food and beverage industry.

Columns & Variables

The dataset features a variety of columns that capture the operational details of coffee shops, including customer activity, store operations, and external factors such as marketing spend and location foot traffic.

Number of Customers Per Day

The total number of customers visiting the coffee shop on any given day.

Range: 50 - 500 customers.

Average Order Value ($)

The average dollar amount spent by each customer during their visit.

Range: $2.50 - $10.00.

Operating Hours Per Day

The total number of hours the coffee shop is open for business each day.

Range: 6 - 18 hours.

Number of Employees

The number of employees working on a given day. This can influence service speed, customer satisfaction, and ultimately, sales.

Range: 2 - 15 employees.

Marketing Spend Per Day ($)

The amount of money spent on marketing campaigns or promotions on any given day.

Range: $10 - $500 per day.

Location Foot Traffic (people/hour)

The number of people passing by the coffee shop per hour, a variable indicative of the shop's location and its potential to attract customers.

Range: 50 - 1000 people per hour.

Target Variable

Daily Revenue ($)

This is the dependent variable representing the total revenue generated by the coffee shop each day.

It is calculated as a combination of customer visits, average spending, and other operational factors like marketing spend and staff availability.

Range: $200 - $10,000 per day.

Data Distribution & Insights

The dataset spans a wide variety of operational scenarios, from small neighborhood coffee shops with limited traffic to larger, high-traffic locations with extensive marketing budgets. This variety allows for exploring different predictive modeling strategies. Key insights that can be derived from the data include:

The effect of marketing spend on daily revenue.

The correlation between customer count and daily sales.

The relationship between staffing levels and revenue generation.

The influence of foot traffic and operating hours on customer behavior.

Use Cases & Applications

The dataset offers a wide range of applications, especially in predictive analytics, business optimization, and forecasting:

Predictive Modeling: Use machine learning models such as regression, decision trees, or neural networks to predict daily revenue based on operational data.

Business Strategy Development: Analyze how changes in marketing spend, staff numbers, or operating hours can optimize revenue and improve efficiency.

Customer Insights: Identify patterns in customer behavior related to shop operations and external factors like foot traffic and marketing campaigns.

Resource Allocation: Determine optimal staffing levels and marketing budgets based on predicted sales, improving overall profitability.

Real-World Applications in the Food & Beverage Industry

For coffee shop owners, managers, and analysts in the food and beverage industry, this dataset provides an essential tool for refining daily operations and boosting profitability. Insights gained from this data can help:

Optimize Marketing Campaigns: Evaluate the effectiveness of daily or seasonal marketing campaigns on revenue.

Staff Scheduling: Predict busy days and ensure that the right number of employees are scheduled to maximize efficiency.

Revenue Forecasting: Provide accurate revenue projections that can assist with financial planning and decision-making.

Operational Efficiency: Discover the most profitable operating hours and adjust business hours accordingly.

This dataset is also ideal for aspiring data scientists and machine learning practitioners looking to apply their skills to real-world business problems in the food and beverage sector.

Conclusion

The Coffee Shop Revenue Prediction Dataset is a versatile and comprehensive resource for understanding the dynamics of daily sales performance in coffee shops. With a focus on key operational factors, it is perfect for building predictive models, ...

Credit Card Eligibility Data: Determining Factors

kaggle.com

zip

Updated May 18, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

Rohit Sharma (2024). Credit Card Eligibility Data: Determining Factors [Dataset]. https://www.kaggle.com/datasets/rohit265/credit-card-eligibility-data-determining-factors

Explore at:

zip(303227 bytes)Available download formats

Dataset updated

May 18, 2024

Authors

Rohit Sharma

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Description of the Credit Card Eligibility Data: Determining Factors

The Credit Card Eligibility Dataset: Determining Factors is a comprehensive collection of variables aimed at understanding the factors that influence an individual's eligibility for a credit card. This dataset encompasses a wide range of demographic, financial, and personal attributes that are commonly considered by financial institutions when assessing an individual's suitability for credit.

Each row in the dataset represents a unique individual, identified by a unique ID, with associated attributes ranging from basic demographic information such as gender and age, to financial indicators like total income and employment status. Additionally, the dataset includes variables related to familial status, housing, education, and occupation, providing a holistic view of the individual's background and circumstances.

Variable	Description
ID	An identifier for each individual (customer).
Gender	The gender of the individual.
Own_car	A binary feature indicating whether the individual owns a car.
Own_property	A binary feature indicating whether the individual owns a property.
Work_phone	A binary feature indicating whether the individual has a work phone.
Phone	A binary feature indicating whether the individual has a phone.
Email	A binary feature indicating whether the individual has provided an email address.
Unemployed	A binary feature indicating whether the individual is unemployed.
Num_children	The number of children the individual has.
Num_family	The total number of family members.
Account_length	The length of the individual's account with a bank or financial institution.
Total_income	The total income of the individual.
Age	The age of the individual.
Years_employed	The number of years the individual has been employed.
Income_type	The type of income (e.g., employed, self-employed, etc.).
Education_type	The education level of the individual.
Family_status	The family status of the individual.
Housing_type	The type of housing the individual lives in.
Occupation_type	The type of occupation the individual is engaged in.
Target	The target variable for the classification task, indicating whether the individual is eligible for a credit card or not (e.g., Yes/No, 1/0).

Researchers, analysts, and financial institutions can leverage this dataset to gain insights into the key factors influencing credit card eligibility and to develop predictive models that assist in automating the credit assessment process. By understanding the relationship between various attributes and credit card eligibility, stakeholders can make more informed decisions, improve risk assessment strategies, and enhance customer targeting and segmentation efforts.

This dataset is valuable for a wide range of applications within the financial industry, including credit risk management, customer relationship management, and marketing analytics. Furthermore, it provides a valuable resource for academic research and educational purposes, enabling students and researchers to explore the intricate dynamics of credit card eligibility determination.

Data from: Current and projected research data storage needs of Agricultural...
catalog.data.gov
agdatacommons.nal.usda.gov
+2more
Updated Apr 21, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Servicehttps://www.ars.usda.gov/
Description
The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel
o
Range View Road Cross Street Data in Valier, MT
ownerly.com
Updated Dec 11, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ownerly (2021). Range View Road Cross Street Data in Valier, MT [Dataset]. https://www.ownerly.com/mt/valier/range-view-rd-home-details
Explore at:
Dataset updated
Dec 11, 2021
Dataset authored and provided by
Ownerly
Area covered
Valier, Montana, Range View Road
Description
This dataset provides information about the number of properties, residents, and average property values for Range View Road cross streets in Valier, MT.
Z
Data from: FISBe: A real-world benchmark dataset for instance segmentation...
data.niaid.nih.gov
data-staging.niaid.nih.gov
+1more
Updated Apr 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mais, Lisa; Hirsch, Peter; Managan, Claire; Kandarpa, Ramya; Rumberger, Josef Lorenz; Reinke, Annika; Maier-Hein, Lena; Ihrke, Gudrun; Kainmueller, Dagmar (2024). FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10875062
Explore at:
Dataset updated
Apr 2, 2024
Dataset provided by
German Cancer Research Center
Max Delbrück Center for Molecular Medicine
Max Delbrück Center
Howard Hughes Medical Institute - Janelia Research Campus
Authors
Mais, Lisa; Hirsch, Peter; Managan, Claire; Kandarpa, Ramya; Rumberger, Josef Lorenz; Reinke, Annika; Maier-Hein, Lena; Ihrke, Gudrun; Kainmueller, Dagmar
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
General

For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.

Summary

A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains

30 completely labeled (segmented) images

71 partly labeled images

altogether comprising ∼600 expert-labeled neuron instances (labeling a single neuron takes between 30-60 min on average, yet a difficult one can take up to 4 hours)

To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects

A set of metrics and a novel ranking score for respective meaningful method benchmarking

An evaluation of three baseline methods in terms of the above metrics and score

Abstract

Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

Dataset documentation:

We provide a detailed documentation of our dataset, following the Datasheet for Datasets questionnaire:

FISBe Datasheet

Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.

Files

fisbe_v1.0_{completely,partly}.zip

contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.

fisbe_v1.0_mips.zip

maximum intensity projections of all samples, for convenience.

sample_list_per_split.txt

a simple list of all samples and the subset they are in, for convenience.

view_data.py

a simple python script to visualize samples, see below for more information on how to use it.

dim_neurons_val_and_test_sets.json

a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.

Readme.md

general information

How to work with the image files

Each sample consists of a single 3d MCFO image of neurons of the fruit fly.For each image, we provide a pixel-wise instance segmentation for all separable neurons.Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification.").The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file.The segmentation mask for each neuron is stored in a separate channel.The order of dimensions is CZYX.

We recommend to work in a virtual environment, e.g., by using conda:

conda create -y -n flylight-env -c conda-forge python=3.9conda activate flylight-env

How to open zarr files

Install the python zarr package:

pip install zarr

Opened a zarr file with:

import zarrraw = zarr.open(, mode='r', path="volumes/raw")seg = zarr.open(, mode='r', path="volumes/gt_instances")

optional:import numpy as npraw_np = np.array(raw)

Zarr arrays are read lazily on-demand.Many functions that expect numpy arrays also work with zarr arrays.Optionally, the arrays can also explicitly be converted to numpy arrays.

How to view zarr image files

We recommend to use napari to view the image data.

Install napari:

pip install "napari[all]"

Save the following Python script:

import zarr, sys, napari

raw = zarr.load(sys.argv[1], mode='r', path="volumes/raw")gts = zarr.load(sys.argv[1], mode='r', path="volumes/gt_instances")

viewer = napari.Viewer(ndisplay=3)for idx, gt in enumerate(gts): viewer.add_labels( gt, rendering='translucent', blending='additive', name=f'gt_{idx}')viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')napari.run()

Execute:

python view_data.py /R9F03-20181030_62_B5.zarr

Metrics

S: Average of avF1 and C

avF1: Average F1 Score

C: Average ground truth coverage

clDice_TP: Average true positives clDice

FS: Number of false splits

FM: Number of false merges

tp: Relative number of true positives

For more information on our selected metrics and formal definitions please see our paper.

Baseline

To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt application-specific color clustering from Duan et al..For detailed information on the methods and the quantitative results please see our paper.

License

The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Citation

If you use FISBe in your research, please use the following BibTeX entry:

@misc{mais2024fisbe, title = {FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures}, author = {Lisa Mais and Peter Hirsch and Claire Managan and Ramya Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller}, year = 2024, eprint = {2404.00130}, archivePrefix ={arXiv}, primaryClass = {cs.CV} }

Acknowledgments

We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuablediscussions.P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program.This work was co-funded by Helmholtz Imaging.

Changelog

There have been no changes to the dataset so far.All future change will be listed on the changelog page.

Contributing

If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository.

All contributions are welcome!
d
Data from: U.S. Geological Survey - Gap Analysis Project Species Range Maps...
catalog.data.gov
data.usgs.gov
Updated Nov 20, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2025). U.S. Geological Survey - Gap Analysis Project Species Range Maps CONUS_2001 [Dataset]. https://catalog.data.gov/dataset/u-s-geological-survey-gap-analysis-project-species-range-maps-conus-2001
Explore at:
Dataset updated
Nov 20, 2025
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Description
GAP species range data are coarse representations of the total areal extent a species occupies, in other words the geographic limits within which a species can be found (Morrison and Hall 2002). These data provide the geographic extent within which the USGS Gap Analysis Project delineates areas of suitable habitat for terrestrial vertebrate species in their species habitat maps. The range maps are created by attributing a vector file derived from the 12-digit Hydrologic Unit Dataset (USDA NRCS 2009). Modifications to that dataset are described here < https://www.sciencebase.gov/catalog/item/56d496eee4b015c306f17a42>. Attribution of the season range for each species was based on the literature and online sources (See Cross Reference section of the metadata). Attribution for each hydrologic unit within the range included values for origin (native, introduced, reintroduced, vagrant), occurrence (extant, possibly present, potentially present, extirpated), reproductive use (breeding, non-breeding, both) and season (year-round, summer, winter, migratory, vagrant). These species range data provide the biological context within which to build our species distribution models. Versioning, Naming Conventions and Codes: A composite version code is employed to allow the user to track the spatial extent, the date of the ground conditions, and the iteration of the data set for that extent/date. For example, CONUS_2001v1 represents the spatial extent of the conterminous US (CONUS), the ground condition year of 2001, and the first iteration (v1) for that extent/date. In many cases, a GAP species code is used in conjunction with the version code to identify specific data sets or files (i.e. Cooper’s Hawk Habitat Map named bCOHAx_CONUS_2001v1_HabMap).
d
Local Data Index
catalog.data.gov
s.cnmilf.com
+2more
Updated Jul 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.ny.gov (2025). Local Data Index [Dataset]. https://catalog.data.gov/dataset/local-data-index
Explore at:
Dataset updated
Jul 12, 2025
Dataset provided by
data.ny.gov
Description
Discover the breadth of data collected by the state which is local in nature. Search by county and municipality and discover, explore, and download local data. With a click, find local data across a broad range of categories from health to transportation, from recreation to economic development; find local farmer’s markets, child care regulated facilities, craft beverages, solar installations, food service establishment inspections, and much more.
c
Luxury fashion products dataset from mytheresa
crawlfeeds.com
csv, zip
Updated Aug 26, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Crawl Feeds (2024). Luxury fashion products dataset from mytheresa [Dataset]. https://crawlfeeds.com/datasets/luxury-fashion-products-dataset-from-mytheresa
Explore at:
zip, csvAvailable download formats
Dataset updated
Aug 26, 2024
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy
Description
Explore the "Luxury Fashion Products Dataset from Mytheresa," a comprehensive collection of data on luxury fashion items from one of the world’s leading online luxury fashion retailers.

This dataset includes detailed information on a wide range of products, such as clothing, shoes, bags, accessories, and more from top luxury brands. Each product entry provides essential details, including product names, brands, categories, prices, sizes, colors, descriptions, and availability, offering valuable insights for researchers, data analysts, and fashion professionals.

Key Features:

Extensive Product Range: Contains a wide array of luxury fashion items from Mytheresa, covering various categories like clothing, footwear, bags, and accessories from top luxury brands.

Detailed Product Information: Each entry includes key details such as product name, brand, category, price, size, color, and description, allowing for in-depth analysis of luxury fashion trends.

Ideal for Market Analysis: Perfect for researchers, data scientists, and e-commerce professionals interested in analyzing consumer preferences, studying market trends, or optimizing inventory strategies in the luxury fashion sector.

Rich Source of Fashion Data: Provides a comprehensive overview of the luxury fashion market, helping professionals stay updated on the latest trends, popular brands, and pricing strategies.

Whether you're analyzing luxury fashion trends, researching consumer behavior, or developing new market strategies, the "Luxury Fashion Products Dataset from Mytheresa" is an invaluable resource that provides detailed insights and extensive coverage of luxury fashion items.
News Events Data in Asia ( Techsalerator)
datarade.ai
Updated Jul 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Techsalerator (2024). News Events Data in Asia ( Techsalerator) [Dataset]. https://datarade.ai/data-products/news-events-data-in-asia-techsalerator-techsalerator
Explore at:
.json, .csv, .xls, .txtAvailable download formats
Dataset updated
Jul 9, 2024
Dataset provided by
Techsalerator LLC
Authors
Techsalerator
Area covered
Kyrgyzstan, Timor-Leste, United Arab Emirates, Brunei Darussalam, Kazakhstan, Maldives, China, Hong Kong, Uzbekistan, Iran (Islamic Republic of)
Description
Techsalerator’s News Event Data in Asia offers a detailed and expansive dataset designed to provide businesses, analysts, journalists, and researchers with comprehensive insights into significant news events across the Asian continent. This dataset captures and categorizes major events reported from a diverse range of news sources, including press releases, industry news sites, blogs, and PR platforms, offering valuable perspectives on regional developments, economic shifts, political changes, and cultural occurrences.

Key Features of the Dataset: Extensive Coverage:

The dataset aggregates news events from a wide range of sources such as company press releases, industry-specific news outlets, blogs, PR sites, and traditional media. This broad coverage ensures a diverse array of information from multiple reporting channels. Categorization of Events:

News events are categorized into various types including business and economic updates, political developments, technological advancements, legal and regulatory changes, and cultural events. This categorization helps users quickly find and analyze information relevant to their interests or sectors. Real-Time Updates:

The dataset is updated regularly to include the most current events, ensuring users have access to the latest news and can stay informed about recent developments as they happen. Geographic Segmentation:

Events are tagged with their respective countries and regions within Asia. This geographic segmentation allows users to filter and analyze news events based on specific locations, facilitating targeted research and analysis. Event Details:

Each event entry includes comprehensive details such as the date of occurrence, source of the news, a description of the event, and relevant keywords. This thorough detailing helps users understand the context and significance of each event. Historical Data:

The dataset includes historical news event data, enabling users to track trends and perform comparative analysis over time. This feature supports longitudinal studies and provides insights into the evolution of news events. Advanced Search and Filter Options:

Users can search and filter news events based on criteria such as date range, event type, location, and keywords. This functionality allows for precise and efficient retrieval of relevant information. Asian Countries and Territories Covered: Central Asia: Kazakhstan Kyrgyzstan Tajikistan Turkmenistan Uzbekistan East Asia: China Hong Kong (Special Administrative Region of China) Japan Mongolia North Korea South Korea Taiwan South Asia: Afghanistan Bangladesh Bhutan India Maldives Nepal Pakistan Sri Lanka Southeast Asia: Brunei Cambodia East Timor (Timor-Leste) Indonesia Laos Malaysia Myanmar (Burma) Philippines Singapore Thailand Vietnam Western Asia (Middle East): Armenia Azerbaijan Bahrain Cyprus Georgia Iraq Israel Jordan Kuwait Lebanon Oman Palestine Qatar Saudi Arabia Syria Turkey (partly in Europe, but often included in Asia contextually) United Arab Emirates Yemen Benefits of the Dataset: Strategic Insights: Businesses and analysts can use the dataset to gain insights into significant regional developments, economic conditions, and political changes, aiding in strategic decision-making and market analysis. Market and Industry Trends: The dataset provides valuable information on industry-specific trends and events, helping users understand market dynamics and identify emerging opportunities. Media and PR Monitoring: Journalists and PR professionals can track relevant news across Asia, enabling them to monitor media coverage, identify emerging stories, and manage public relations efforts effectively. Academic and Research Use: Researchers can utilize the dataset for longitudinal studies, trend analysis, and academic research on various topics related to Asian news and events. Techsalerator’s News Event Data in Asia is a crucial resource for accessing and analyzing significant news events across the continent. By offering detailed, categorized, and up-to-date information, it supports effective decision-making, research, and media monitoring across diverse sectors.
s
Long-range Pedestrian Dataset
shaip.com
tl.shaip.com
+1more
json
Updated Nov 26, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shaip (2024). Long-range Pedestrian Dataset [Dataset]. https://www.shaip.com/offerings/human-animal-segmentation-datasets/
Explore at:
jsonAvailable download formats
Dataset updated
Nov 26, 2024
Dataset authored and provided by
Shaip
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The Long-range Pedestrian Dataset is curated for the visual entertainment sector, featuring a collection of outdoor-collected images with a high resolution of 3840 x 2160 pixels. This dataset is focused on long-distance pedestrian imagery, with each target pedestrian precisely labeled with a bounding box that closely fits the boundary of the pedestrian target, providing detailed data for scene composition and character placement in visual content.

Data from: Login Data Set for Risk-Based Authentication

zenodo.org
data.niaid.nih.gov
+1more

zip

Updated Jun 30, 2022

Facebook

Twitter

Click to copy link

Link copied

Cite

Stephan Wiefling; Stephan Wiefling; Paul René Jørgensen; Paul René Jørgensen; Sigurd Thunem; Sigurd Thunem; Luigi Lo Iacono; Luigi Lo Iacono (2022). Login Data Set for Risk-Based Authentication [Dataset]. http://doi.org/10.5281/zenodo.6782156

Explore at:

zipAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.6782156

Dataset updated

Jun 30, 2022

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Stephan Wiefling; Stephan Wiefling; Paul René Jørgensen; Paul René Jørgensen; Sigurd Thunem; Sigurd Thunem; Luigi Lo Iacono; Luigi Lo Iacono

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Login Data Set for Risk-Based Authentication

Synthesized login feature data of >33M login attempts and >3.3M users on a large-scale online service in Norway. Original data collected between February 2020 and February 2021.

This data sets aims to foster research and development for Risk-Based Authentication (RBA) systems. The data was synthesized from the real-world login behavior of more than 3.3M users at a large-scale single sign-on (SSO) online service in Norway.

The users used this SSO to access sensitive data provided by the online service, e.g., a cloud storage and billing information. We used this data set to study how the Freeman et al. (2016) RBA model behaves on a large-scale online service in the real world (see Publication). The synthesized data set can reproduce these results made on the original data set (see Study Reproduction). Beyond that, you can use this data set to evaluate and improve RBA algorithms under real-world conditions.

WARNING: The feature values are plausible, but still totally artificial. Therefore, you should NOT use this data set in productive systems, e.g., intrusion detection systems.

Overview

The data set contains the following features related to each login attempt on the SSO:

Feature	Data Type	Description	Range or Example
IP Address	String	IP address belonging to the login attempt	0.0.0.0 - 255.255.255.255
Country	String	Country derived from the IP address	US
Region	String	Region derived from the IP address	New York
City	String	City derived from the IP address	Rochester
ASN	Integer	Autonomous system number derived from the IP address	0 - 600000
User Agent String	String	User agent string submitted by the client	Mozilla/5.0 (Windows NT 10.0; Win64; ...
OS Name and Version	String	Operating system name and version derived from the user agent string	Windows 10
Browser Name and Version	String	Browser name and version derived from the user agent string	Chrome 70.0.3538
Device Type	String	Device type derived from the user agent string	(`mobile`, `desktop`, `tablet`, `bot`, `unknown`)¹
User ID	Integer	Idenfication number related to the affected user account	[Random pseudonym]
Login Timestamp	Integer	Timestamp related to the login attempt	[64 Bit timestamp]
Round-Trip Time (RTT) [ms]	Integer	Server-side measured latency between client and server	1 - 8600000
Login Successful	Boolean	`True`: Login was successful, `False`: Login failed	(`true`, `false`)
Is Attack IP	Boolean	IP address was found in known attacker data set	(`true`, `false`)
Is Account Takeover	Boolean	Login attempt was identified as account takeover by incident response team of the online service	(`true`, `false`)

Data Creation

As the data set targets RBA systems, especially the Freeman et al. (2016) model, the statistical feature probabilities between all users, globally and locally, are identical for the categorical data. All the other data was randomly generated while maintaining logical relations and timely order between the features.

The timestamps, however, are not identical and contain randomness. The feature values related to IP address and user agent string were randomly generated by publicly available data, so they were very likely not present in the real data set. The RTTs resemble real values but were randomly assigned among users per geolocation. Therefore, the RTT entries were probably in other positions in the original data set.

The country was randomly assigned per unique feature value. Based on that, we randomly assigned an ASN related to the country, and generated the IP addresses for this ASN. The cities and regions were derived from the generated IP addresses for privacy reasons and do not reflect the real logical relations from the original data set.
The device types are identical to the real data set. Based on that, we randomly assigned the OS, and based on the OS the browser information. From this information, we randomly generated the user agent string. Therefore, all the logical relations regarding the user agent are identical as in the real data set.
The RTT was randomly drawn from the login success status and synthesized geolocation data. We did this to ensure that the RTTs are realistic ones.

Regarding the Data Values

Due to unresolvable conflicts during the data creation, we had to assign some unrealistic IP addresses and ASNs that are not present in the real world. Nevertheless, these do not have any effects on the risk scores generated by the Freeman et al. (2016) model.

You can recognize them by the following values:

ASNs with values >= 500.000
IP addresses in the range 10.0.0.0 - 10.255.255.255 (10.0.0.0/8 CIDR range)

Study Reproduction

Based on our evaluation, this data set can reproduce our study results regarding the RBA behavior of an RBA model using the IP address (IP address, country, and ASN) and user agent string (Full string, OS name and version, browser name and version, device type) as features.

The calculated RTT significances for countries and regions inside Norway are not identical using this data set, but have similar tendencies. The same is true for the Median RTTs per country. This is due to the fact that the available number of entries per country, region, and city changed with the data creation procedure. However, the RTTs still reflect the real-world distributions of different geolocations by city.

See RESULTS.md for more details.

Ethics

By using the SSO service, the users agreed in the data collection and evaluation for research purposes. For study reproduction and fostering RBA research, we agreed with the data owner to create a synthesized data set that does not allow re-identification of customers.

The synthesized data set does not contain any sensitive data values, as the IP addresses, browser identifiers, login timestamps, and RTTs were randomly generated and assigned.

Publication

You can find more details on our conducted study in the following journal article:

Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service (2022)
Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono.
ACM Transactions on Privacy and Security

Bibtex

@article{Wiefling_Pump_2022,
 author = {Wiefling, Stephan and Jørgensen, Paul René and Thunem, Sigurd and Lo Iacono, Luigi},
 title = {Pump {Up} {Password} {Security}! {Evaluating} and {Enhancing} {Risk}-{Based} {Authentication} on a {Real}-{World} {Large}-{Scale} {Online} {Service}},
 journal = {{ACM} {Transactions} on {Privacy} and {Security}},
 doi = {10.1145/3546069},
 publisher = {ACM},
 year  = {2022}
}

License

This data set and the contents of this repository are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. See the LICENSE file for details. If the data set is used within a publication, the following journal article has to be cited as the source of the data set:

Stephan Wiefling, Paul René Jørgensen, Sigurd Thunem, and Luigi Lo Iacono: Pump Up Password Security! Evaluating and Enhancing Risk-Based Authentication on a Real-World Large-Scale Online Service. In: ACM Transactions on Privacy and Security (2022). doi: 10.1145/3546069

Few (invalid) user agents strings from the original data set could not be parsed, so their device type is empty. Perhaps this parse error is useful information for your studies, so we kept these 1526 entries.↩︎

d
Data from: HomeRange: A global database of mammalian home ranges
datadryad.org
data.niaid.nih.gov
+1more
zip
Updated Dec 5, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maarten Broekman; Selwyn Hoeks; Rosa Freriks; Merel Langendoen; Katharina Runge; Ecaterina Savenco; Ruben ter Harmsel; Mark Huijbregts; Marlee Tucker (2022). HomeRange: A global database of mammalian home ranges [Dataset]. http://doi.org/10.5061/dryad.d2547d85x
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.d2547d85x
Dataset updated
Dec 5, 2022
Dataset provided by
Dryad
Authors
Maarten Broekman; Selwyn Hoeks; Rosa Freriks; Merel Langendoen; Katharina Runge; Ecaterina Savenco; Ruben ter Harmsel; Mark Huijbregts; Marlee Tucker
Time period covered
Nov 28, 2022
Description
Title of Dataset: HomeRange: A global database of mammalian home ranges

Mammalian home range papers were compiled via an extensive literature search. All home range values were extracted from the literature including individual, group and population-level home range values. Associated values were also compiled including species names, methodological information on data collection, home-range estimation method, period of data collection, study coordinates and name of location, as well as species traits derived from the studies, such as body mass, life stage, reproductive status and locomotor habit.

We also provide an R package, which can be installed from https://github.com/SHoeks/HomeRange. The HomeRange R package provides functions for downloading the latest version of the HomeRange database and loading it as a standard dataframe into R, plotting several statistics of the database and finally attaching species traits (e.g. species average body mass, trophic level). from the CO...
News Events Data in North America ( Techsalerator)
datarade.ai
Updated Jun 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Techsalerator (2024). News Events Data in North America ( Techsalerator) [Dataset]. https://datarade.ai/data-products/news-events-data-in-north-america-techsalerator-techsalerator
Explore at:
.json, .csv, .xls, .txtAvailable download formats
Dataset updated
Jun 25, 2024
Dataset provided by
Techsalerator LLC
Authors
Techsalerator
Area covered
Canada, United States
Description
Techsalerator’s News Event Data in North America offers a comprehensive and detailed dataset designed to provide businesses, analysts, journalists, and researchers with a thorough view of significant news events across North America. This dataset captures and categorizes major events reported from a diverse range of news sources, including press releases, industry news sites, blogs, and PR platforms, providing valuable insights into regional developments, economic shifts, political changes, and cultural events.

Key Features of the Dataset: Extensive Coverage:

The dataset aggregates news events from a wide array of sources, including company press releases, industry-specific news outlets, blogs, PR sites, and traditional media. This broad coverage ensures a diverse range of information from multiple reporting channels. Categorization of Events:

News events are categorized into various types such as business and economic updates, political developments, technological advancements, legal and regulatory changes, and cultural events. This categorization helps users quickly find and analyze information relevant to their interests or sectors. Real-Time Updates:

The dataset is updated regularly to include the most current events, ensuring that users have access to up-to-date news and can stay informed about recent developments as they happen. Geographic Segmentation:

Events are tagged with their respective countries and territories within North America. This geographic segmentation allows users to filter and analyze news events based on specific locations, facilitating targeted research and analysis. Event Details:

Each event entry includes comprehensive details such as the date of occurrence, source of the news, a description of the event, and relevant keywords. This thorough detailing helps users understand the context and significance of each event. Historical Data:

The dataset includes historical news event data, enabling users to track trends and conduct comparative analysis over time. This feature supports longitudinal studies and provides insights into how news events evolve. Advanced Search and Filter Options:

Users can search and filter news events based on criteria such as date range, event type, location, and keywords. This functionality allows for precise and efficient retrieval of relevant information. North American Countries and Territories Covered: Countries: Canada Mexico United States Territories: American Samoa (U.S. territory) French Polynesia (French overseas collectivity; included for regional relevance) Guam (U.S. territory) New Caledonia (French special collectivity; included for regional relevance) Northern Mariana Islands (U.S. territory) Puerto Rico (U.S. territory) Saint Pierre and Miquelon (French overseas territory; geographically close to North America and included for regional comprehensiveness) Wallis and Futuna (French overseas collectivity; included for regional relevance) Benefits of the Dataset: Strategic Insights: Businesses and analysts can use the dataset to gain insights into significant regional developments, economic conditions, and political changes, aiding in strategic decision-making and market analysis. Market and Industry Trends: The dataset provides valuable information on industry-specific trends and events, helping users understand market dynamics and identify emerging opportunities. Media and PR Monitoring: Journalists and PR professionals can track relevant news across North America, enabling them to monitor media coverage, identify emerging stories, and manage public relations efforts effectively. Academic and Research Use: Researchers can utilize the dataset for longitudinal studies, trend analysis, and academic research on various topics related to North American news and events. Techsalerator’s News Event Data in North America is a crucial resource for accessing and analyzing significant news events across the continent. By providing detailed, categorized, and up-to-date information, it supports effective decision-making, research, and media monitoring across diverse sectors.

Facebook

Twitter

Click to copy link

Link copied

Cite

Shanglian Zhou; Shanglian Zhou; Carlos Canchila; Carlos Canchila; Wei Song; Wei Song (2023). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Dataset]. http://doi.org/10.5281/zenodo.6383044

Fused Image dataset for convolutional neural Network-based crack Detection (FIND)

Explore at:

3 scholarly articles cite this dataset (View in Google Scholar)

zipAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.6383044

Dataset updated

Apr 20, 2023

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Shanglian Zhou; Shanglian Zhou; Carlos Canchila; Carlos Canchila; Wei Song; Wei Song

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.

The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration feature). The filtered range data were generated by applying frequency domain filtering to eliminate image disturbances (e.g., surface variations, and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact from different types of image data on deep convolutional neural network (DCNN) performance.

If you share or use this dataset, please cite [4] and [5] in any relevant documentation.

In addition, an image dataset for crack classification has also been published at [6].

References:

[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873

[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605

[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434

[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678

[5] (This dataset) Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044

[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78

Clear search

Close search

Google apps

Main menu

Fused Image dataset for convolutional neural Network-based crack Detection...

Street Video Dataset

Metadata Availability: Insights into Participant Details

Specifications:

Insights into Image Data:

License:

Updates and Customization:

Research Papers Dataset

Geospatial Database of Hydroclimate Variables, Spring Mountains and Sheep...

South Range, MI Population Breakdown by Gender and Age Dataset: Male and...

About this dataset

Content

Inspiration

Recommended for further research

ANN development + final testing datasets

Data from: Product Images Dataset

Coffee Shop Daily Revenue Prediction Dataset

Dataset Overview

Columns & Variables

Target Variable

Data Distribution & Insights

Use Cases & Applications

Real-World Applications in the Food & Beverage Industry

Conclusion

Credit Card Eligibility Data: Determining Factors

Data from: Current and projected research data storage needs of Agricultural...

Range View Road Cross Street Data in Valier, MT

Data from: FISBe: A real-world benchmark dataset for instance segmentation...

optional:import numpy as npraw_np = np.array(raw)

Data from: U.S. Geological Survey - Gap Analysis Project Species Range Maps...

Local Data Index

Luxury fashion products dataset from mytheresa

News Events Data in Asia ( Techsalerator)

Long-range Pedestrian Dataset

Data from: Login Data Set for Risk-Based Authentication

Data from: HomeRange: A global database of mammalian home ranges

Title of Dataset: HomeRange: A global database of mammalian home ranges

News Events Data in North America ( Techsalerator)

Fused Image dataset for convolutional neural Network-based crack Detection (FIND)