MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by AYUSH SINGH331
Released under MIT
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
DISL
The DISL dataset features a collection of 514506 unique Solidity files that have been deployed to Ethereum mainnet. It caters to the need for a large and diverse dataset of real-world smart contracts. DISL serves as a resource for developing machine learning systems and for benchmarking software engineering tools designed for smart contracts.
Content
The raw subset has full contract source code and is not deduplicated; it has 3,298,271 smart contracts. The… See the full description on the dataset page: https://huggingface.co/datasets/ASSERT-KTH/DISL.
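As a hedged illustration (not official usage from the dataset card), the files can typically be streamed with the Hugging Face `datasets` library; the configuration name `raw` follows the subset described above and should be verified on the dataset page.

```python
# Illustrative sketch only: streaming the DISL "raw" subset with the
# Hugging Face `datasets` library. The config name "raw" and the available
# fields are assumptions to be checked against the dataset page.
from datasets import load_dataset

dataset = load_dataset("ASSERT-KTH/DISL", "raw", split="train", streaming=True)

for contract in dataset:
    print(sorted(contract.keys()))  # inspect the available columns
    break
```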
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Network Devices Large is a dataset for object detection tasks - it contains Modem Router Wireless annotations for 225 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Big-Math: UNVERIFIED
[!WARNING] This dataset contains ONLY questions whose answers have not been verified to be correct. Use this dataset with caution.
Dataset Creation
Big-Math-Unverified is created as an offshoot of the Big-Math dataset (HuggingFace Dataset Link). Big-Math-Unverified goes through the same filters as the rest of Big-Math (e.g., remove non-English, remove multiple choice, etc.), except that these problems were not solved in any of the… See the full description on the dataset page: https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-UNVERIFIED.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Optimized for Geospatial and Big Data Analysis
This dataset is a refined and enhanced version of the original DataCo SMART SUPPLY CHAIN FOR BIG DATA ANALYSIS dataset, specifically designed for advanced geospatial and big data analysis. It incorporates geocoded information, language translations, and cleaned data to enable applications in logistics optimization, supply chain visualization, and performance analytics.
- src_points.geojson: Source point geometries.
- dest_points.geojson: Destination point geometries.
- routes.geojson: Line geometries representing source-destination routes.
- DataCoSupplyChainDatasetRefined.csv: The cleaned and refined tabular dataset.
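As a rough sketch (assuming the `geopandas` and `pandas` libraries and that the files sit in the working directory), the layers listed above can be loaded as follows:

```python
# Hedged sketch: load the geospatial layers and the refined tabular data.
import geopandas as gpd
import pandas as pd

src_points = gpd.read_file("src_points.geojson")    # source point geometries
dest_points = gpd.read_file("dest_points.geojson")  # destination point geometries
routes = gpd.read_file("routes.geojson")            # source-destination line geometries
orders = pd.read_csv("DataCoSupplyChainDatasetRefined.csv")

print(routes.crs, len(routes), len(orders))
```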
This dataset is based on the original dataset published by Fabian Constante, Fernando Silva, and António Pereira:
Constante, Fabian; Silva, Fernando; Pereira, António (2019), “DataCo SMART SUPPLY CHAIN FOR BIG DATA ANALYSIS”, Mendeley Data, V5, doi: 10.17632/8gx2fvg2k6.5.
Refinements include geospatial processing, translation, and additional cleaning by the uploader to enhance usability and analytical potential.
This dataset is designed to empower data scientists, researchers, and business professionals to explore the intersection of geospatial intelligence and supply chain optimization.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the rapid development of data acquisition and storage, massive datasets with large sample sizes are emerging at an increasing rate, making more advanced statistical tools urgently needed. To accommodate such volume in the analysis, a variety of methods have been proposed for complete or right-censored survival data. However, existing big data methodology has not attended to interval-censored outcomes, which are ubiquitous in cross-sectional or periodic follow-up studies. In this work, we propose an easily implemented divide-and-combine approach for analyzing massive interval-censored survival data under the additive hazards model. We establish the asymptotic properties of the proposed estimator, including consistency and asymptotic normality. In addition, the divide-and-combine estimator is shown to be asymptotically equivalent to the full-data-based estimator obtained from analyzing all data together. Simulation studies suggest that, relative to the full-data-based approach, the proposed divide-and-combine approach has a substantial advantage in computation time, making it more applicable to large-scale data analysis. An application to a set of interval-censored data also demonstrates the practical utility of the proposed method.
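The combining step itself is generic; the following schematic sketch (not the authors' implementation, with `fit_additive_hazards` as a hypothetical per-block estimator) illustrates inverse-covariance weighting of block estimates, one standard way divide-and-combine estimators are assembled.

```python
# Schematic sketch of a divide-and-combine step: estimate on each block,
# then combine by inverse-covariance (information) weighting.
# `fit_additive_hazards` is a hypothetical routine that returns the block
# estimate and its covariance matrix.
import numpy as np

def divide_and_combine(blocks, fit_additive_hazards):
    total_info = None      # running sum of block information matrices
    weighted_sum = None    # running sum of information-weighted block estimates
    for block in blocks:
        beta_k, cov_k = fit_additive_hazards(block)
        info_k = np.linalg.inv(cov_k)
        total_info = info_k if total_info is None else total_info + info_k
        contrib = info_k @ beta_k
        weighted_sum = contrib if weighted_sum is None else weighted_sum + contrib
    combined_cov = np.linalg.inv(total_info)
    combined_beta = combined_cov @ weighted_sum
    return combined_beta, combined_cov
```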
Use of JSATS can generate a large volume of data. To manage and visualize the data, an integrated suite of science-based tools known as the Hydropower Biological Evaluation Toolset (HBET) can be used.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Foundation Tactile (FoTa) - a multi-sensor multi-task large dataset for tactile sensing
This repository stores the FoTa dataset and the pretrained checkpoints of Transferable Tactile Transformers (T3).
Paper
Code
Colab
[Project Website] Jialiang (Alan) Zhao, Yuxiang Ma, Lirui Wang, and Edward H. Adelson (MIT CSAIL)
Overview
FoTa was released with Transferable Tactile Transformers (T3) as a large dataset for tactile… See the full description on the dataset page: https://huggingface.co/datasets/alanz-mit/FoundationTactile.
An incomplete collection of open data domains throughout the U.S. (intended for comparison with King County open data)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was compiled to examine the use of ChatGPT 3.5 in educational settings, particularly for creating and personalizing concept maps. The data has been organized into three folders: Maps, Texts, and Questionnaires. The Maps folder contains the graphical representations of the concept maps and the PlantUML code for drawing them, in Italian and English. The Texts folder contains the source text used as input for the maps' creation. The Questionnaires folder includes the students' responses to the three administered questionnaires.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
We present DaMuEL, a large Multilingual Dataset for Entity Linking containing data in 53 languages. DaMuEL consists of two components: a knowledge base that contains language-agnostic information about entities, including their claims from Wikidata and named entity types (PER, ORG, LOC, EVENT, BRAND, WORK_OF_ART, MANUFACTURED); and Wikipedia texts with entity mentions linked to the knowledge base, along with language-specific text from Wikidata such as labels, aliases, and descriptions, stored separately for each language. The Wikidata QID is used as a persistent, language-agnostic identifier, enabling the combination of the knowledge base with language-specific texts and information for each entity. Wikipedia documents deliberately annotate only a single mention for every entity present; we further automatically detect all mentions of named entities linked from each document. The dataset contains 27.9M named entities in the knowledge base and 12.3G tokens from Wikipedia texts. The dataset is published under the CC BY-SA licence.
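As an illustration of how the QID serves as the join key (file layout and field names below are assumptions, not the dataset's actual schema):

```python
# Hypothetical sketch: combining language-agnostic knowledge-base entries with
# language-specific texts via the Wikidata QID. Field names are illustrative.
import pandas as pd

kb = pd.DataFrame([{"qid": "Q42", "ne_type": "PER"}])                 # knowledge-base side
texts_en = pd.DataFrame([{"qid": "Q42", "label": "Douglas Adams",
                          "description": "English writer"}])          # English-specific side

entities = kb.merge(texts_en, on="qid", how="left")  # QID as the persistent identifier
print(entities)
```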
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains a selection of behavioral datasets collected using soluble agents and labeled using realistic threat simulations and IDS rules. The collected datasets are anonymized and aggregated using time-window representations. The dataset generation pipeline preprocesses the application logs from the corporate network, structures them according to the entity and user inventory, and labels them based on the IDS and phishing simulation appliances.
This repository is associated with the article "RBD24: A labelled dataset with risk activities using log applications data" published in the journal Computers & Security. For more information go to https://doi.org/10.1016/j.cose.2024.104290
The RBD24 dataset comprises various risk activities collected from real entities and users over a period of 15 days, with the samples segmented by Desktop (DE) and Smartphone (SM) devices.
| DatasetId | Entity | Observed Behaviour | Groundtruth | Sample Shape |
|---|---|---|---|---|
| Crypto_desktop.parquet | DE | Miner Checking | IDS | 0: 738/161202, 1: 11/1343 |
| Crypto_smarphone.parquet | SM | Miner Checking | IDS | 0: 613/180021, 1: 4/956 |
| OutFlash_desktop.parquet | DE | Outdated software components | IDS | 0: 738/161202, 1: 56/10820 |
| OutFlash_smartphone.parquet | SM | Outdated software components | IDS | 0: 613/180021, 1: 22/6639 |
| OutTLS_desktop.parquet | DE | Outdated TLS protocol | IDS | 0: 738/161202, 1: 18/2458 |
| OutTLS_smartphone.parquet | SM | Outdated TLS protocol | IDS | 0: 613/180021, 1: 11/2930 |
| P2P_desktop.parquet | DE | P2P Activity | IDS | 0: 738/161202, 1: 177/35892 |
| P2P_smartphone.parquet | SM | P2P Activity | IDS | 0: 613/180021, 1: 94/21688 |
| NonEnc_desktop.parquet | DE | Non-encrypted password | IDS | 0: 738/161202, 1: 291/59943 |
| NonEnc_smaprthone.parquet | SM | Non-encrypted password | IDS | 0: 613/180021, 1: 167/41434 |
| Phishing_desktop.parquet | DE | Phishing email | Experimental Campaign | 0: 98/13864, 1: 19/3072 |
| Phishing_smartphone.parquet | SM | Phishing email | Experimental Campaign | 0: 117/34006, 1: 26/8968 |
To collect the dataset, we deployed multiple agents and soluble agents within an infrastructure of more than 3k entities, comprising laptops, workstations, and smartphone devices. The methods used to build the ground truth are as follows:
- Simulator: We launch different realistic phishing campaigns, aiming to expose user credentials or defeat access to a service.
- IDS: We deploy an IDS to collect various alerts associated with behavioral anomalies, such as cryptomining or peer-to-peer traffic.
For each user exposed to the behaviors stated in the summary table, time windows (TW) are computed, aggregating the user's behavior within a fixed time interval. These TWs serve as the basis for various supervised and unsupervised methods.
The time windows (TW) are a data representation based on logs aggregated from multimodal sources between two timestamps. In this study, logs from HTTP, DNS, SSL, and SMTP are taken into consideration, allowing the construction of rich behavioral profiles. The indicators described in the TW are a set of manually curated, interpretable features designed to describe device-level properties within the specified time frame. The most influential features are described below.
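Independently of the specific feature set, the following is a hedged pandas sketch of aggregating per-user logs into fixed time windows; the column names are illustrative, not the dataset's actual schema.

```python
# Hedged sketch of a time-window (TW) aggregation with pandas.
# Column names are illustrative only.
import pandas as pd

logs = pd.DataFrame({
    "user": ["u1", "u1", "u2"],
    "timestamp": pd.to_datetime(["2024-01-01 10:02", "2024-01-01 10:40",
                                 "2024-01-01 10:15"]),
    "protocol": ["HTTP", "DNS", "SSL"],
    "bytes": [1200, 80, 560],
})

tw = (logs
      .set_index("timestamp")
      .groupby("user")
      .resample("1h")                                # fixed time interval per user
      .agg({"protocol": "count", "bytes": "sum"})    # simple aggregated indicators
      .rename(columns={"protocol": "events", "bytes": "total_bytes"}))
print(tw)
```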
Parquet is a columnar storage format, which enhances efficiency and compression, making it suitable for large datasets and complex analytical tasks. It has support across various tools and languages, including Python: the pandas library can read and write Parquet files through the `pyarrow` or `fastparquet` libraries. Its efficient data retrieval and fast query execution improve performance over other formats. Compared to row-based storage formats such as CSV, Parquet's columnar storage greatly reduces read times and storage costs for large datasets. Although binary formats like HDF5 are effective for specific use cases, Parquet provides broader compatibility and optimization. The provided datasets use the Parquet format. Here is an example of how to read the data using pandas; ensure that the fastparquet library is installed:
```python
import pandas as pd

# Read a Parquet file
df = pd.read_parquet('path_to_your_file.parquet', engine='fastparquet')
```
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
USHAP (USHighAirPollutants) is one of a series of long-term, full-coverage, high-resolution, and high-quality datasets of ground-level air pollutants for the United States. It is generated from big data (e.g., ground-based measurements, satellite remote sensing products, atmospheric reanalysis, and model simulations) using artificial intelligence, accounting for the spatiotemporal heterogeneity of air pollution. This is the big-data-derived, seamless (spatial coverage = 100%) daily, monthly, and yearly 1 km (i.e., D1K, M1K, and Y1K) ground-level PM2.5 dataset for the United States from 2000 to 2020. Our daily PM2.5 estimates agree well with ground measurements, with an average cross-validation coefficient of determination (CV-R2) of 0.82 and a normalized root-mean-square error (NRMSE) of 0.40.

All the data will be made public online once our paper is accepted. If you want to use the USHighPM2.5 dataset for related scientific research, please contact us (Email: weijing_rs@163.com; weijing@umd.edu).

Wei, J., Wang, J., Li, Z., Kondragunta, S., Anenberg, S., Wang, Y., Zhang, H., Diner, D., Hand, J., Lyapustin, A., Kahn, R., Colarco, P., da Silva, A., and Ichoku, C. Long-term mortality burden trends attributed to black carbon and PM2.5 from wildfire emissions across the continental USA from 2000 to 2020: a deep learning modelling study. The Lancet Planetary Health, 2023, 7, e963–e975. https://doi.org/10.1016/S2542-5196(23)00235-8

More air quality datasets of different air pollutants can be found at: https://weijing-rs.github.io/product.html
This document, Innovating the Data Ecosystem: An Update of The Federal Big Data Research and Development Strategic Plan, updates the 2016 Federal Big Data Research and Development Strategic Plan. It updates the vision and strategies on the research and development needs for big data laid out in the 2016 Strategic Plan through six strategy areas (enhance the reusability and integrity of data; enable innovative, user-driven data science; develop and enhance the robustness of the federated ecosystem; prioritize privacy, ethics, and security; develop necessary expertise and diverse talent; and enhance U.S. leadership in the international context) to enhance data value, reusability, and responsiveness to federal policies on data sharing and management.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Gratis by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Gratis across both sexes and to determine which sex constitutes the majority.
Key observations
There is a slight female majority, with 50.0% of the total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data on biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you do need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Gratis Population by Race & Ethnicity. You can refer to it here
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In software development, it’s common to reuse existing source code by copying and pasting, resulting in the proliferation of numerous code clones—similar or identical code fragments—that detrimentally affect software quality and maintainability. Although several techniques for code clone detection exist, many encounter challenges in effectively identifying semantic clones due to their inability to extract syntax and semantics information. Fewer techniques leverage low-level source code representations like bytecode or assembly for clone detection. This work introduces a novel code representation for identifying syntactic and semantic clones in Java source code. It integrates high-level features extracted from the Abstract Syntax Tree with low-level features derived from intermediate representations generated by static analysis tools, like the Soot framework. Leveraging this combined representation, fifteen machine-learning models are trained to effectively detect code clones. Evaluation on a large dataset demonstrates the models’ efficacy in accurately identifying semantic clones. Among these classifiers, ensemble classifiers, such as the LightGBM classifier, exhibit exceptional accuracy. Linearly combining features enhances the effectiveness of the models compared to multiplication and distance combination techniques. The experimental findings indicate that the proposed method can outperform the current clone detection techniques in detecting semantic clones.
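As a hedged, non-authoritative sketch of the modelling setup (feature extraction from the AST and from Soot-style intermediate representations is out of scope here and replaced by placeholder arrays; the element-wise sum is one plausible reading of the "linear combination" of features):

```python
# Sketch only: linearly combine high-level (AST) and low-level (IR) feature
# vectors and train a LightGBM classifier to label clone vs. non-clone pairs.
import numpy as np
from lightgbm import LGBMClassifier

rng = np.random.default_rng(0)
ast_features = rng.random((1000, 64))   # placeholder AST-derived features
ir_features = rng.random((1000, 64))    # placeholder IR-derived features (e.g., from Soot)
labels = rng.integers(0, 2, size=1000)  # 1 = clone pair, 0 = non-clone pair

X = ast_features + ir_features          # linear combination of the two feature views
clf = LGBMClassifier(n_estimators=200, random_state=0)
clf.fit(X, labels)
print(clf.score(X, labels))
```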
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We include the sets of adversarial questions for each of the seven EquityMedQA datasets (OMAQ, EHAI, FBRT-Manual, FBRT-LLM, TRINDS, CC-Manual, and CC-LLM), the three other non-EquityMedQA datasets used in this work (HealthSearchQA, Mixed MMQA-OMAQ, and Omiye et al.), as well as the data generated as a part of the empirical study, including the generated model outputs (Med-PaLM 2 [1] primarily, with Med-PaLM [2] answers for pairwise analyses) and ratings from human annotators (physicians, health equity experts, and consumers). See the paper for details on all datasets.
We include other datasets evaluated in this work: HealthSearchQA [2], Mixed MMQA-OMAQ, and Omiye et al [3].
A limited number of data elements described in the paper are not included here. The following elements are excluded:
The reference answers written by physicians to HealthSearchQA questions, introduced in [2], and the set of corresponding pairwise ratings. This accounts for 2,122 rated instances.
The free-text comments written by raters during the ratings process.
Demographic information associated with the consumer raters (only age group information is included).
1. Singhal, K., et al. Towards expert-level medical question answering with large language models. arXiv preprint arXiv:2305.09617 (2023).
2. Singhal, K., Azizi, S., Tu, T., et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023). https://doi.org/10.1038/s41586-023-06291-2
3. Omiye, J.A., Lester, J.C., Spichak, S., et al. Large language models propagate race-based medicine. npj Digit. Med. 6, 195 (2023). https://doi.org/10.1038/s41746-023-00939-z
4. Abacha, Asma Ben, et al. "Overview of the medical question answering task at TREC 2017 LiveQA." TREC. 2017.
5. Abacha, Asma Ben, et al. "Bridging the gap between consumers' medication questions and trusted answers." MEDINFO 2019: Health and Wellbeing e-Networks for All. IOS Press, 2019. 25–29.
Independent Ratings [ratings_independent.csv]: Contains ratings of the presence of bias and its dimensions in Med-PaLM 2 outputs using the independent assessment rubric for each of the datasets studied. The primary response regarding the presence of bias is encoded in the column bias_presence with three possible values (No bias, Minor bias, Severe bias). Binary assessments of the dimensions of bias are encoded in separate columns (e.g., inaccuracy_for_some_axes). Instances for the Mixed MMQA-OMAQ dataset are triple-rated for each rater group; other datasets are single-rated. Instances were missing for five instances in MMQA-OMAQ and two instances in CC-Manual. This file contains 7,519 rated instances.
Paired Ratings [ratings_pairwise.csv]: Contains comparisons of the presence or degree of bias and its dimensions in Med-PaLM and Med-PaLM 2 outputs for each of the datasets studied. Pairwise responses are encoded in terms of two binary columns corresponding to which of the answers was judged to contain a greater degree of bias (e.g., Med-PaLM-2_answer_more_bias). Dimensions of bias are encoded in the same way as for ratings_independent.csv. Instances for the Mixed MMQA-OMAQ dataset are triple-rated for each rater group; other datasets are single-rated. Four ratings were missing (one for EHAI, two for FRT-Manual, one for FBRT-LLM). This file contains 6,446 rated instances.
Counterfactual Paired Ratings [ratings_counterfactual.csv]: Contains ratings under the counterfactual rubric for pairs of questions defined in the CC-Manual and CC-LLM datasets. Contains a binary assessment of the presence of bias (bias_presence), columns for each dimension of bias, and categorical columns corresponding to other elements of the rubric (ideal_answers_diff, how_answers_diff). Instances for the CC-Manual dataset are triple-rated, instances for CC-LLM are single-rated. Due to a data processing error, we removed questions that refer to "Natal" from the analysis of the counterfactual rubric on the CC-Manual dataset. This affects three questions (corresponding to 21 pairs) derived from one seed question based on the TRINDS dataset. This file contains 1,012 rated instances.
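A minimal, hedged sketch (assuming only the file and column names quoted above) for inspecting the independent ratings:

```python
# Sketch: load the independent ratings and tabulate the bias_presence labels.
import pandas as pd

ratings = pd.read_csv("ratings_independent.csv")
print(ratings["bias_presence"].value_counts())   # No bias / Minor bias / Severe bias
# Binary dimension columns (e.g., inaccuracy_for_some_axes) can be summed per dataset
# if a dataset identifier column is present (column name not confirmed here).
```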
Open-ended Medical Adversarial Queries (OMAQ) [equitymedqa_omaq.csv]: Contains questions that compose the OMAQ dataset. The OMAQ dataset was first described in [1].
Equity in Health AI (EHAI) [equitymedqa_ehai.csv]: Contains questions that compose the EHAI dataset.
Failure-Based Red Teaming - Manual (FBRT-Manual) [equitymedqa_fbrt_manual.csv]: Contains questions that compose the FBRT-Manual dataset.
Failure-Based Red Teaming - LLM (FBRT-LLM); full [equitymedqa_fbrt_llm.csv]: Contains questions that compose the extended FBRT-LLM dataset.
Failure-Based Red Teaming - LLM (FBRT-LLM) [equitymedqa_fbrt_llm_661_sampled.csv]: Contains questions that compose the sampled FBRT-LLM dataset used in the empirical study.
TRopical and INfectious DiseaseS (TRINDS) [equitymedqa_trinds.csv]: Contains questions that compose the TRINDS dataset.
Counterfactual Context - Manual (CC-Manual) [equitymedqa_cc_manual.csv]: Contains pairs of questions that compose the CC-Manual dataset.
Counterfactual Context - LLM (CC-LLM) [equitymedqa_cc_llm.csv]: Contains pairs of questions that compose the CC-LLM dataset.
HealthSearchQA [other_datasets_healthsearchqa.csv]: Contains questions sampled from the HealthSearchQA dataset [1,2].
Mixed MMQA-OMAQ [other_datasets_mixed_mmqa_omaq]: Contains questions that compose the Mixed MMQA-OMAQ dataset.
Omiye et al. [other datasets_omiye_et_al]: Contains questions proposed in Omiye et al. [3].
Version 2: Updated to include ratings and generated model outputs. Dataset files were updated to include unique IDs associated with each question.
Version 1: Contained datasets of questions without ratings; consistent with the v1 preprint on arXiv (https://arxiv.org/abs/2403.12025).
WARNING: These datasets contain adversarial questions designed specifically to probe biases in AI systems. They can include human-written and model-generated language and content that may be inaccurate, misleading, biased, disturbing, sensitive, or offensive.
NOTE: the content of this research repository (i) is not intended to be a medical device; and (ii) is not intended for clinical use of any kind, including but not limited to diagnosis or prognosis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
3 BIG DATA is a dataset for object detection tasks - it contains Cats annotations for 943 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The CIFAR-100 dataset is a large dataset of labeled images and a popular benchmark for machine learning and artificial intelligence research. It consists of 60,000 32x32 color images split into 100 mutually exclusive classes, with 600 images per class. The classes cover animals, vehicles, and other everyday objects.
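For example, one common way to obtain CIFAR-100 is through torchvision (other toolkits, such as tf.keras.datasets.cifar100, also ship it):

```python
# Download CIFAR-100 via torchvision and check the split sizes and class count.
from torchvision import datasets, transforms

train_set = datasets.CIFAR100(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
test_set = datasets.CIFAR100(root="./data", train=False, download=True,
                             transform=transforms.ToTensor())
print(len(train_set), len(test_set), len(train_set.classes))  # 50000 10000 100
```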
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Road safety systems are essential for planning, managing, and improving road infrastructure and decreasing road accidents. Manual systems used for road safety assessments are inefficient, time-consuming, and prone to error. Some automated systems using sensors, cameras, lidar, and radar to detect nearby obstacles such as vehicles, pedestrians, lane lines, some traffic signs, and parking slots have been introduced to reduce road fatalities by minimizing human error. However, the existing road safety systems available in industry are unable to accurately detect all road safety attributes required by the Australian Road Assessment Program (AusRAP), a program launched to establish a safer road system through high-risk road inspections, star ratings, and safer-roads investment plans that mitigate the possibility of accidents. Therefore, it is important to explore novel techniques and develop better automated systems that can accurately detect and classify all road safety attributes.
This research focuses on the development of a novel deep learning technique for the analysis of road safety attributes. Various architectures, learning and optimisation techniques have been investigated to develop an appropriate deep learning-based technique that can detect road safety attributes with high accuracy. Firstly, a single-stage segmentation and classification technique to automatically identify AusRAP attributes has been investigated. Secondly, multi-stage segmentation and classification techniques using various classifiers have been investigated. Finally, Genetic Algorithm (GA) and Particle Swarm Optimisation (PSO)-based techniques have been investigated to optimise the proposed deep learning techniques.
The proposed techniques were evaluated on a real-world dataset of roadside videos provided by the Department of Transport and Main Roads (DTMR), Queensland, Australia, and the Australian Road Research Board (ARRB). Classification accuracy was used as the performance metric, and to further validate the efficacy, different diversity measures such as specificity, sensitivity, and F1-score were used. An appropriate analysis and a comparison with existing techniques were conducted and presented. The results and analysis show that the proposed single-stage and multi-stage deep learning-based techniques achieve better classification accuracy and fewer misclassifications than the existing state-of-the-art segmentation and classification techniques. It was found through experimentation that the proposed single-stage technique can avoid re-training the whole model on all training samples, which is time-consuming, when a new attribute is introduced. Moreover, extensive experimentation showed that a large dataset is not always required for training. Effective solutions were found to eliminate the requirement to annotate a large number of samples for each attribute while still producing accuracy acceptable for industry. Both single-stage and multi-stage deep learning-based techniques were also validated on real-world test data without cropping, and pixel-wise predictions were obtained for each object. The accurate location of each predicted object was therefore known, avoiding the bounding-box problem. Through the incorporation of optimisation techniques, optimum parameters suitable for road safety attributes were determined. The optimum parameters proved to be effective in terms of classification accuracy and the time needed to achieve minimum error.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by AYUSH SINGH331
Released under MIT