18 datasets found
  1. Data Annotation Tool Market Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Dec 9, 2024
    Cite
    Market Research Forecast (2024). Data Annotation Tool Market Report [Dataset]. https://www.marketresearchforecast.com/reports/data-annotation-tool-market-10075
    Explore at:
    doc, ppt, pdf (available download formats)
    Dataset updated
    Dec 9, 2024
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Data Annotation Tool market was valued at USD 3.9 billion in 2023 and is projected to reach USD 6.64 billion by 2032, at an expected CAGR of 7.9% over the forecast period. A data annotation tool is software used to label data so that machine learning models can learn patterns from it. These tools support multiple data types, including images, text, audio, and video. Common annotation subcategories include bounding boxes and segmentation for images; entity recognition and sentiment analysis for text; transcription and sound labeling for audio; and object tracking for video. Typical features vary by use case but generally include annotation interfaces, collaboration, label suggestions, and quality assurance. Applications span the automotive industry (object detection for self-driving cars), text processing (text classification), healthcare (medical imaging), and retail (recommendation systems). These tools are used to produce high-quality, accurately labeled datasets for building effective AI systems. Key drivers for this market are: Increasing Adoption of Cloud-based Managed Services to Drive Market Growth. Potential restraints include: Adverse Health Effect May Hamper Market Growth. Notable trends are: Growing Implementation of Touch-based and Voice-based Infotainment Systems to Increase Adoption of Intelligent Cars.

  2. Website Screenshots Dataset

    • universe.roboflow.com
    zip
    Updated Aug 19, 2022
    + more versions
    Cite
    Roboflow (2022). Website Screenshots Dataset [Dataset]. https://universe.roboflow.com/roboflow-gw7yv/website-screenshots/model/1
    Explore at:
    zip (available download formats)
    Dataset updated
    Aug 19, 2022
    Dataset authored and provided by
    Roboflow
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Variables measured
    Elements Bounding Boxes
    Description

    About This Dataset

    The Roboflow Website Screenshots dataset is a synthetically generated dataset composed of screenshots from over 1,000 of the world's top websites. They have been automatically annotated to label the following classes:

    • button - navigation links, tabs, etc.
    • heading - text that was enclosed in <h1> to <h6> tags.
    • link - inline, textual <a> tags.
    • label - text labeling form fields.
    • text - all other text.
    • image - <img>, <svg>, or <video> tags, and icons.
    • iframe - ads and 3rd party content.

    Example

    This is an example image and annotation from the dataset (a Wikipedia screenshot): https://i.imgur.com/mOG3u3Z.png

    Usage

    Annotated screenshots are very useful in Robotic Process Automation. But they can be expensive to label. This dataset would cost over $4000 for humans to label on popular labeling services. We hope this dataset provides a good starting point for your project. Try it with a model from our model library.
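As a concrete starting point, annotations like these can be consumed programmatically. The sketch below assumes a COCO-style JSON export (one of the formats Roboflow can generate); the tiny inline dict stands in for a real export file, and the category ids are illustrative.

```python
from collections import Counter

# Tiny stand-in for a COCO-style export; a real file would be loaded
# with json.load(). Category names follow the classes listed above.
coco = {
    "categories": [{"id": 1, "name": "button"},
                   {"id": 2, "name": "heading"},
                   {"id": 3, "name": "link"}],
    "annotations": [
        {"image_id": 0, "category_id": 1, "bbox": [10, 20, 80, 24]},
        {"image_id": 0, "category_id": 3, "bbox": [10, 60, 40, 12]},
        {"image_id": 1, "category_id": 1, "bbox": [5, 5, 60, 20]},
    ],
}

def class_counts(coco_dict):
    """Count annotations per class name in a COCO-style dict."""
    id_to_name = {c["id"]: c["name"] for c in coco_dict["categories"]}
    return Counter(id_to_name[a["category_id"]]
                   for a in coco_dict["annotations"])

print(class_counts(coco))  # Counter({'button': 2, 'link': 1})
```

A per-class count like this is a quick sanity check on class balance before training a detector.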

    Collecting Custom Data

    Roboflow is happy to provide a custom screenshots dataset to meet your particular needs. We can crawl public or internal web applications. Just reach out and we'll be happy to provide a quote!

    About Roboflow

    Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless. Developers reduce 50% of their boilerplate code when using Roboflow's workflow, save training time, and increase model reproducibility.


  3. Great African Food Company Crop Type Tanzania

    • cmr.earthdata.nasa.gov
    • access.earthdata.nasa.gov
    Updated Oct 10, 2023
    Cite
    (2023). Great African Food Company Crop Type Tanzania [Dataset]. http://doi.org/10.34911/rdnt.5vx40r
    Explore at:
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Area covered
    Description

    This dataset contains field boundaries and crop types from farms in Tanzania. Great African Food Company used the Farmforce app to collect a point within each field and recorded other properties, including the area of the field.

    The Radiant Earth Foundation team used the point measurements from the ground data collection and the area of each field, overlaid on satellite imagery (multiple Sentinel-2 scenes during the growing season, and the Google basemap), to draw the polygons for each field. These polygons do not cover the entirety of the field and are always enclosed within it. Therefore, they should not be used for field boundary detection, but rather as reference polygons for crop type classification. Data points for which it was unclear whether they belonged to a neighboring farm (e.g. the point was on the edge of two farms) were removed from the dataset. Finally, ground reference polygons were matched with corresponding time series data from Sentinel-2 satellites (listed in the source imagery property of each label item).

  4. Data from: X-ray CT data with semantic annotations for the paper "A workflow...

    • catalog.data.gov
    • s.cnmilf.com
    • +2 more
    Updated May 2, 2024
    Cite
    Agricultural Research Service (2024). X-ray CT data with semantic annotations for the paper "A workflow for segmenting soil and plant X-ray CT images with deep learning in Google’s Colaboratory" [Dataset]. https://catalog.data.gov/dataset/x-ray-ct-data-with-semantic-annotations-for-the-paper-a-workflow-for-segmenting-soil-and-p-d195a
    Explore at:
    Dataset updated
    May 2, 2024
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    Leaves from genetically unique Juglans regia plants were scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA. Soil samples were collected in Fall 2017 from the riparian oak forest located at the Russell Ranch Sustainable Agricultural Institute at the University of California, Davis. The soil was sieved through a 2 mm mesh and air dried before imaging. A single soil aggregate was scanned at 23 keV using the 10x objective lens with a pixel resolution of 650 nanometers on beamline 8.3.2 at the ALS. Additionally, a drought-stressed almond flower bud (Prunus dulcis) from a plant housed at the University of California, Davis, was scanned using a 4x lens with a pixel resolution of 1.72 µm on beamline 8.3.2 at the ALS.

    Raw tomographic image data was reconstructed using TomoPy. Reconstructions were converted to 8-bit tif or png format using ImageJ or the PIL package in Python before further processing. Images were annotated using Intel's Computer Vision Annotation Tool (CVAT) and ImageJ; both are free to use and open source. Leaf images were annotated following Théroux-Rancourt et al. (2020): hand labeling was done directly in ImageJ by drawing around each tissue, with 5 images annotated per leaf. Care was taken to cover a range of anatomical variation to help improve the generalizability of the models to other leaves. All slices were labeled by Dr. Mina Momayyezi and Fiona Duong.

    To annotate the flower bud and soil aggregate, images were imported into CVAT. The exterior border of the bud (i.e. bud scales) and flower were annotated in CVAT and exported as masks. Similarly, the exterior of the soil aggregate and particulate organic matter identified by eye were annotated in CVAT and exported as masks. To annotate air spaces in both the bud and soil aggregate, images were imported into ImageJ. A Gaussian blur was applied to the image to decrease noise, and the air space was then segmented using thresholding. After applying the threshold, the selected air space region was converted to a binary image, with white representing the air space and black representing everything else. This binary image was overlaid upon the original image, and the air space within the flower bud and aggregate was selected using the "free hand" tool. Air space outside the region of interest for both image sets was eliminated. The quality of the air space annotation was then visually inspected for accuracy against the underlying original image; incomplete annotations were corrected using the brush or pencil tool to paint missing air space white and incorrectly identified air space black. Once the annotation was satisfactorily corrected, the binary image of the air space was saved. Finally, the annotations of the bud and flower (or aggregate and organic matter) were opened in ImageJ, and the associated air space mask was overlaid on top of them, forming a three-layer mask suitable for training the fully convolutional network. All labeling of the soil aggregate and soil aggregate images was done by Dr. Devin Rippner.

    These images and annotations are for training deep learning models to identify different constituents in leaves, almond buds, and soil aggregates.

    Limitations: For the walnut leaves, some tissues (stomata, etc.) are not labeled, and the images represent only a small portion of a full leaf. Similarly, both the almond bud and the aggregate represent just one single sample of each. The bud tissues are only divided into bud scales, flower, and air space; many other tissues remain unlabeled. For the soil aggregate, the annotated labels were done by eye with no actual chemical information, so particulate organic matter identification may be incorrect.

    Resources in this dataset:

    Resource Title: Annotated X-ray CT images and masks of a Forest Soil Aggregate.
    File Name: forest_soil_images_masks_for_testing_training.zip
    Resource Description: This aggregate was collected from the riparian oak forest at the Russell Ranch Sustainable Agricultural Facility. The aggregate was scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 10x objective lens with a pixel resolution of 650 nanometers. For masks, the background has a value of 0,0,0; pore spaces have a value of 250,250,250; mineral solids have a value of 128,0,0; and particulate organic matter has a value of 0,128,0. These files were used for training a model to segment the forest soil aggregate and for testing the accuracy, precision, recall, and F1 score of the model.

    Resource Title: Annotated X-ray CT images and masks of an Almond bud (P. dulcis).
    File Name: Almond_bud_tube_D_P6_training_testing_images_and_masks.zip
    Resource Description: A drought-stressed almond flower bud (Prunus dulcis) from a plant housed at the University of California, Davis, was scanned by X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 4x lens with a pixel resolution of 1.72 µm. For masks, the background has a value of 0,0,0; air spaces have a value of 255,255,255; bud scales have a value of 128,0,0; and flower tissues have a value of 0,128,0. These files were used for training a model to segment the almond bud and for testing the accuracy, precision, recall, and F1 score of the model.
    Resource Software Recommended: Fiji (ImageJ), url: https://imagej.net/software/fiji/downloads

    Resource Title: Annotated X-ray CT images and masks of Walnut leaves (J. regia).
    File Name: 6_leaf_training_testing_images_and_masks_for_paper.zip
    Resource Description: Stems were collected from genetically unique J. regia accessions at the USDA-ARS-NCGR in Wolfskill Experimental Orchard, Winters, California, USA to use as scion, and were grafted by Sierra Gold Nursery onto a commonly used commercial rootstock, RX1 (J. microcarpa × J. regia). We used a common rootstock to eliminate any own-root effects and to simulate conditions for a commercial walnut orchard setting, where rootstocks are commonly used. The grafted saplings were repotted and transferred to the Armstrong lathe house facility at the University of California, Davis in June 2019, and kept under natural light and temperature. Leaves from each accession and treatment were scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 10x objective lens with a pixel resolution of 650 nanometers. For masks, the background has a value of 170,170,170; epidermis 85,85,85; mesophyll 0,0,0; bundle sheath extension 152,152,152; vein 220,220,220; and air 255,255,255.
    Resource Software Recommended: Fiji (ImageJ), url: https://imagej.net/software/fiji/downloads
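Before training, the RGB-coded masks described in the resources are typically decoded into integer class labels. A minimal sketch using the forest-soil color map quoted above; a real mask would be loaded with PIL or ImageJ, whereas here plain nested lists keep the example dependency-free.

```python
# Color-to-class map from the forest soil resource description:
# background 0,0,0; pore space 250,250,250; mineral solids 128,0,0;
# particulate organic matter 0,128,0.
COLOR_TO_CLASS = {
    (0, 0, 0): 0,        # background
    (250, 250, 250): 1,  # pore space
    (128, 0, 0): 2,      # mineral solid
    (0, 128, 0): 3,      # particulate organic matter
}

def rgb_mask_to_labels(mask):
    """Convert a 2D grid of RGB tuples into integer class labels."""
    return [[COLOR_TO_CLASS[tuple(px)] for px in row] for row in mask]

mask = [[(0, 0, 0), (250, 250, 250)],
        [(128, 0, 0), (0, 128, 0)]]
print(rgb_mask_to_labels(mask))  # [[0, 1], [2, 3]]
```

The same mapping idea applies to the almond-bud and leaf masks, with their respective color values substituted.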

  5. EmoLit

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 27, 2023
    Cite
    Rei, Luis (2023). EmoLit [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7883953
    Explore at:
    Dataset updated
    Jun 27, 2023
    Dataset authored and provided by
    Rei, Luis
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Emotions in Literature

    Description: Literature sentences from Project Gutenberg. 38 emotion labels (plus neutral examples). Semi-supervised dataset.

    Article

    Detecting Fine-Grained Emotions in Literature

    Please cite:

    @Article{app13137502,
      AUTHOR = {Rei, Luis and Mladenić, Dunja},
      TITLE = {Detecting Fine-Grained Emotions in Literature},
      JOURNAL = {Applied Sciences},
      VOLUME = {13},
      YEAR = {2023},
      NUMBER = {13},
      ARTICLE-NUMBER = {7502},
      URL = {https://www.mdpi.com/2076-3417/13/13/7502},
      ISSN = {2076-3417},
      DOI = {10.3390/app13137502}
    }

    Abstract

    Emotion detection in text is a fundamental aspect of affective computing and is closely linked to natural language processing. Its applications span various domains, from interactive chatbots to marketing and customer service. This research specifically focuses on its significance in literature analysis and understanding. To facilitate this, we present a novel approach that involves creating a multi-label fine-grained emotion detection dataset, derived from literary sources. Our methodology employs a simple yet effective semi-supervised technique. We leverage textual entailment classification to perform emotion-specific weak-labeling, selecting examples with the highest and lowest scores from a large corpus. Utilizing these emotion-specific datasets, we train binary pseudo-labeling classifiers for each individual emotion. By applying this process to the selected examples, we construct a multi-label dataset. Using this dataset, we train models and evaluate their performance within a traditional supervised setting. Our model achieves an F1 score of 0.59 on our labeled gold set, showcasing its ability to effectively detect fine-grained emotions. Furthermore, we conduct evaluations of the model's performance in zero- and few-shot transfer scenarios using benchmark datasets. Notably, our results indicate that the knowledge learned from our dataset exhibits transferability across diverse data domains, demonstrating its potential for broader applications beyond emotion detection in literature. Our contribution thus includes a multi-label fine-grained emotion detection dataset built from literature, the semi-supervised approach used to create it, as well as the models trained on it. This work provides a solid foundation for advancing emotion detection techniques and their utilization in various scenarios, especially within the cultural heritage analysis.
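The selection step in the abstract, keeping the highest- and lowest-scoring examples per emotion after entailment-based weak labeling, can be sketched as follows. The sentences, scores, and k below are illustrative inventions; real scores would come from a textual entailment classifier.

```python
def select_weakly_labeled(scored, k):
    """Pick the k highest-scoring sentences as positives and the
    k lowest-scoring as negatives for a single emotion.

    `scored` is a list of (sentence, entailment_score) pairs.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    positives = [s for s, _ in ranked[:k]]
    negatives = [s for s, _ in ranked[-k:]]
    return positives, negatives

scored = [("I adore this!", 0.97), ("The sky is blue.", 0.08),
          ("What a delight.", 0.91), ("He left at noon.", 0.03)]
pos, neg = select_weakly_labeled(scored, k=2)
print(pos)  # ['I adore this!', 'What a delight.']
print(neg)  # ['The sky is blue.', 'He left at noon.']
```

The resulting emotion-specific positive/negative sets are what the paper uses to train the binary pseudo-labeling classifiers.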

    Labels

    • admiration: finds something admirable, impressive or worthy of respect

    • amusement: finds something funny, entertaining or amusing

    • anger: is angry, furious, or strongly displeased; displays ire, rage, or wrath

    • annoyance: is annoyed or irritated

    • approval: expresses a favorable opinion, approves, endorses or agrees with something or someone

    • boredom: feels bored, uninterested, monotony, tedium

    • calmness: is calm, serene, free from agitation or disturbance, experiences emotional tranquility

    • caring: cares about the well-being of someone else, feels sympathy, compassion, affectionate concern towards someone, displays kindness or generosity

    • courage: feels courage or the ability to do something that frightens one, displays fearlessness or bravery

    • curiosity: is interested, curious, or has strong desire to learn something

    • desire: has a desire or ambition, wants something, wishes for something to happen

    • despair: feels despair, helpless, powerless, loss or absence of hope, desperation, despondency

    • disappointment: feels sadness or displeasure caused by the non-fulfillment of hopes or expectations, being let down; expresses regret due to the unfavorable outcome of a decision

    • disapproval: expresses an unfavorable opinion, disagrees or disapproves of something or someone

    • disgust: feels disgust, revulsion, finds something or someone unpleasant, offensive or hateful

    • doubt: has doubt or is uncertain about something, bewildered, confused, or shows lack of understanding

    • embarrassment: feels embarrassed, awkward, self-conscious, shame, or humiliation

    • envy: is covetous, feels envy or jealousy; begrudges or resents someone for their achievements, possessions, or qualities

    • excitement: feels excitement or great enthusiasm and eagerness

    • faith: expresses religious faith, has a strong belief in the doctrines of a religion, or trust in god

    • fear: is afraid or scared due to a threat, danger, or harm

    • frustration: feels frustrated: upset or annoyed because of inability to change or achieve something

    • gratitude: is thankful or grateful for something

    • greed: is greedy, rapacious, avaricious, or has selfish desire to acquire or possess more than what one needs

    • grief: feels grief or intense sorrow, or grieves for someone who has died

    • guilt: feels guilt, remorse, or regret to have committed wrong or failed in an obligation

    • indifference: is uncaring, unsympathetic, uncharitable, or callous, shows indifference, lack of concern, coldness towards someone

    • joy: is happy, feels joy, great pleasure, elation, satisfaction, contentment, or delight

    • love: feels love, strong affection, passion, or deep romantic attachment for someone

    • nervousness: feels nervous, anxious, worried, uneasy, apprehensive, stressed, troubled or tense

    • nostalgia: feels nostalgia, longing or wistful affection for the past, something lost, or for a period in one's life, feels homesickness, a longing for one's home, city, or country while being away; longing for a familiar place

    • optimism: feels optimism or hope, is hopeful or confident about the future, that something good may happen, or the success of something

    • pain: feels physical pain or experiences physical suffering

    • pride: is proud, feels pride from one's own achievements, self-fulfillment, or from the achievements of those with whom one is closely associated, or from qualities or possessions that are widely admired

    • relief: feels relaxed, relief from tension or anxiety

    • sadness: feels sadness, sorrow, unhappiness, depression, dejection

    • surprise: is surprised, astonished or shocked by something unexpected

    • trust: trusts or has confidence in someone, or believes that someone is good, honest, or reliable

    Dataset

    EmoLit (Zenodo)

    Code

    EmoLit Train (Github)

    Models

  6. World Transportation

    • wifire-data.sdsc.edu
    csv, esri rest +4
    Updated Jun 9, 2021
    Cite
    World Transportation [Dataset]. https://wifire-data.sdsc.edu/dataset/world-transportation
    Explore at:
    geojson, kml, esri rest, csv, zip, html (available download formats)
    Dataset updated
    Jun 9, 2021
    Dataset provided by
    Esri (http://esri.com/)
    Area covered
    World
    Description

    This map presents transportation data, including highways, roads, railroads, and airports for the world.

    The map was developed by Esri using Esri highway data; Garmin basemap layers; and HERE street data for North America, Europe, Australia, New Zealand, South America and Central America, India, most of the Middle East and Asia, and select countries in Africa. Data for Pacific Island nations and the remaining countries of Africa was sourced from OpenStreetMap contributors. A specific country list and documentation of Esri's process for including OSM data are available to view.

    You can add this layer on top of any imagery, such as the Esri World Imagery map service, to provide a useful reference overlay that also includes street labels at the largest scales. (At the largest scales, the line symbols representing the streets and roads are automatically hidden, and only the labels showing the names of streets and roads are shown.) The Imagery With Labels basemap in the basemap dropdown in the ArcGIS web and mobile clients does not include this World Transportation map. If you use the Imagery With Labels basemap in your map and want road and street names, simply add this World Transportation layer to your map. It is designed to be drawn underneath the labels in the Imagery With Labels basemap, and that is how it will be drawn if you manually add it into your web map.

  7. GIS Data and Analysis for Cooling Demand and Environmental Impact in The...

    • zenodo.org
    • data.niaid.nih.gov
    Updated Dec 6, 2023
    Cite
    Simon van Lierde; Simon van Lierde (2023). GIS Data and Analysis for Cooling Demand and Environmental Impact in The Hague [Dataset]. http://doi.org/10.5281/zenodo.10277761
    Explore at:
    Dataset updated
    Dec 6, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Simon van Lierde; Simon van Lierde
    License

    https://www.gnu.org/licenses/gpl-3.0-standalone.html

    Area covered
    The Hague
    Description

    This dataset contains raw GIS data sourced from the BAG (Basisregistratie Adressen en Gebouwen; Registry of Addresses and Buildings). It provides comprehensive information on buildings, including advanced height data and administrative details. It also contains geographic divisions within The Hague. Additionally, the dataset incorporates energy label data, offering insights into the energy efficiency and performance of these buildings. This combined dataset serves as the backbone of a Master's thesis in Industrial Ecology, analysing residential and office cooling and its environmental impacts in The Hague, Netherlands. The codebase of this analysis can be found in this Github repository: https://github.com/simonvanlierde/msc-thesis-ie

    The dataset includes a background research spreadsheet containing supporting calculations. It also presents GeoPackages with results from the cooling demand model (CDM) for various scenarios: Status quo (SQ), 2030, and 2050 (Low, Medium, and High).

    Background research data

    The background_research_data.xlsx spreadsheet contains comprehensive background research calculations supporting the shaping of input parameters used in the model. It contains several sheets:

    • Cooling Technologies: Details the various cooling technologies examined in the study, summarizing their characteristics and the market penetration mixes used in the analysis.
    • LCA Results of Ventilation Systems: Provides an overview of the ecoinvent processes serving as proxies for the life-cycle impacts of cooling equipment, along with calculations of the weight of cooling systems and contribution tables from the LCA-based assessment.
    • Material Scarcity: A detailed examination of the critical raw material content in the material footprint of ecoinvent processes, representing cooling equipment.
    • Heat Plans per Neighbourhood: Forecasts of future heating solutions for each neighbourhood in The Hague.
    • Building Stock: Analysis of the projected growth trends in residential and office building stocks in The Hague.
    • AC Market: Market analysis covering air conditioner sales in the Netherlands from 2002 to 2022.
    • Climate Change: Computations of climate-related parameters based on KNMI climate scenarios.
    • Electricity Mix Analysis: Analysis of future projections for the Dutch electricity grid and calculations of life-cycle carbon intensities of the grid.

    Input data

    Geographic divisions

    • The outline of The Hague municipality through the Municipal boundaries (Gemeenten) layer, sourced from the Administrative boundaries (Bestuurlijke Gemeenten) dataset on the PDOK WFS service.
    • District (Wijken) and Neighbourhood (Buurten) layers were downloaded from the PDOK WFS service (from the CBS Wijken en Buurten 2022 data package) and clipped to the outline of The Hague.
    • The 4-digit postcodes layer was downloaded from PDOK WFS service (CBS Postcode4 statistieken 2020) and clipped to The Hague's outline. The postcodes within The Hague were subsequently stored in a csv file.
    • The census block layer was downloaded from the PDOK WFS service (from the CBS Vierkantstatistieken 100m 2021 data package) and also clipped to the outline of The Hague.
    • These layers have been combined in the GeographicDivisions_TheHague GeoPackage.

    BAG data

    • BAG data was acquired through the download of a BAG GeoPackage from the BAG ATOM download page.
    • In the resulting GeoPackage, the Residences (Verblijfsobject) and Building (Pand) layers were clipped to match The Hague's outline.
    • The resulting residence data can be found in the BAG_buildings_TheHague GeoPackage.

    3D BAG

    • Due to limitations imposed by the PDOK WFS service, which restricts the number of downloadable buildings to 10,000, it was necessary to acquire 145 individual GeoPackages for tiles covering The Hague from the 3D BAG website.
    • These GeoPackages were merged using the ogr2ogr append function from the GDAL library in bash.
    • Roof elevation data was extracted from the LoD 1.2 2D layer from the resulting GeoPackage.
    • Ground elevation data was obtained from the Pand layer.
    • Both of these layers were clipped to match The Hague's outline.
    • Roof and ground elevation data from the LoD 1.2 2D and Pand layers were joined to the Pand layer in the BAG dataset using the BAG ID of each building.
    • The resulting data can be found in the BAG_buildings_TheHague GeoPackage.
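The tile-merging step above (ogr2ogr's append mode from the GDAL library) can be sketched as follows. This is an illustrative sketch, not the authors' script: the tile filenames and output name are assumptions, and the commands are only constructed, not executed.

```python
def merge_commands(tiles, out="merged_3dbag.gpkg"):
    """Build the ogr2ogr invocations that merge per-tile GeoPackages
    into a single output, mirroring the append workflow described above.

    The first call creates the output; later calls add -update -append,
    the documented GDAL flags for appending to an existing datasource.
    """
    cmds = []
    for i, tile in enumerate(tiles):
        cmd = ["ogr2ogr", "-f", "GPKG"]
        if i > 0:
            cmd += ["-update", "-append"]
        cmds.append(cmd + [out, str(tile)])
    return cmds

# Hypothetical tile names; the real dataset had 145 tiles.
cmds = merge_commands(["tile_001.gpkg", "tile_002.gpkg"])
for c in cmds:
    print(" ".join(c))
```

Running these via subprocess.run (or the equivalent bash loop, as the authors did) requires GDAL to be installed.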

    Energy labels

    • Energy labels were downloaded from the Energy label registry (EP-online) and stored in energy_labels_TheNetherlands.csv.

    UHI effect data

    • A bitmap with the UHI effect intensity in The Hague was retrieved from the Dutch Natural Capital Atlas (Atlas Natuurlijk Kapitaal) and stored in UHI_effect_TheHague.tiff.

    Output data

    • The residence-level data joined to the building layer is contained in the BAG_buildings_with_residence_data_full GeoPackage.
    • The results for each building, according to different scenarios, are compiled in the buildings_with_CDM_results_[scenario]_full GeoPackages. The scenarios are abbreviated as follows:
      • SQ: Status Quo, covering the 2018-2022 reference period.
      • 2030: An average scenario projected for the year 2030.
      • 2050_L: A low-impact, best-case scenario for 2050.
      • 2050_M: A medium-impact, moderate scenario for 2050.
      • 2050_H: A high-impact, worst-case scenario for 2050.

  8. Lot Boundaries and Labels

    • data.nsw.gov.au
    arcgis rest service
    Updated Feb 24, 2025
    Cite
    Spatial Services (DCS) (2025). Lot Boundaries and Labels [Dataset]. https://data.nsw.gov.au/data/dataset/1-6c9f3b05fbd74a27a36a008b3a7faaa8
    Explore at:
    arcgis rest service (available download formats)
    Dataset updated
    Feb 24, 2025
    Dataset provided by
    Spatial Services (DCS)
    Description


    Content Title: NSW Land Parcel Property Theme MultiCRS - Lot
    Content Type: Hosted Feature Layer
    Description: NSW Land Parcel and Property Theme MultiCRS - Lot is a polygon feature that defines a parcel of land created on a survey plan. Parcel polygons are defined by a series of boundary lines that store recorded dimensions as attributes in the lines table. It visualises the boundaries of land parcels (often with buildings on the land), the parcel identifier, and basic topographic features.

    NSW Land Parcel and Property Theme provides the foundation fabric of land ownership. It consists of the digital cadastral database and associated parcel and property information.

    NSW Land Parcel and Property Theme Lot is made up of the following features within the NSW Land Parcel and Property Theme.

    Cadastral Fabric – Lot

    Lot - Depicts a parcel of land created on a survey plan. Each lot may be represented by standard lots, standard part lots, strata or stratum. Each lot has a lot number, section number, plan lot area, plan number, plan label, Integrated Titling System (ITS) title status, and stratum label.

    Land and property data underpins the economic, social and environmental fabric of NSW and is used, amongst other things, to:
    • Secure tenure for access to capital
    • Define allowable use of land
    • Manage native title, nature conservation, heritage protection, defence, and disaster management
    • Improve infrastructure and property development planning
    • Inform water and carbon accounting programs
    Spatial Services continuously updates the Land Parcel and Property theme with information sourced from relevant stakeholders and custodians. The majority of updates to the datasets in this theme originate from subdivision, registration and gazettal activity.

    The data is up to date to within 10 working days from when a plan is lodged at NSW Land Registry Services. Data is also sourced from Crown Lands, the Office of Environment and Heritage, the Aboriginal Land Council, Local Land Services, the Electoral Commission and NSW Trade and Investment.

    The Cadastral upgrade program commenced in 2007 and is ongoing, improving the spatial accuracy of different feature classes. Upgrades are carried out in consultation with the relevant Local Government Authority and are further facilitated through the incorporation of data provided by external agencies. Upgrade positional accuracy varies across the state and generally ranges from less than 5m from true position in rural areas to less than 0.2m from true position in urban areas, dependent on the survey control available.

    Data quality for both Cadastral Maintenance and Cadastral Upgrade activities is assured through specification compliance and data topology rules. The client delivery database is automatically updated each evening with the changes that occurred that day in the maintenance environment.
    Initial Publication Date: 05/02/2020
    Data Currency: 01/01/3000
    Data Update Frequency: Daily
    Content Source: Data provider files
    File Type: ESRI File Geodatabase (*.gdb)
    Attribution: © State of New South Wales (Spatial Services, a business unit of the Department of Customer Service NSW). For current information go to spatial.nsw.gov.au
    Data Theme, Classification or Relationship to other Datasets: NSW Land Parcel Property Theme of the Foundation Spatial Data Framework (FSDF)
    Accuracy: The dataset maintains a positional relationship to, and alignment with, the Lot and Property digital datasets. This dataset was captured by digitising the best available cadastral mapping at a variety of scales and accuracies, ranging from 1:500 to 1:250 000, according to the National Mapping Council of Australia, Standards of Map Accuracy (1975). Therefore, the position of the feature instance will be within 0.5mm at map scale for 90% of the well-defined points. That is, 1:500 = 0.25m, 1:2000 = 1m, 1:4000 = 2m, 1:25000 = 12.5m, 1:50000 = 25m and 1:100000 = 50m. A program of positional upgrade (accuracy improvement) is ongoing.
    Spatial Reference System (dataset): GDA94
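The "0.5mm at map scale" rule quoted in the Accuracy note converts to ground distance by simple arithmetic. A minimal sketch (the function name is illustrative, not part of any Spatial Services tooling):

```python
# Ground accuracy implied by the "0.5mm at map scale" rule for 90% of
# well-defined points: 0.5 mm on the map times the scale denominator.
def accuracy_m(scale_denominator: int) -> float:
    """Positional accuracy in metres for a map at 1:scale_denominator."""
    return 0.5e-3 * scale_denominator

# Reproduces the figures listed in the Accuracy note:
for denom in (500, 2000, 4000, 25000, 50000, 100000):
    print(f"1:{denom} -> {accuracy_m(denom)} m")
```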
  9.

    Travel Call Center Speech Data: Czech (Czech Republic)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    FutureBee AI (2022). Travel Call Center Speech Data: Czech (Czech Republic) [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/travel-call-center-conversation-czech-czech-republic
    Explore at:
    wavAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/data-license-agreement

    Area covered
    Czechia
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Czech Call Center Speech Dataset for the Travel domain, designed to enhance the development of call center speech recognition models specifically for the Travel industry. This dataset is meticulously curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.

    Speech Data:

    This training dataset comprises 30 Hours of call center audio recordings covering various topics and scenarios related to the Travel domain, designed to build robust and accurate customer service speech technology.

    Participant Diversity:
    Speakers: 60 expert native Czech speakers from the FutureBeeAI Community.
    Regions: Different states/provinces of the Czech Republic, ensuring a balanced representation of Czech accents, dialects, and demographics.
    Participant Profile: Participants range from 18 to 70 years old, with a 60:40 male-to-female ratio.
    Recording Details:
    Conversation Nature: Unscripted and spontaneous conversations between call center agents and customers.
    Call Duration: Average duration of 5 to 15 minutes per call.
    Formats: WAV format with stereo channels, 16-bit depth, and sample rates of 8 kHz and 16 kHz.
    Environment: Recorded without background noise or echo.
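As a sketch of how a consumer of the corpus might verify a delivered file against the stated specs (stereo WAV, 16-bit, 8 or 16 kHz) using Python's standard `wave` module; this is an assumption about the workflow, not FutureBeeAI tooling:

```python
import wave

# Check a delivered recording against the stated specs: stereo WAV,
# 16-bit samples, 8 kHz or 16 kHz sample rate.
def matches_specs(path: str) -> bool:
    with wave.open(path, "rb") as w:
        return (w.getnchannels() == 2            # stereo
                and w.getsampwidth() == 2        # 16 bits = 2 bytes
                and w.getframerate() in (8000, 16000))
```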

    Topic Diversity

    This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.

    Inbound Calls:
    Booking inquiries and assistance
    Destination information and recommendations
    Assistance with flight delays or cancellations
    Special assistance for passengers with disabilities
    Travel-related health and safety inquiries
    Assistance with lost or delayed baggage, and many more
    Outbound Calls:
    Promotional offers and package deals
    Customer satisfaction surveys
    Booking confirmations and updates
    Flight schedule changes and notifications
    Customer feedback collection
    Reminders for passport or visa expiration date, and many more

    This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.

    Transcription

    To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:

    Speaker-wise Segmentation: Time-coded segments for both agents and customers.
    Non-Speech Labels: Tags and labels for non-speech elements.
    Word Error Rate: The word error rate is below 5%, thanks to a dual layer of QA.

    These ready-to-use transcriptions accelerate the development of the Travel domain call center conversational AI and ASR models for the Czech language.
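The quoted word error rate bound can be made concrete. Below is a minimal sketch of the standard WER computation (word-level Levenshtein edit distance over reference and hypothesis transcripts); it illustrates the metric, not the vendor's actual QA tooling:

```python
# WER = (substitutions + deletions + insertions) / reference word count,
# computed via word-level edit distance with dynamic programming.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all remaining ref words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all remaining hyp words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[-1][-1] / len(ref)
```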

    Metadata

    The dataset provides comprehensive metadata for each conversation and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent and dialect.
    Conversation Metadata: Domain, topic, call type, outcome/sentiment, bit depth, and sample rate.

  10. CNPS-GAP-PADUS Calif/West Webmap (CnpsGap CAL12)

    • hub.arcgis.com
    Updated Jan 13, 2015
    + more versions
    Esri Conservation Program (2015). CNPS-GAP-PADUS Calif/West Webmap (CnpsGap CAL12) [Dataset]. https://hub.arcgis.com/maps/930d3709072b4179923067cc5f8b75ba
    Explore at:
    Dataset updated
    Jan 13, 2015
    Dataset provided by
    Esri (http://esri.com/)
    Authors
    Esri Conservation Program
    Area covered
    Description

    This webmap is a collaboration between the California Native Plant Society (CNPS) and the California Dept of Fish and Game (CDFG). The CNPS Vegetation Program has worked for over 15 years to provide standards and tools for identifying and representing vegetation, as an important feature of California's natural heritage and biodiversity. Many knowledgeable ecologists and botanists support the program as volunteers and paid staff. Through grants, contracts, and grass-roots efforts, CNPS collects field data and compiles information into reports, manuals, and maps on California's vegetation, ecology and rare plants in order to better protect and manage them. We provide these services to governmental, non-governmental and other organizations, and we collaborate on vegetation resource assessment projects around the state. CNPS is also the publisher of the authoritative Manual of California Vegetation; you can purchase a copy HERE. To support the work of the CNPS, please JOIN NOW and become a member!

    The CDFG Vegetation Classification and Mapping Program develops and maintains California's expression of the National Vegetation Classification System. We implement its use through assessment and mapping projects in high-priority conservation and management areas, through training programs, and through working continuously on best management practices for field assessment, classification of vegetation data, and fine-scale vegetation mapping.

    HOW THE OVERLAY LAYERS WERE CREATED:

    Nserve and GapLC Sources: Early shortcomings in the NVC standard led to Natureserve's development of a mid-scale mapping-friendly "Ecological Systems" standard roughly corresponding to the "Group" level of the NVC, which facilitated NVC-based mapping of entire continents. Current scientific work is leading to the incorporation of Ecological Systems into the NVC as group and macrogroup concepts are revised.

    Natureserve and Gap Ecological Systems layers differ slightly even though both were created from 30m landsat data and both follow the NVC-related Ecological Systems Classification curated by Natureserve. In either case, the vector overlay was created by first enforcing a .3ha minimum mapping unit, which required deleting any classes consisting of fewer than 4 contiguous landsat cells, either side-by-side or cornerwise. This got around the statistical problem of numerous single-cell classes with types that seemed improbable given their matrix, and that would have been inaccurate to use as an n=1 sample compared to the weak but useable n=4 sample. A primary goal in this elimination was to best preserve riparian and road features that might only be one pixel wide, hence the use of cornerwise contiguous groupings. Eliminated cell groups were absorbed into whichever neighboring class they shared the longest boundary with. The remaining raster groups were vectorized with light simplification to smooth out the stairstep patterns of raster data and, hopefully, improve the fidelity of the boundaries with the landscape. The resultant vectors show a range of fidelity with the landscape; where there is less apparent fidelity it must be remembered that ecosystems are normally classified with a mixture of visible and non-visible characteristics, including soil, elevation and slope. Boundaries can be assigned based on the difference between 10% shrub cover and 20% shrub cover. Often large landscape areas would create "godzilla" polygons of more than 50,000 vertices, which can affect performance. These were eliminated using SIMPLIFY POLYGONS to reduce vertex spacing from 30m down to 50-60m where possible. Where not possible, DICE was used, which bisects all large polygons with arbitrary internal divisions until no polygon has more than 50,000 vertices. To create midscale layers, ecological systems were dissolved into the macrogroups that they belonged to and resymbolized on macrogroup.
    This was another frequent source of godzillas as larger landscape units were delineated, so SIMPLIFY and DICE were then run again. Where the base ecological system tiles could only be served up by individual partition tile, macrogroups typically exhibited a 10-1 or 20-1 reduction in feature count, allowing them to be assembled into single integrated map services by region, i.e. NW, SW.

    CNPS / CDFW / National Park Service Sources: (see also base service definition page) Unlike the Landsat-based raster modelling of the Natureserve and Gap national ecological systems, the CNPS/CDFW/NPS data date back to the origin of the National Vegetation Classification effort to map the US national parks in the mid 1990's. These mapping efforts are a hybrid of photo-interpretation, satellite and corollary data used to create draft ecological land units, which are then sampled by field crews and traditional vegetation plot surveys to quantify and analyze vegetation composition and distribution into the final vector boundaries of the formal NVC classes identified and classified. As such these are much more accurate maps, but the tradeoff is that they are only done on one field project area at a time, so there is not yet a national or even statewide coverage of these detailed maps.
    However, with almost two-thirds of California already mapped, that time is approaching. The challenge in creating standard map layers for this wide diversity of projects over the two decades since NVC began is the extensive evolution in the NVC standard itself, as well as evolution in the field techniques and tools. To create a consistent set of map layers, a master crosswalk table was built using every classification known at the time each map was created, and then crosswalking each as best as could be done into a master list of the currently accepted classifications. This field is called the "NVC_NAME" in each of these layers, and it contains a mixture of scientific names and common names at many levels of the classification from association to division, whatever the ecologists were able to determine at the time. For further precision, this field is split out into scientific name equivalents and common name equivalents.

    MAP LAYER NAMING: The data sublayers in this webmap are all based on the US National Vegetation Classification, a partnership of the USGS GAP program, US Forest Service, Ecological Society of America and Natureserve, with adoption and support from many federal & state agencies and nonprofit conservation groups. The USNVC grew out of the US National Park Service Vegetation Mapping Program, a mid-1990's effort led by The Nature Conservancy, Esri and the University of California. The classification standard is now an international standard, with associated ecological mapping occurring around the world. NVC is a hierarchical taxonomy of 8 levels, from top down: Class, Subclass, Formation, Division, Macrogroup, Group, Alliance, Association.

    The layers in this webmap represent 4 distinct programs:
    1. The California Native Plant Society/Calif Dept of Fish & Wildlife Vegetation Classification and Mapping Program (full description of these layers is at the CNPS MS10 Service Registration Page and Cnps MS10B Service Registration Page).
    2. USGS Gap Protected Areas Database, full description at the PADUS registration page.
    3. USGS Gap Landcover, full description below.
    4. Natureserve Ecological Systems, full description below.

    LAYER NAMING: All layer names follow this pattern: Source - Program - Level - Scale - Region
    Source - Program = who created the data: Nserve = Natureserve, GapLC = USGS Gap Program Landcover Data, PADUS = USGS Gap Protected Areas of the USA program, Cnps/Cdfw = California Native Plant Society/Calif Dept of Fish & Wildlife, often followed by the project name, such as: SFhill = Sierra Foothills, Marin Open Space, MMWD = Marin Municipal Water District, etc. National Parks are included and may be named by their standard 4-letter code, i.e. YOSE = Yosemite, PORE = Point Reyes.
    Level = the level in the NVC hierarchy which this layer is based on: Base = Alliances and Associations, Mac = Macrogroups, Sub = Subclasses
    Scale = one of 3 basic scales at which this layer will appear: Base = base scale, approx 1:1k up to 1:36k; Mid = 72k to about 500k; Out = 1m to 10m
    Region = the region that this layer covers, i.e. USA = USA, WEST = western USA, Marin = Marin County. May not appear if redundant to the Source-Program text.

    LABEL & COLOR: These overlays utilize a separate labelling layer to make it easy to include or not include labels, as needed. These are named the same as the layer they label, with "LABEL" added, and often the color used for that label layer in order to help tell them apart on the map. Note there can be multiple different label layers for the same set of polygons, depending upon the attribute or naming style desired, i.e. scientific names or common names. Finally, the order of these services in the sublayers of a map service is normally designed so that ALL of the label services appear above ANY/ALL of the vector services they refer to, to prevent a vector service writing on top of a label and obscuring it.

    MAP LAYER CATALOG
    This map includes a test segment of Natureserve Ecological Systems in the US Southwest, with the following layers and sublayers:
    GapNsUSA BoundaryMasksALB2: A grid showing the boundaries that define each partition tile of the national vegetation map services, with regional and state boundaries in the USGS Gap US Albers projection
    Padus Gap13 WM Base Scale plus Label: (Full PADUS FGDC Metadata here) Overlay vectors at 1k to 288k scale with separate 1k-288k labelling services for one of 3 different attributes:
    --Landowner Name: Land owner and primary entity responsible for managing parcel when ‘Manager Name’ is not attributed (e.g. USFS, State Fish and
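The minimum-mapping-unit rule described above (delete any class region of fewer than 4 landsat cells contiguous side-by-side or cornerwise) can be sketched with a plain 8-connectivity flood fill. This is a simplified illustration on a toy grid, not the actual ArcGIS raster workflow:

```python
# Find raster cells belonging to class regions smaller than min_cells,
# using 8-connectivity (side-by-side or cornerwise), per the MMU rule.
def small_regions(grid, min_cells=4):
    """Return the set of (row, col) cells in regions smaller than min_cells."""
    rows, cols = len(grid), len(grid[0])
    seen, small = set(), set()
    for r0 in range(rows):
        for c0 in range(cols):
            if (r0, c0) in seen:
                continue
            # flood fill over same-class neighbours, corners included
            stack, region, cls = [(r0, c0)], set(), grid[r0][c0]
            while stack:
                r, c = stack.pop()
                if (r, c) in region:
                    continue
                region.add((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and (nr, nc) not in region
                                and grid[nr][nc] == cls):
                            stack.append((nr, nc))
            seen |= region
            if len(region) < min_cells:
                small |= region
    return small
```

In the full workflow these flagged cells would then be absorbed into the neighbouring class sharing the longest boundary before vectorization.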

  11.

    Nutritional Labelling Market by Label Type, Technology, End User & Region |...

    • futuremarketinsights.com
    pdf
    Updated Oct 14, 2022
    Nutritional Labelling Market by Label Type, Technology, End User & Region | Forecast 2022 to 2032 [Dataset]. https://www.futuremarketinsights.com/reports/nutritional-labeling-market
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Oct 14, 2022
    Dataset authored and provided by
    Future Market Insights
    License

    https://www.futuremarketinsights.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Worldwide
    Description

    The Nutritional Labelling market is valued at USD 13.0 billion in 2022 and is projected to grow at a CAGR of 3.4% during the forecast period, reaching USD 18.1 billion by 2032. Newly released data from Future Market Insights' market analysis shows that global Nutritional Labelling demand is projected to register year-on-year (Y-o-Y) growth of 4.1% in 2022.

    Attributes / Details
    Market Size Value in 2022: USD 13.0 Billion
    Market Forecast Value in 2032: USD 18.1 Billion
    Global Growth Rate (2022 to 2032): 3.4% CAGR
    Forecast Period: 2022 to 2032
    Collective Value Share: Top 3 Countries (2021A): 42.4%
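As a quick sanity check, the USD 13.0 billion 2022 value and USD 18.1 billion 2032 forecast are consistent with the stated CAGR; this is simple compound-growth arithmetic, not the publisher's forecasting model:

```python
# Compound annual growth rate implied by start/end values:
# end = start * (1 + r) ** years  =>  r = (end / start) ** (1 / years) - 1
def cagr(start: float, end: float, years: int) -> float:
    return (end / start) ** (1 / years) - 1

# USD 13.0B (2022) -> USD 18.1B (2032) implies roughly 3.4% CAGR
print(f"{cagr(13.0, 18.1, 10):.1%}")
```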

    Scope of Report

    Attribute / Details
    Market Size Value in 2022: USD 11.9 Billion
    Market Forecast Value in 2032: USD 20.5 Billion
    Global Growth Rate: 5.6% CAGR
    Forecast Period: 2022 to 2032
    Historical Data Available for: 2017 to 2021
    Market Analysis: MT for Volume and USD Billion for Value
    Key Regions Covered: North America, Latin America, Europe, East Asia, South Asia, Oceania, and MEA
    Key Countries Covered: United States, Canada, Brazil, Mexico, Argentina, Chile, Peru, Germany, France, Italy, Spain, UK, Netherlands, Belgium, Nordic, Russia, China, Japan, South Korea, India, Thailand, Malaysia, Indonesia, Singapore, Australia, New Zealand, GCC Countries, South Africa, Central Africa, North Africa, and others
    Key Market Segments Covered: Label Type, Technology, End User
    Key Companies Profiled:
    • Checkpoint Systems
    • Avery Dennison Corporation
    • Smartrac NV
    • Label Insight
    • Thin Film Electronics
    • Qliktag
    • Smart Label
    • Sato Holdings Corporation
    • Graphic Label
    • Others
    Pricing: Available upon Request
  12. 2017 Census of Agriculture - Census Data Query Tool (CDQT)

    • agdatacommons.nal.usda.gov
    bin
    Updated Feb 13, 2024
    USDA National Agricultural Statistics Service (2024). 2017 Census of Agriculture - Census Data Query Tool (CDQT) [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/2017_Census_of_Agriculture_-_Census_Data_Query_Tool_CDQT_/24663345
    Explore at:
    binAvailable download formats
    Dataset updated
    Feb 13, 2024
    Dataset provided by
    National Agricultural Statistics Service (http://www.nass.usda.gov/)
    United States Department of Agriculture (http://usda.gov/)
    Authors
    USDA National Agricultural Statistics Service
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The Census of Agriculture is a complete count of U.S. farms and ranches and the people who operate them. Even small plots of land - whether rural or urban - growing fruit, vegetables or some food animals count if $1,000 or more of such products were raised and sold, or normally would have been sold, during the Census year. The Census of Agriculture, taken only once every five years, looks at land use and ownership, operator characteristics, production practices, income and expenditures. For America's farmers and ranchers, the Census of Agriculture is their voice, their future, and their opportunity. The Census Data Query Tool (CDQT) is a web-based tool that is available to access and download table-level data from the Census of Agriculture Volume 1 publication. The data found via the CDQT may also be accessed in the NASS Quick Stats database. The CDQT is unique in that it automatically displays data from the past five Census of Agriculture publications. The CDQT is presented as a "2017 centric" view of the Census of Agriculture data. All data series that are present in the 2017 dataset are available within the CDQT, and any matching data series from prior Census years will also display (back to 1997). If a data series is not included in the 2017 dataset, then data cells will remain blank in the tool. For example, one of the data series had a label change from "Operator" to "Producer." This means that data from prior Census years labelled "Operator" will not show up where the label has changed to "Producer" for 2017. The new Census Data Query Tool application can be used to query Census data from 1997 through 2017. Data are searchable by Census table and are downloadable as CSV or PDF files. 2017 Census Ag Atlas Maps are also available for download.

    Resources in this dataset:
    Resource Title: 2017 Census of Agriculture - Census Data Query Tool (CDQT)
    File Name: Web Page, url: https://www.nass.usda.gov/Quick_Stats/CDQT/chapter/1/table/1

    Using CDQT:

    Upon entering the CDQT, a data table is present. Changing the parameters at the top of the data table will retrieve different combinations of Census Chapter, Table, State, or County (when selecting Chapter 2). For the U.S., Volume 1, US/State Chapter 1 will include only U.S. data; Chapter 2 will include U.S. and State level data. For a State, Volume 1 US/State Level Data Chapter 1 will include only the State level data; Chapter 2 will include the State and county level data. Once a selection is made, press the “Update Grid” button to retrieve the new data table. Comma-separated values (CSV) download, compatible with most spreadsheet and database applications: to download a CSV file of the data as it is currently presented in the data grid, press the "CSV" button in the "Export Data" section of the toolbar. When CSV is chosen, data will be downloaded as numeric. To view the source PDF file for the data table, press the "View PDF" button in the toolbar.
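Since the CSV export is delivered with numeric data cells, a downstream workflow might load it as follows. A minimal sketch with Python's standard `csv` module; the filename and column names are hypothetical, not part of the CDQT:

```python
import csv

# Load a CSV exported from the CDQT "Export Data" toolbar, converting
# value cells to numbers where possible (e.g. "12,345" -> 12345.0).
def load_cdqt_csv(path: str):
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for key, val in row.items():
                try:
                    row[key] = float(val.replace(",", ""))
                except (ValueError, AttributeError):
                    pass  # keep non-numeric cells as strings
            rows.append(row)
    return rows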

  13. r

    IMOS - AATAMS Facility - Satellite Relay Tagging Program - Delayed mode data...

    • researchdata.edu.au
    Updated 2015
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Integrated Marine Observing System (IMOS) (2015). IMOS - AATAMS Facility - Satellite Relay Tagging Program - Delayed mode data [Dataset]. https://researchdata.edu.au/imos-aatams-facility-delayed-mode/955033
    Explore at:
    Dataset updated
    2015
    Dataset provided by
    Integrated Marine Observing System
    Australian Ocean Data Network
    Authors
    Integrated Marine Observing System (IMOS)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Animal Tracking Facility (formerly known as the Australian Animal Tracking And Monitoring System (AATAMS)) is a coordinated marine animal tagging project. Satellite Relay Data Loggers (SRDL) (most with CTDs, and some also with fluorometers) are used to explore how marine mammal behaviour relates to their oceanic environment. Loggers developed at the University of St Andrews Sea Mammal Research Unit transmit data in near real time via the Argo satellite system. The Satellite Relay Data Loggers are deployed on marine mammals, including Elephant Seals, Weddell Seals, Australian Fur Seals, Australian Sea Lions, New Zealand Fur Seals. Data is being collected in the Southern Ocean, the Great Australian Bight, and off the South-East Coast of Australia. This metadata record, represents several different datasets listed hereafter, which can all be accessed through a multi-WFS service. The data represented by this record are presented in delayed mode. CTD - parameters measured by the instruments include time, conductivity (salinity), temperature, speed, fluorescence (available in the future) and depth. Diving - parameters measured by the instruments include start and end time and longitude/latitude of each individual dive, post-dive surface duration, dive duration, maximum dive depth, intermediate dive depths and times. Haulout - a haulout begins when the SRDL has been continuously dry for a specified length of time (usually 10 minutes). It ends when continuously wet for another interval (usually 40 seconds). Haulout data parameters measured by the instruments include haulout start and end dates and longitude/latitude, and haulout number. Argos locations - location data parameters measured by the instruments include time, longitude, latitude, location quality, along with other diagnostic information provided by Argos (http://www.argos-system.org/). 
Summary Statistics - as well as sending records of individual events such as dives and haulouts, the SRDL also calculates summary statistics of those events over a specified time period (usually 3, 4 or 6 hours). Summary statistics computed by the instruments include the proportion of time spent diving, at the surface and hauled-out, the number of dives, and the average, standard deviation and maximum dive duration and dive depth during each summary period. These statistics are based on all the data recorded by the SRDL and so are not prone to distortion by variations in the efficiency of transmission via Argos.

  14. h

    Bitext-retail-ecommerce-llm-chatbot-training-dataset

    • huggingface.co
    Updated Aug 6, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bitext (2024). Bitext-retail-ecommerce-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Retail (eCommerce) Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Retail (eCommerce)] sector can be easily achieved using our two-step approach to… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset.

  15. A Dataset of Polarities and Emotions from Brazilian Portuguese Play Store...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Aug 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Vitor Siqueira; Gabriel M. Lunardi; Williamson Silva; Ricardo Luiz Hentges Costa; Schifelbein Soares Tales; Vitor Siqueira; Gabriel M. Lunardi; Williamson Silva; Ricardo Luiz Hentges Costa; Schifelbein Soares Tales (2024). A Dataset of Polarities and Emotions from Brazilian Portuguese Play Store Reviews [Dataset]. http://doi.org/10.5281/zenodo.10823148
    Explore at:
    csvAvailable download formats
    Dataset updated
    Aug 28, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Vitor Siqueira; Gabriel M. Lunardi; Williamson Silva; Ricardo Luiz Hentges Costa; Schifelbein Soares Tales; Vitor Siqueira; Gabriel M. Lunardi; Williamson Silva; Ricardo Luiz Hentges Costa; Schifelbein Soares Tales
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    User reviews play a crucial role in shaping consumer perceptions and guiding decision-making processes in the digital marketplace. With the rise of mobile applications, platforms like the Google Play Store serve as hubs for users to express their opinions and experiences with various apps and services. Understanding the polarities and emotions conveyed in these reviews provides valuable insights for developers, marketers, and researchers alike.

    The dataset consists of user reviews collected from the "Trending" section of the Google Play Store in May 2023. A total of 300 reviews were gathered for each of the top 10 most downloaded applications during this period. Each review in the dataset has been meticulously labeled for polarity, categorizing sentiments as positive, negative, or neutral, and emotion, encompassing a range of emotional responses such as happiness, sadness, surprise, fear, disgust and anger.

    Additionally, it's worth noting that this dataset underwent a rigorous annotation process. Three annotators independently classified the reviews for polarity and emotion. Afterward, they reconciled any discrepancies through discussion and arrived at a consensus for the final annotations. This ensures a high level of accuracy and reliability in the labeling process, providing researchers and practitioners with trustworthy data for analysis and decision-making.

    It's important to highlight that all reviews in this dataset are in Brazilian Portuguese, reflecting the specific linguistic and cultural nuances of the Brazilian market. By leveraging this dataset, stakeholders gain access to a robust resource for exploring user sentiment and emotion within the context of popular mobile applications in Brazil.

  16. Label pts Select CFs 17Jan25

    • usfs.hub.arcgis.com
    Updated Jan 31, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Forest Service (2022). Label pts Select CFs 17Jan25 [Dataset]. https://usfs.hub.arcgis.com/content/320492de685b4da2b24167bb3ab7f4b3
    Explore at:
    Dataset updated
    Jan 31, 2022
    Dataset provided by
    U.S. Department of Agriculture Forest Servicehttp://fs.fed.us/
    Authors
    U.S. Forest Service
    Area covered
    Description

    The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme ( https://communities.geoplatform.gov/ngda-cadastre/ ). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all open space public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, permanent and long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g. 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of U.S. public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. The PAD-US geodatabase maps and describes areas using thirty-six attributes and five separate feature classes representing the U.S. protected areas network: Fee (ownership parcels), Designation, Easement, Marine, Proclamation and Other Planning Boundaries. An additional Combined feature class includes the full PAD-US inventory to support data management, queries, web mapping services, and analyses. The Feature Class (FeatClass) field in the Combined layer allows users to extract data types as needed. 
A Federal Data Reference file geodatabase lookup table (PADUS3_0Combined_Federal_Data_References) facilitates the extraction of authoritative federal data provided or recommended by managing agencies from the Combined PAD-US inventory. This PAD-US Version 3.0 dataset includes a variety of updates from the previous Version 2.1 dataset (USGS, 2020, https://doi.org/10.5066/P92QM3NT), achieving goals to: 1) annually update and improve spatial data representing the federal estate for PAD-US applications; 2) update state and local lands data as state data-steward and PAD-US Team resources allow; and 3) automate data translation efforts to increase PAD-US update efficiency. The following list summarizes the integration of "best available" spatial data to ensure public lands and other protected areas from all jurisdictions are represented in the PAD-US (other data were transferred from PAD-US 2.1).
Federal updates - The USGS remains committed to updating federal fee owned lands data and major designation changes in annual PAD-US updates, where authoritative data provided directly by managing agencies are available or alternative data sources are recommended. The following is a list of updates or revisions associated with the federal estate:
1) Major update of the federal estate (fee ownership parcels, easement interest, and management designations where available), including authoritative data from 8 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census Bureau), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), U.S. Forest Service (USFS), and National Oceanic and Atmospheric Administration (NOAA). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/).
2) Improved the representation (boundaries and attributes) of National Park Service, U.S. Forest Service, Bureau of Land Management, and U.S. Fish and Wildlife Service lands, in collaboration with agency data-stewards, in response to feedback from the PAD-US Team and stakeholders.
3) Added a Federal Data Reference file geodatabase lookup table (PADUS3_0Combined_Federal_Data_References) to the PAD-US 3.0 geodatabase to facilitate the extraction (by Data Provider, Dataset Name, and/or Aggregator Source) of authoritative data provided directly (or recommended) by federal managing agencies from the full PAD-US inventory. A summary of the number of records (Frequency) and calculated GIS Acres (vs. Documented Acres) associated with features provided by each Aggregator Source is included; however, the number of records may vary from source data as the "State Name" standard is applied to national files. The Feature Class (FeatClass) field in the table and geodatabase describes the data type, to highlight overlapping features in the full inventory (e.g. Designation features often overlap Fee features) and to assist users in building queries for applications as needed.
4) Scripted the translation of the Department of Defense, Census Bureau, and Natural Resources Conservation Service source data into the PAD-US format to increase update efficiency.
5) Revised conservation measures (GAP Status Code, IUCN Category) to more accurately represent protected and conserved areas. For example, Fish and Wildlife Service (FWS) Waterfowl Production Area Wetland Easements changed from GAP Status Code 2 to 4, as the spatial data currently represent the complete parcel (about 10.54 million acres, primarily in North Dakota and South Dakota), while only aliquot parts of these parcels are documented under wetland easement (1.64 million acres). These acreages are provided by the U.S. Fish and Wildlife Service and are referenced in the PAD-US geodatabase Easement feature class 'Comments' field.
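A sketch of the lookup-table workflow described above: joining the Combined inventory to the Federal Data Reference table on an aggregator-source key keeps only records traceable to an authoritative federal dataset. The column names and rows here are illustrative assumptions, not the exact geodatabase schema.

```python
import pandas as pd

# Hypothetical slice of the Combined inventory (column names assumed).
combined = pd.DataFrame({
    "Agg_Src": ["BLM_Fee_2021", "NPS_Des_2021", "CPAD_2020"],
    "Unit_Nm": ["BLM Parcel", "National Monument", "City Park"],
})

# Hypothetical rows from the PADUS3_0Combined_Federal_Data_References lookup.
federal_refs = pd.DataFrame({
    "Agg_Src":       ["BLM_Fee_2021", "NPS_Des_2021"],
    "Data_Provider": ["Bureau of Land Management", "National Park Service"],
})

# Inner join drops records whose source is not an authoritative federal dataset.
federal_only = combined.merge(federal_refs, on="Agg_Src", how="inner")
print(list(federal_only["Unit_Nm"]))  # → ['BLM Parcel', 'National Monument']
```

Filtering on Data Provider or Dataset Name instead of the source key follows the same join-then-mask pattern.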
State updates - The USGS is committed to building capacity in the state data-steward network and the PAD-US Team to increase the frequency of state land updates, as resources allow. The USGS supported efforts to significantly increase state inventory completeness with the integration of local parks data in PAD-US 2.1, and developed a state-to-PAD-US data translation script during PAD-US 3.0 development to pilot in future updates. Additional efforts are in progress to support the technical and organizational strategies needed to increase the frequency of state updates. PAD-US 3.0 included major updates to the following three states:
1) California - added or updated state, regional, local, and nonprofit lands data from the California Protected Areas Database (CPAD), managed by GreenInfo Network, and integrated conservation and recreation measure changes following review coordinated by the data-steward with state managing agencies. Developed a data translation Python script (see Process Step 2 Source Data Documentation) in collaboration with the data-steward to increase the accuracy and efficiency of future PAD-US updates from CPAD.
2) Virginia - added or updated state, local, and nonprofit protected areas data (and removed legacy data) from the Virginia Conservation Lands Database, provided by the Virginia Department of Conservation and Recreation's Natural Heritage Program, and integrated conservation and recreation measure changes following review by the data-steward.
3) West Virginia - added or updated state, local, and nonprofit protected areas data provided by the West Virginia University GIS Technical Center.
For more information regarding the PAD-US dataset, please visit https://www.usgs.gov/gapanalysis/PAD-US/. For more information about data aggregation, please review the PAD-US Data Manual available at https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-manual.
A version history of PAD-US updates is summarized below (see https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-history for more information):
1) First posted - April 2009 (Version 1.0 - available from the PAD-US Team: pad-us@usgs.gov)
2) Revised - May 2010 (Version 1.1 - available from the PAD-US Team: pad-us@usgs.gov)
3) Revised - April 2011 (Version 1.2 - available from the PAD-US Team: pad-us@usgs.gov)
4) Revised - November 2012 (Version 1.3) https://doi.org/10.5066/F79Z92XD
5) Revised - May 2016 (Version 1.4) https://doi.org/10.5066/F7G73BSZ
6) Revised - September 2018 (Version 2.0) https://doi.org/10.5066/P955KPLE
7) Revised - September 2020 (Version 2.1) https://doi.org/10.5066/P92QM3NT
8) Revised - January 2022 (Version 3.0) https://doi.org/10.5066/P9Q9LQ4B
Comparing protected area trends between PAD-US versions is not recommended without consultation with the USGS, as many changes reflect improvements to agency and organization GIS systems, or to conservation and recreation measure classification, rather than actual changes in protected area acquisition on the ground.

  17. Offshore Rocks and Wrecks Labels

    • amsis-geoscience-au.hub.arcgis.com
    Updated Oct 25, 2021
    Cite
    Geoscience Australia (2021). Offshore Rocks and Wrecks Labels [Dataset]. https://amsis-geoscience-au.hub.arcgis.com/items/4d5a2116a7ee4929931c2430255723a3
    Explore at:
    Dataset updated
    Oct 25, 2021
    Dataset authored and provided by
    Geoscience Australiahttp://ga.gov.au/
    Area covered
    Description

    Abstract: These data are best suited to graphical applications. These data may vary greatly in quality depending on the method of capture and digitising specifications in place at the time of capture. All features have been included from the 250K data capture. This layer is only for labelling.
    250K Specification Description - Offshore Rock: a rock located offshore that represents a hazard to shipping. Wreck: a disabled vessel, either submerged or visible, which is attached to, or foul of, the bottom or cast up on the shore. (Source - https://www.ga.gov.au/mapspecs/topographic/v6/appendixA_files/Marine.html)
    This service has been created specifically for display in the National Map, and the symbology displayed may not suit other mapping applications. Information included within the service includes the point locations for surface hydrology, including natural and man-made features such as water courses (including directional flow paths), lakes, dams and other water bodies, and marine themes. The data is sourced from Geoscience Australia 250K Topographic data and Surface Hydrology data. The service contains layer scale dependencies.
    © Commonwealth of Australia (Geoscience Australia) 2017.
    Downloads and Links:
    Web Services: Offshore Rocks and Wrecks Labels MapServer
    Downloads available from the expanded catalogue link, below
    Metadata URL: https://pid.geoscience.gov.au/service/ga/100106

  18. 2010 Irrigated Lands for the Mountain Home Plateau: Machine Learning Generated

    • data-idwr.hub.arcgis.com
    • gis-idaho.hub.arcgis.com
    Updated May 15, 2024
    + more versions
    Cite
    Idaho Department of Water Resources (2024). 2010 Irrigated Lands for the Mountain Home Plateau: Machine Learning Generated [Dataset]. https://data-idwr.hub.arcgis.com/documents/a73726f584c544269310d73ec7cf65a2
    Explore at:
    Dataset updated
    May 15, 2024
    Dataset authored and provided by
    Idaho Department of Water Resources
    Description

    This raster file represents land within the Mountain Home Study Area classified as either “irrigated” (cell value 1) or “non-irrigated” (cell value 0) at a 30-meter spatial resolution. These classifications were determined at the pixel level by a Random Forest supervised machine learning methodology. Random Forest models are often used to classify large datasets accurately and efficiently by assigning each pixel to one of a pre-determined set of labels or groups. The model works by using decision trees that split the data based on characteristics that make the resulting groups as different from each other as possible. The model “learns” the characteristics that correlate to each label from manually classified data points, also known as training data.
A variety of data can be supplied as input to the Random Forest model for it to use in making its classification determinations. Irrigation produces distinct signals in observational data that can be identified by machine learning algorithms. Additionally, datasets that provide the model with information on landscape characteristics that often influence whether irrigation is present are also useful. This dataset was classified by the Random Forest model using Collection 1 Tier 1 top-of-atmosphere reflectance data from Landsat 5 and Landsat 7, United States Geological Survey National Elevation Dataset (USGS NED) data, and Height Above Nearest Drainage (HAND) data. Landsat 5, Landsat 7, and HAND data are at a 30-meter spatial resolution; the USGS NED data are at a 10-meter spatial resolution. 
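The pixel-level Random Forest classification described above can be sketched with scikit-learn on synthetic data. The feature values, the toy labeling rule, and the variable names below are invented stand-ins for the Landsat, NED, and HAND inputs; this is not the IDWR model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n_pixels = 500

# Synthetic per-pixel features standing in for NDVI, elevation (m), and HAND (m).
ndvi = rng.uniform(0.0, 1.0, n_pixels)
elev = rng.uniform(800.0, 1500.0, n_pixels)
hand = rng.uniform(0.0, 50.0, n_pixels)
X = np.column_stack([ndvi, elev, hand])

# Toy ground truth: irrigated pixels tend to be green and near drainage.
y = ((ndvi > 0.5) & (hand < 20.0)).astype(int)  # 1 = irrigated, 0 = non-irrigated

# Fit on the "manually classified" pixels, then label every pixel.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
predicted = clf.predict(X)
print(f"training accuracy: {(predicted == y).mean():.2f}")
```

In the real workflow the predicted 0/1 labels would be written back to a 30-meter raster grid rather than kept as a flat array.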
The Cropland Data Layer (CDL) from the United States Department of Agriculture National Agricultural Statistics Service (USDA NASS), Active Water Rights Place of Use (POU) data from IDWR, and National Agriculture Imagery Program (NAIP) data from the USDA Farm Service Agency (FSA) were also used in determining irrigation status for the manually classified training data points, but were not used for the machine learning model predictions. The final model results were manually reviewed prior to release; however, no extensive ground-truthing process was implemented. A wetlands mask was applied using the Fish and Wildlife Service's National Wetlands Inventory (FWS NWI) data for areas without overlapping irrigation place of use areas or locations manually determined to have potential irrigation. “Speckling”, or small areas of incorrectly classified pixels, was reduced by a majority filter smoothing technique using a kernel of 8 nearest neighbors. A limited number of manual corrections were made to correct for missing data due to Landsat 7 ETM+ Scan Line Corrector gaps (https://www.usgs.gov/faqs/what-landsat-7-etm-slc-data). These data have also been snapped to the same grid used with IDWR's Mapping EvapoTranspiration using high Resolution and Internalized Calibration (METRIC) evapotranspiration data.
Information regarding Landsat imagery: Landsat 5 and Landsat 7 Collection 1 Tier 1 top-of-atmosphere reflectance images that overlapped the area of interest were used in this analysis. Images were filtered to exclude those that were more than 70% cloud covered, resulting in 35 Landsat 5 and 35 Landsat 7 images for the analysis period of 2010-03-01 to 2010-10-27. Normalized Difference Vegetation Index (NDVI), Band 1 (Blue), and Band 7 (SWIR2) values were interpolated for the following dates: 2010-04-15, 2010-05-15, 2010-06-14, 2010-07-14, 2010-08-13, and 2010-09-12, using image values from up to 45 days before and after each interpolation date.
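The ±45-day interpolation step can be sketched for a single pixel with NumPy. The observation dates and NDVI values are synthetic, and restricting np.interp to a dated window is one plausible reading of the method described, not IDWR's actual code.

```python
import numpy as np

# Synthetic NDVI observations for one pixel: day-of-year and value.
obs_doy  = np.array([95, 120, 160, 190, 230, 260])
obs_ndvi = np.array([0.20, 0.35, 0.60, 0.70, 0.55, 0.30])

def interp_within_window(target_doy, doy, values, window=45):
    """Linearly interpolate at target_doy using only observations
    within +/- window days; return NaN if fewer than two qualify."""
    mask = np.abs(doy - target_doy) <= window
    if mask.sum() < 2:
        return float("nan")
    # np.interp clamps outside the masked range; acceptable for a sketch.
    return float(np.interp(target_doy, doy[mask], values[mask]))

# Approximate day-of-year values for the six interpolation dates listed above.
targets = [105, 135, 165, 195, 225, 255]
series = [interp_within_window(t, obs_doy, obs_ndvi) for t in targets]
print([round(v, 3) for v in series])
```

The full workflow would repeat this per band (NDVI, Blue, SWIR2) and per pixel across the raster stack.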

  19. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Market Research Forecast (2024). Data Annotation Tool Market Report [Dataset]. https://www.marketresearchforecast.com/reports/data-annotation-tool-market-10075

Data Annotation Tool Market Report

Explore at:
doc, ppt, pdfAvailable download formats
Dataset updated
Dec 9, 2024
Dataset authored and provided by
Market Research Forecast
License

https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy

Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description

The size of the Data Annotation Tool Market market was valued at USD 3.9 USD billion in 2023 and is projected to reach USD 6.64 USD billion by 2032, with an expected CAGR of 7.9% during the forecast period. A Data Annotation Tool is defined as the software that can be employed to make annotations to data hence helping a learning computer model learn patterns. These tools provide a way of segregating the data types to include images, texts, and audio, as well as videos. Some of the subcategories of annotation include images such as bounding boxes, segmentation, text such as entity recognition, sentiment analysis, audio such as transcription, sound labeling, and video such as object tracking. Other common features depend on the case but they commonly consist of interfaces, cooperation with others, suggestion of labels, and quality assurance. It can be used in the automotive industry (object detection for self-driving cars), text processing (classification of text), healthcare (medical imaging), and retail (recommendation). These tools get applied in training good quality, accurately labeled data sets for the engineering of efficient AI systems. Key drivers for this market are: Increasing Adoption of Cloud-based Managed Services to Drive Market Growth. Potential restraints include: Adverse Health Effect May Hamper Market Growth. Notable trends are: Growing Implementation of Touch-based and Voice-based Infotainment Systems to Increase Adoption of Intelligent Cars.

Search
Clear search
Close search
Google apps
Main menu