Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was compiled with the ultimate goal of developing non-invasive computer vision algorithms for assessing shrimp biometrics and biomass estimation. The main folder, labeled "DATASET," contains five sub-folders—DB1, DB2, DB3, DB4, and DB5—each filled with images of shrimps. Additionally, each sub-folder is accompanied by an Excel file that includes manually measured data for the shrimps pictured. The files are named respectively: DB1_INDUSTRIAL_FARM_1, DB2_INDUSTRIAL_FARM_2_C1, DB3_INDUSTRIAL_FARM_2_C2, DB4_ACADEMIC_POND_S1, and DB5_ACADEMIC_POND_S2.
Here’s a detailed description of the contents of each sub-folder and its corresponding Excel file:
1) DB1 includes 490 PNG images of 22 shrimps taken from one pond at an industrial farm. The associated Excel file, DB1_INDUSTRIAL_FARM_1, contains columns for: SAMPLE: Reflecting the number of individual shrimps (22 entries or rows). LENGTH (cm): Measuring from the rostrum (near the eyes) to the start of the tail. WEIGHT (g): Recorded using a scale. COMPLETE SHRIMP IMAGES: Indicates if at least one full-body image is available (1) or not (0).
2) DB2 consists of 2002 PNG images of 58 shrimps. The Excel file, DB2_INDUSTRIAL_FARM_2_C1, includes: SAMPLE: Number of shrimps (58 entries or rows). CEPHALOTHORAX (cm): Total length of the cephalothorax. LENGTH (cm) and WEIGHT (g): Similar measurements as DB1. COMPLETE SHRIMP IMAGES: Presence (1) or absence (0) of full-body images.
3) DB3 contains 1719 PNG images of 50 shrimps, with its Excel file, DB3_INDUSTRIAL_FARM_2_C2, documenting: SAMPLE: Number of shrimps (50 entries or rows). Measurements and categories identical to DB2.
4) DB4 encompasses 635 PNG images of 20 shrimps, detailed in the Excel file DB4_ACADEMIC_POND_S1. This includes: SAMPLE: Number of shrimps (20 entries or rows). CEPHALOTHORAX (cm), LENGTH (cm), WEIGHT (g), and COMPLETE SHRIMP IMAGES: Documented as in other datasets.
5) DB5 includes 661 PNG images of 20 shrimps, with DB5_ACADEMIC_POND_S2 as the corresponding Excel file. The file mirrors the structure and measurements of DB4.
The images in each folder are named "sm_n", where m is the shrimp sample number and n is the picture number for that shrimp. This carefully structured dataset provides comprehensive biometric data on shrimps, facilitating the development of algorithms aimed at non-invasive measurement techniques. This will likely be pivotal in enhancing the precision of biomass estimation in aquaculture farming, utilizing advanced statistical morphology analysis and machine learning techniques.
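As a sketch of how this naming convention might be used in practice, the following Python snippet groups image files by shrimp sample so they can be joined with the SAMPLE column of the Excel files. The filenames here are hypothetical stand-ins, not actual files from the dataset:

```python
import re
from collections import defaultdict

# Hypothetical filenames following the "sm_n" convention described above,
# e.g. "s3_2.png" = 2nd picture of shrimp sample 3.
filenames = ["s1_1.png", "s1_2.png", "s3_1.png", "s3_2.png", "s3_3.png"]

pattern = re.compile(r"^s(\d+)_(\d+)\.png$")

images_per_sample = defaultdict(list)
for name in filenames:
    match = pattern.match(name)
    if match:
        sample, picture = int(match.group(1)), int(match.group(2))
        images_per_sample[sample].append(picture)

# images_per_sample now maps sample number -> list of picture numbers,
# which can be matched against the SAMPLE rows in the Excel files.
```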
CHANGES FROM VERSION 1:
The cephalothorax metric is the length rather than the width. That was an error in the first version. The name in the columns also had a typo, which has been corrected (from CEPHALOTORAX to CEPHALOTHORAX).
A preliminary analysis of the data already has been discussed in 10.2139/ssrn.3812299.
Raw data
Raw measurement data is in the Excel files `day*_raw.xlsx`.
Model
Covariate and label scaler objects are serialized in joblib format in the following files:
Checkpoints of the models are in the `*.pth.tar` files. An example for loading the models is:
from pyprocessta.model.tcn import TCNModelDropout

model_cov = TCNModelDropout(
    input_chunk_length=8,
    output_chunk_length=1,
    num_layers=5,
    num_filters=16,
    kernel_size=6,
    dropout=0.3,
    weight_norm=True,
    batch_size=32,
    n_epochs=100,
    log_tensorboard=True,
    optimizer_kwargs={"lr": 2e-4},
)
model_cov.load_from_checkpoint('20210814_2amp_pip_model_reduced_feature_set_darts')
which assumes that the checkpoints are placed as `model_best.pth.tar` in a folder called `20210812_2amp_pip_model_reduced_feature_set_darts`.
The dataset contains two sub-folders: 1. Load sharing, 2. Function approximation.
1. Load sharing ("data.zip"): Each folder corresponds to a subject. The monopolar, single-differential, and double-differential data are saved in the corresponding sub-folders 'mono', 'sd', and 'dd', respectively. In each sub-folder, the data are saved as '30.mat', '50.mat', or '70.mat', corresponding to 30%, 50%, or 70% MVC isometric flexion-extension. The recording protocol can be found in the Word file 'report.doc' in this folder, in the subsection "Experimental recording". The '.mat' files all have the same structure: Raw_Torque, the measured torque in ADC numbers; and the structure 'TAB_ARV', the EMG envelopes for 'BB', 'BR', 'TM', 'TL' (read the report for the methods and acronyms).
2. Function approximation ("fun_approx.zip"): Multiple benchmark examples, including a piecewise single-variable function, five nonlinear dynamic plants with various nonlinear structures, the chaotic Mackey-Glass time series (with different signal-to-noise ratios (SNR) and various degrees of chaos), and the real-world Box-Jenkins gas furnace system, are considered to verify the effectiveness of the proposed FJWNN model. The description ("info.pdf"), the entire simulated data, and the results of our method on the training and test sets (in Excel files) are provided.
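As an illustration of how the '.mat' files described above could be read in Python, here is a hedged sketch using scipy.io.loadmat. Since the real '30.mat' is not available in this snippet, a tiny stand-in file with the documented variable names ('Raw_Torque' and 'TAB_ARV') is fabricated first; the array shapes are made up for illustration:

```python
import numpy as np
from scipy.io import loadmat, savemat

# Fabricate a stand-in file with the variable names documented above;
# the real files live under e.g. <subject>/mono/30.mat.
savemat("demo_30.mat", {"Raw_Torque": np.arange(5),
                        "TAB_ARV": np.zeros((4, 10))})

data = loadmat("demo_30.mat")
torque = data["Raw_Torque"].squeeze()   # measured torque in ADC numbers
envelopes = data["TAB_ARV"]             # EMG envelopes for BB, BR, TM, TL

print(torque.shape, envelopes.shape)
```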
This Excel file contains a set of data from a scoping review on methods for fire evacuation training in buildings. The review follows the PRISMA approach (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) and systematically identifies 73 sources among scientific literature published between 1997 and 2022. The dataset contains information excerpted through a custom template on the employed training methods and technology, study information, participants, and contents of the discussion of the 73 sources of evidence that were identified in the systematic review process.
This dataset, released by DoD, contains geographic information for major installations, ranges, and training areas in the United States and its territories. This release integrates site information about DoD installations, training ranges, and land assets in a format which can be immediately put to work in commercial geospatial information systems. Homeland Security/Homeland Defense, law enforcement, and readiness planners will benefit from immediate access to DoD site location data during emergencies. Land use planning and renewable energy planning will also benefit from use of this data. Users are advised that the point and boundary location datasets are intended for planning purposes only, and do not represent the legal or surveyed land parcel boundaries.
USAGE LICENSE
Creative Commons Attribution 4.0 International Public License
FILE CONTENTS
dataset
|_ foods folder containing files about the foods dataset (starting data obtained from giallozafferano.it)
| |_ CSV folder containing files in .csv format
| | |_ categories.csv
| | |_ foodDataset.xlsx each of the other .csv files is derived from a sheet of this Excel file
| | |_ ingredients.csv
| | |_ ingredientsClasses.csv
| | |_ ingredientsMetaclasses.csv
| | |_ preparations.csv
| | |_ recipes.csv
| |_ TXT folder containing files in .txt format
| |_ scorpored values folder containing the values in textData separated by type of data
| | |_ category-cost-difficulty.txt
| | |_ ingredients.txt
| | |_ names.txt
| | |_ preparations.txt
| | |_ preparationTime.txt
| |_ textData.txt .txt version of dataset/foods/CSV/foodDataset.xlsx file
|_ survey_answers folder containing the files about the results of the surveys on the food preferences of the dataset
| |_ sorts folder containing the results of three questions of the survey in which the users were asked to sort some foods by preference
| | |_ sort1.csv each of the three csv files contains the survey ID, followed by the foods as ordered by the user (each row represents the answer of one user)
| | |_ sort2.csv
| | |_ sort3.csv
| |_ answers.csv results of the surveys (TRANSLATION NOTE: survey ID => ID-sondaggio; user ID => ID-utente; answer ID => ID-risposta)
| |_ labels.txt labels of the samples in samples.txt
| |_ samples.txt couples of food extracted from the sorts in dataset/survey_answers/sorts/ translating the preference sorts in the form of pairwise comparison *
|_ readme.txt
* Each sample has the form
RECIPES DATASET DESCRIPTION
The description refers to dataset/foods/CSV/foodDataset.xlsx
Name Italian name of the recipe
ID ID associated to the recipe
Link link to where the food data was obtained
Category Name name of the category (Starter, Complete Meal, First Course, Second Course, Savoury Cake)
Category ID ID associated to the category.
Cost cost indicator of the food, discrete interval from 1 to 5
Difficulty difficulty indicator of the food, discrete interval from 1 to 4
Preparation Time integer positive number that indicates preparation time of the food expressed in minutes
Ingredient English name of an ingredient of the recipe
Ingredient ID ID associated to the ingredient.
W weight that the ingredient has in the composition of the recipe concerned
NOTE: the last three columns repeat 18 times, with empty cells when the recipe has no ingredients beyond those already entered
Preparation English name of a preparation performed on the recipe
Preparation ID ID associated to the preparation.
W weight that the preparation has in the composition of the recipe concerned
NOTE: the last three columns repeat 5 times, similarly to the ingredients
The other sheets of the file report all the ingredients, divided into classes and metaclasses, the preparations, and the categories
NOTE: in dataset/foods/TXT/textData.txt, ingredients and preparations have been vectorized as follows:
- each element of the ingredient vector represents the weight of an ingredient class in the recipe. The weight of an ingredient class in a recipe is obtained by summing the weights of the ingredients belonging to that class in the recipe.
- each element of the preparation vector represents the weight of a preparation in the recipe.
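The ingredient-vector construction described above can be sketched as follows; the ingredient names, classes, and weights here are made up for illustration and are not taken from the dataset:

```python
# Hypothetical mapping of ingredients to their ingredient classes.
ingredient_class = {"flour": "cereals", "butter": "fats", "olive oil": "fats"}

# Hypothetical recipe: ingredient -> weight in the recipe composition.
recipe = {"flour": 0.5, "butter": 0.2, "olive oil": 0.1}

# One vector element per ingredient class: the sum of the weights of the
# ingredients belonging to that class in the recipe.
classes = sorted(set(ingredient_class.values()))
vector = [
    sum(w for ing, w in recipe.items() if ingredient_class[ing] == c)
    for c in classes
]
```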
PREFERENCES DATASET DESCRIPTION
In the file *dataset/survey_answers/samples.txt*, 54 users' orderings are reported in the form of *pairwise comparisons*. Each row corresponds to the ordering of one user. The ordering is written as pairwise comparisons, so each element of the ordering is paired with all the others (avoiding symmetries).
For instance, given the recipes as their ID:
1;2;3
becomes:
1,2;1,3;2,3
The file is written following the *.csv* format.
In the file *dataset/survey_answers/labels.txt* the corresponding labels of the couples are written.
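The expansion from an ordering to pairwise comparisons shown above can be sketched in Python; `ordering_to_pairs` is a hypothetical helper name, not part of the dataset:

```python
from itertools import combinations

def ordering_to_pairs(line: str) -> str:
    """Expand an ordering like '1;2;3' into its pairwise comparisons
    '1,2;1,3;2,3', avoiding symmetric duplicates."""
    items = line.split(";")
    return ";".join(f"{a},{b}" for a, b in combinations(items, 2))

ordering_to_pairs("1;2;3")  # -> '1,2;1,3;2,3'
```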
HOW TO CITE:
D.Fossemò, F.Mignosi, L.Raggioli, M.Spezialetti, F.A.D'Asaro. Using Inductive Logic Programming to globally approximate Neural Networks for preference learning: challenges and preliminary results. BEWARE-22, co-located with AIxIA 2022, November 28-December 2, 2022, University of Udine, Udine, Italy.
The National Survey of Information Technology Occupations, conducted in 2002 on behalf of the Software Human Resource Council (SHRC), is the first to shed light on the IT labour market in both the public and private sectors. IT employers and employees were surveyed separately, but simultaneously. The employer survey consisted of questions on occupation profile, hiring and recruitment, employee retention, and training and development. The employee survey had questions on the occupational history of IT employees, salary, education, training, and skills. The target population consisted of private sector locations with at least six employees, and with at least one employee working in IT, as well as public-sector divisions with at least one IT employee. The NSITO is a three-stage survey. First, a sample of employers in both private and public sectors is selected; this is stage 1. The questions asked in stage 1 are essentially about the IT workforce. Stage 2 involves selecting a maximum of two occupations (out of 25) per employer. The questions asked in this stage deal with hiring, training and retaining employees in the selected occupations. In stage 3, a maximum of 10 employees are sampled for each occupation selected in stage 2. Among the subjects that employees are asked about are training, previous employment and demographic characteristics. For National Survey of Information Technology Occupations data, refer to Statistics Canada.
Sentinel2GlobalLULC is a deep learning-ready dataset of RGB images from the Sentinel-2 satellites designed for global land use and land cover (LULC) mapping. Sentinel2GlobalLULC v2.1 contains 194,877 images in GeoTiff and JPEG format corresponding to 29 broad LULC classes. Each image has 224 x 224 pixels at 10 m spatial resolution and was produced by assigning the 25th percentile of all available observations in the Sentinel-2 collection between June 2015 and October 2020 in order to remove atmospheric effects (i.e., clouds, aerosols, shadows, snow, etc.). A spatial purity value was assigned to each image based on the consensus across 15 different global LULC products available in Google Earth Engine (GEE). Our dataset is structured into 3 main zip-compressed folders, an Excel file with a dictionary for class names and descriptive statistics per LULC class, and a python script to convert RGB GeoTiff images into JPEG format. The first folder called "Sentinel2LULC_GeoTiff.zip" contains 29 zip-compressed subfolders where each one corresponds to a specific LULC class with hundreds to thousands of GeoTiff Sentinel-2 RGB images. The second folder called "Sentinel2LULC_JPEG.zip" contains 29 zip-compressed subfolders with a JPEG formatted version of the same images provided in the first main folder. 
The third folder called "Sentinel2LULC_CSV.zip" includes 29 zip-compressed CSV files with as many rows as provided images and with 12 columns containing the following metadata (this same metadata is provided in the image filenames):
- Land Cover Class ID: the identification number of each LULC class
- Land Cover Class Short Name: the short name of each LULC class
- Image ID: the identification number of each image within its corresponding LULC class
- Pixel Purity Value: the spatial purity of each pixel for its corresponding LULC class, calculated as the spatial consensus across up to 15 land-cover products
- GHM Value: the spatial average of the Global Human Modification index (gHM) for each image
- Latitude: the latitude of the center point of each image
- Longitude: the longitude of the center point of each image
- Country Code: the Alpha-2 country code of each image as described in the ISO 3166 international standard (the full list of codes is available at https://www.iban.com/country-codes)
- Administrative Department Level1: the administrative level 1 name to which each image belongs
- Administrative Department Level2: the administrative level 2 name to which each image belongs
- Locality: the name of the locality to which each image belongs
- Number of S2 images: the number of instances found in the corresponding Sentinel-2 image collection between June 2015 and October 2020 when compositing and exporting its corresponding image tile
For seven LULC classes, we could not export from GEE all images that fulfilled a spatial purity of 100%, since there were millions of them. In this case, we exported a stratified random sample of 14,000 images and provided an additional CSV file with the images actually contained in our dataset.
That is, for these seven LULC classes, we provide these 2 CSV files:
- a CSV file that contains all exported images for this class
- a CSV file that contains all images available for this class at a spatial purity of 100%, both the ones exported and the ones not exported, in case the user wants to export them. These CSV filenames end with "including_non_downloaded_images".
To clearly state the geographical coverage of images available in this dataset, we included in version v2.1 a compressed folder called "Geographic_Representativeness.zip". This zip-compressed folder contains a CSV file for each LULC class that provides the complete list of countries represented in that class. Each CSV file has two columns: the first gives the country code and the second gives the number of images provided in that country for that LULC class. In addition to these 29 CSV files, we provide another CSV file that maps each ISO Alpha-2 country code to its original full country name.
© Sentinel2GlobalLULC Dataset by Yassir Benhammou, Domingo Alcaraz-Segura, Emilio Guirado, Rohaifa Khaldi, Boujemâa Achchab, Francisco Herrera & Siham Tabik is marked with Attribution 4.0 International (CC-BY 4.0)
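As a hedged sketch of the RGB GeoTiff-to-JPEG conversion that the dataset's included Python script performs (the script itself is not reproduced here), the following uses Pillow on a fabricated 224 x 224 stand-in tile; note that any georeferencing metadata is discarded in the JPEG version:

```python
import numpy as np
from PIL import Image

# Fabricated stand-in for a 224 x 224 Sentinel-2 RGB tile (the real
# GeoTiff images live under Sentinel2LULC_GeoTiff.zip).
rgb = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
Image.fromarray(rgb).save("tile_demo.tif")

# Convert to JPEG; georeferencing is not preserved by this route.
img = Image.open("tile_demo.tif").convert("RGB")
img.save("tile_demo.jpg", "JPEG", quality=95)
```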
The information on the concrete dataset is as follows:
1. Dataset 1 contains 177 samples of foamed concrete. References: given in the Excel file.
2. Dataset 1A contains 34 samples of foamed concrete from the laboratory.
3. Dataset 2 contains 1133 samples of high performance concrete.
The codes of the machine learning models (e.g. HO-DNN, SO-ANN, and so on) are available upon request. Please send emails to tuan.nguyen@unimelb.edu.au
References:
1. Nguyen T, Kashani A, Ngo T, Bordas S. Deep neural network with high-order neuron for the prediction of foamed concrete strength. Comput Aided Civ Inf. 2018;1-17. https://doi.org/10.1111/mice.12422
2. Dac-Khuong Bui, Tuan Nguyen, Jui-Sheng Chou, H. Nguyen-Xuan, Tuan Duc Ngo. A modified firefly algorithm-artificial neural network expert system for predicting compressive and tensile strength of high-performance concrete. Construction and Building Materials, Volume 180, 2018, Pages 320-333, ISSN 0950-0618. https://doi.org/10.1016/j.conbuildmat.2018.05.201 (http://www.sciencedirect.com/science/article/pii/S0950061818312868)
Background
The Labour Force Survey (LFS) is a unique source of information using international definitions of employment and unemployment and economic inactivity, together with a wide range of related topics such as occupation, training, hours of work and personal characteristics of household members aged 16 years and over. It is used to inform social, economic and employment policy. The LFS was first conducted biennially from 1973-1983. Between 1984 and 1991 the survey was carried out annually and consisted of a quarterly survey conducted throughout the year and a 'boost' survey in the spring quarter (data were then collected seasonally). From 1992 quarterly data were made available, with a quarterly sample size approximately equivalent to that of the previous annual data. The survey then became known as the Quarterly Labour Force Survey (QLFS). From December 1994, data gathering for Northern Ireland moved to a full quarterly cycle to match the rest of the country, so the QLFS then covered the whole of the UK (though some additional annual Northern Ireland LFS datasets are also held at the UK Data Archive). Further information on the background to the QLFS may be found in the documentation.
New reweighting policy
Following the new reweighting policy, ONS has reviewed the latest population estimates made available during 2019 and has decided not to carry out a 2019 LFS and APS reweighting exercise. Therefore, the next reweighting exercise will take place in 2020. These will incorporate the 2019 Sub-National Population Projection data (published in May 2020) and 2019 Mid-Year Estimates (published in June 2020). It is expected that reweighted Labour Market aggregates and microdata will be published towards the end of 2020/early 2021.
Secure Access QLFS household data
Up to 2015, the LFS household datasets were produced twice a year (April-June and October-December) from the corresponding quarter's individual-level data. From January 2015 onwards, they are now produced each quarter alongside the main QLFS. The household datasets include all the usual variables found in the individual-level datasets, with the exception of those relating to income, and are intended to facilitate the analysis of the economic activity patterns of whole households. It is recommended that the existing individual-level LFS datasets continue to be used for any analysis at individual level, and that the LFS household datasets be used for analysis involving household or family-level data. For some quarters, users should note that all missing values in the data are set to one '-10' category instead of the separate '-8' and '-9' categories. For that period, the ONS introduced a new imputation process for the LFS household datasets and it was necessary to code the missing values into one new combined category ('-10'), to avoid over-complication. From the 2013 household datasets, the standard -8 and -9 missing categories have been reinstated.
Secure Access household datasets for the QLFS are available from 2002 onwards, and include additional, detailed variables not included in the standard 'End User Licence' (EUL) versions. Extra variables that typically can be found in the Secure Access versions but not in the EUL versions relate to: geography; date of birth, including day; education and training; household and family characteristics; employment; unemployment and job hunting; accidents at work and work-related health problems; nationality, national identity and country of birth; occurence of learning difficulty or disability; and benefits.
Prospective users of a Secure Access version of the QLFS will need to fulfil additional requirements, commencing with the completion of an extra application form to demonstrate to the data owners exactly why they need access to the extra, more detailed variables, in order to obtain permission to use that version. Secure Access users must also complete face-to-face training and agree to Secure Access' User Agreement (see 'Access' section below). Therefore, users are encouraged to download and inspect the EUL version of the data prior to ordering the Secure Access version.
LFS Documentation
The documentation available from the Archive to accompany LFS datasets largely consists of each volume of the User Guide including the appropriate questionnaires for the years concerned. However, LFS volumes are updated periodically by ONS, so users are advised to check the ONS LFS User Guidance pages before commencing analysis.
The study documentation presented in the Documentation section includes the most recent documentation for the LFS only, due to available space. Documentation for previous years is provided alongside the data for access and is also available upon request.
Review of imputation methods for LFS Household data - changes to missing values
A review of the imputation methods used in LFS Household and Family analysis resulted in a change from the January-March 2015 quarter onwards. It was no longer considered appropriate to impute any personal characteristic variables (e.g. religion, ethnicity, country of birth, nationality, national identity, etc.) using the LFS donor imputation method. This method is primarily focused to ensure the 'economic status' of all individuals within a household is known, allowing analysis of the combined economic status of households. This means that from 2015 larger amounts of missing values ('-8'/-9') will be present in the data for these personal characteristic variables than before. Therefore if users need to carry out any time series analysis of households/families which also includes personal characteristic variables covering this time period, then it is advised to filter off 'ioutcome=3' cases from all periods to remove this inconsistent treatment of non-responders.
Variables DISEA and LNGLST
Dataset A08 (Labour market status of disabled people) which ONS suspended due to an apparent discontinuity between April to June 2017 and July to September 2017 is now available. As a result of this apparent discontinuity and the inconclusive investigations at this stage, comparisons should be made with caution between April to June 2017 and subsequent time periods. However users should note that the estimates are not seasonally adjusted, so some of the change between quarters could be due to seasonality. Further recommendations on historical comparisons of the estimates will be given in November 2018 when ONS are due to publish estimates for July to September 2018.
An article explaining the quality assurance investigations that have been conducted so far is available on the ONS Methodology webpage. For any queries about Dataset A08 please email Labour.Market@ons.gov.uk.
Latest Edition Information
For the sixteenth edition (November 2023), one quarterly data file covering the time period April-June 2023, along with a new Excel variable catalogue for 2023 and a documentation form, have been added to the study.
TWristAR is a small three subject dataset recorded using an e4 wristband. Each subject performed six scripted activities: upstairs/downstairs, walk/jog, and sit/stand. Each activity except stairs was performed for one minute a total of three times alternating between the pairs. Subjects 1 & 2 also completed a walking sequence of approximately 10 minutes. The dataset contains motion (accelerometer) data, temperature, electrodermal activity, and heart rate data. The .csv file associated with each datafile contains timing and labeling information and was built using the provided Excel files.
Each two activity session was recorded using a downward facing action camera. This video was used to generate the labels and is provided to investigate any data anomalies, especially for the free-form long walk. For privacy reasons only the sub1_stairs video contains audio.
The Jupyter notebook processes the acceleration data and performs hold-one-subject-out evaluation of a 1D-CNN. Example results from a run performed on a Google Colab GPU instance (without a GPU, training time increases to about 90 seconds per pass):
Hold-one-subject-out results

Train Sub    Test Sub    Accuracy    Training Time (HH:MM:SS)
[1,2]        [3]         0.757       00:00:12
[2,3]        [1]         0.849       00:00:14
[1,3]        [2]         0.800       00:00:11
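The hold-one-subject-out folds in the table above can be generated programmatically; this is a sketch of the splitting logic only, with data loading and model training elided:

```python
# Each fold trains on two subjects and tests on the remaining one,
# matching the three rows of the results table above.
subjects = [1, 2, 3]

folds = []
for test_sub in subjects:
    train_subs = [s for s in subjects if s != test_sub]
    folds.append((train_subs, [test_sub]))

# folds == [([2, 3], [1]), ([1, 3], [2]), ([1, 2], [3])]
```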
This notebook can also be run in Colab. The processing is described in this video: https://mediaflo.txstate.edu/Watch/e4_data_processing.
We hope you find this a useful dataset with end-to-end code. We have several papers in process and would appreciate your citation of the dataset if you use it in your work.
Introduction
Clinical governance outlines duties and responsibilities, as well as indicators of the actions towards the best possible patient outcomes. However, evidence of outcomes of clinical governance interventions is limited in South Africa. This study determined the knowledge of clinical staff about the existence of clinical governance protocols/tools that are utilised in selected South African hospitals.
Methods
A cross-sectional study was conducted among randomly sampled clinical staff at Nelson Mandela Academic (NMAH) and St Elizabeth Hospitals in the Eastern Cape Province, and Rob Ferreira (RFH) and Themba Hospitals in the Mpumalanga Province of South Africa. A self-administered survey questionnaire was used to collect demographic information and the quality improvement protocols/tools in existence at the hospitals. Data were captured in an Excel spreadsheet and analysed with STATA. Knowledge was assessed based on each staff member's score for the 12 questions.
Results
A total of 720 participants were recruited, of which 377 gave consent to participate. Overall, 8.5% (32/377) of the participants got none or only one correct out of the 12 protocols/tools, and 65.5% (247/377) got between two and five correct. The median knowledge scores were 41.7% (interquartile range (IQR) = 16.7%) in three of the hospitals and 33.3% (IQR = 16.7%) at NMAH (p-value = 0.002). Factors associated with good knowledge included more than five years of experience, being a professional nurse compared to other nurses, not working at NMAH, and being a medical doctor or pharmacist compared to other staff. Overall, 74.0% (279/377) of the respondents scored below 50%; this was 84.4% (92/109) at NMAH and 66.3% (55/83) at RFH, and this difference was statistically significant (p-value = 0.017).
Conclusion
Despite clinical governance implementation, there was low knowledge of clinical governance protocols/tools among clinical staff. Therefore, providing more effective, relevant training workshops, with an emphasis on the importance of local ownership of the concept of clinical governance by both management and clinical staff, is of great importance.
TimeSpec4LULC is archived in 30 different ZIP files named after the 29 LULC classes (one class is split into two files since it is too large). Within each ZIP file there is a set of seven CSV files, each corresponding to one of the seven spectral bands. The naming of each file follows this structure: IdOfTheClass_NameOfTheClass_ModisBand.csv
For example, for band 1 of the Barren Lands class, the filename is: 01_BarrenLands_MCD09A1b01.csv
Inside each CSV file, rows represent the collected pixels for that class. The first 11 columns contain the following metadata:
- “IdOfTheClass”: Id of the class.
- “NameOfTheClass”: Name of the class.
- “IdOfTheLevel0”: Id of the FAO-L0 (i.e., countries).
- “IdOfTheLevel1”: Id of the FAO-L1 (i.e., departments, states, or provinces depending on the country).
- “IdOfThePixel”: Id of the pixel.
- “PurityOfThePixel”: Spatial and inter-annual consensus for this class across multiple land-cover products, i.e., Purity of the pixel.
- “DataAvailability”: percentage of non-missing data per band throughout the time series.
- “Index_GHM”: average of Global Human Modification index (gHM).
- “Lat”: Latitude of the pixel center.
- “Lon”: Longitude of the pixel center.
- “.geo”: (Longitude, Latitude) of the pixel center with more precision.
And, the last 223 columns contain the 223 monthly observations of the time series for one spectral band from 2002-07 to 2021-01. Along with the dataset, an Excel file named 'Countries_Departments_FAO-GAUL' containing the FAO-L0 and the FAO-L1 Ids and names (following the FAO-GAUL standards) is provided.
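A hedged sketch of how one of the per-band CSV files could be split into its 11 metadata columns and 223 monthly observations with pandas; a tiny stand-in DataFrame replaces the real file (e.g. 01_BarrenLands_MCD09A1b01.csv) here, and the column names are placeholders:

```python
import pandas as pd

# In practice: df = pd.read_csv("01_BarrenLands_MCD09A1b01.csv")
# Here we fabricate a one-row frame with the documented column layout:
# 11 metadata columns followed by 223 monthly observations.
n_meta = 11
columns = [f"meta{i}" for i in range(n_meta)] + [f"t{i}" for i in range(223)]
df = pd.DataFrame([list(range(len(columns)))], columns=columns)

metadata = df.iloc[:, :n_meta]     # IdOfTheClass ... .geo
timeseries = df.iloc[:, n_meta:]   # 223 monthly values, 2002-07 to 2021-01
```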
Speech Emotion Recognition (SER) is a rapidly evolving field of research aimed at identifying and categorizing emotional states through the analysis of speech signals. As SER holds significant socio-cultural and commercial importance, researchers are increasingly leveraging machine learning and deep learning techniques to drive advancements in this domain. A high-quality dataset is an essential resource for SER studies in any language. Despite Urdu being the 10th most spoken language globally, there is a significant lack of robust SER datasets, creating a research gap. Existing Urdu SER datasets are often limited by their small size, narrow emotional range, and repetitive content, reducing their applicability in real-world scenarios. To address this gap, the Urdu Speech Emotion Corpus (UrSEC) was developed. This comprehensive dataset includes 3500 Urdu speech signals sourced from 10 professional actors, with an equal representation of male and female speakers from diverse age groups. The dataset encompasses seven emotional states: Angry, Fear, Boredom, Disgust, Happy, Neutral, and Sad. The speech samples were curated from a wide collection of Pakistani Urdu drama serials and telefilms available on YouTube, ensuring diversity and natural delivery. Unlike conventional datasets, which rely on predefined dialogs recorded in controlled environments, UrSEC features unique and contextually varied utterances, making it more realistic and applicable for practical applications. To ensure balance and consistency, the dataset contains 500 samples per emotional class, with 50 samples contributed by each actor for each emotion. Additionally, an accompanying Excel file provides detailed metadata for each recording, including the file name, duration, format, sample rate, actor details, emotional state, and corresponding Urdu dialog. This metadata enables researchers to efficiently organize and utilize the dataset for their specific needs. 
The UrSEC dataset underwent rigorous validation, integrating expert evaluation and model-based validation to ensure its reliability, accuracy, and overall suitability for advancing research and development in Urdu Speech Emotion Recognition.
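The stated design (7 emotions × 10 actors × 50 clips = 3500 samples) can be verified programmatically once the metadata sheet is loaded. The sketch below is a minimal, hypothetical balance check over (actor, emotion) pairs; the actor IDs and the record format are assumptions, not the dataset's actual column values.

```python
from collections import Counter

EMOTIONS = ["Angry", "Fear", "Boredom", "Disgust", "Happy", "Neutral", "Sad"]
ACTORS = [f"actor_{i:02d}" for i in range(1, 11)]  # hypothetical actor IDs
SAMPLES_PER_ACTOR_PER_EMOTION = 50

def check_balance(records):
    """records: iterable of (actor, emotion) pairs taken from the metadata sheet.
    Returns True only if every actor contributes exactly 50 clips per emotion."""
    counts = Counter(records)
    return all(counts[(a, e)] == SAMPLES_PER_ACTOR_PER_EMOTION
               for a in ACTORS for e in EMOTIONS)

# Synthetic metadata matching the stated design: 10 actors x 7 emotions x 50 clips
synthetic = [(a, e) for a in ACTORS for e in EMOTIONS
             for _ in range(SAMPLES_PER_ACTOR_PER_EMOTION)]
assert len(synthetic) == 3500
assert check_balance(synthetic)
```

In practice the (actor, emotion) pairs would come from the accompanying Excel file rather than being generated, but the same counting logic applies.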
5) DB5 includes 661 PNG images of 20 shrimps, with DB5_ACADEMIC_POND_S2 as the corresponding Excel file. The file mirrors the structure and measurements of DB4.
The images in each folder are named "sm_n", where m is the shrimp sample number and n is the picture number for that shrimp. This carefully structured dataset provides comprehensive biometric data on shrimps, facilitating the development of algorithms for non-invasive measurement techniques. These will likely be pivotal in enhancing the precision of biomass estimation in aquaculture farming, utilizing advanced statistical morphology analysis and machine learning techniques.
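To link an image back to its row in the corresponding Excel file, the "sm_n" naming convention can be parsed with a small helper. The sketch below is a hypothetical parser assuming lowercase or uppercase names with a ".png" extension (e.g. "s3_12.png" for shrimp sample 3, picture 12); the exact casing in the released files is an assumption.

```python
import re

# Pattern for the "sm_n" convention described above, e.g. "s3_12.png".
# Case-insensitive matching is an assumption about the actual file names.
NAME_RE = re.compile(r"^s(\d+)_(\d+)\.png$", re.IGNORECASE)

def parse_image_name(filename):
    """Return (sample_number, picture_number) for an image file name."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected image name: {filename!r}")
    sample, picture = map(int, match.groups())
    return sample, picture

print(parse_image_name("s3_12.png"))  # -> (3, 12)
```

The returned sample number can then be used to index the SAMPLE column of the matching Excel file (e.g. row 3 of DB1_INDUSTRIAL_FARM_1 for images named "s3_*").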
CHANGES FROM VERSION 1:
The cephalothorax metric is the length rather than the width; this was an error in the first version. The column name also contained a typo, which has been corrected (from CEPHALOTORAX to CEPHALOTHORAX).