100+ datasets found
  1. University SET data, with faculty and courses characteristics

    • openicpsr.org
    Updated Sep 12, 2021
    Cite
    Under blind review in refereed journal (2021). University SET data, with faculty and courses characteristics [Dataset]. http://doi.org/10.3886/E149801V1
    Explore at:
    Dataset updated
    Sep 12, 2021
    Authors
    Under blind review in refereed journal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper explores a unique dataset of all the SET ratings provided by students of one university in Poland at the end of the winter semester of the 2020/2021 academic year. The SET questionnaire used by this university is presented in Appendix 1. The dataset is unique for several reasons. It covers all SET surveys filled in by students in all fields and levels of study offered by the university. In the period analysed, the university operated entirely online amid the Covid-19 pandemic. While the expected learning outcomes formally were not changed, the online mode of study could have affected the grading policy and could have implications for some of the studied SET biases. This Covid-19 effect is captured by econometric models and discussed in the paper.

    The average SET scores were matched with teacher characteristics (degree, seniority, gender, and SET scores in the past six semesters); course characteristics (time of day, day of the week, course type, course breadth, class duration, and class size); attributes of the SET survey responses (the percentage of students providing SET feedback); and course grades (mean, standard deviation, and percentage failed). Data on course grades are also available for the previous six semesters. This rich dataset allows many of the biases reported in the literature to be tested for and new hypotheses to be formulated, as presented in the introduction section.

    The unit of observation, or single row in the data set, is identified by three parameters: teacher unique id (j), course unique id (k), and the question number in the SET questionnaire (n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}). This means that for each pair (j, k) we have nine rows, one for each SET survey question, or sometimes fewer when no student answered one of the SET questions at all. For example, the dependent variable SET_score_avg(j, k, n) for the triplet (j = John Smith, k = Calculus, n = 2) is calculated as the average of all Likert-scale answers to question no. 2 in the SET survey distributed to all students who took the Calculus course taught by John Smith. The data set has 8,015 such observations or rows.

    The full list of variables, or columns, in the data set included in the analysis is presented in the attached file section. Their description refers to the triplet (teacher id = j, course id = k, question number = n). When the last value of the triplet (n) is dropped, the variable takes the same values for all n ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9}.

    Two attachments:
    • Word file with the variable descriptions
    • Rdata file with the data set (for the R language)

    Appendix 1. The SET questionnaire used for this paper.

    Evaluation survey of the teaching staff of [university name]

    Please complete the following evaluation form, which aims to assess the lecturer's performance. Only one answer should be indicated for each question. The answers are coded in the following way: 5 - I strongly agree; 4 - I agree; 3 - Neutral; 2 - I don't agree; 1 - I strongly don't agree.

    1. I learnt a lot during the course.
    2. I think that the knowledge acquired during the course is very useful.
    3. The professor used activities to make the class more engaging.
    4. If it was possible, I would enroll for the course conducted by this lecturer again.
    5. The classes started on time.
    6. The lecturer always used time efficiently.
    7. The lecturer delivered the class content in an understandable and efficient way.
    8. The lecturer was available when we had doubts.
    9. The lecturer treated all students equally regardless of their race, background and ethnicity.

    (Each question is answered on the five-point scale above.)
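    For readers working from Python rather than R, here is a minimal sketch of how the (j, k, n) structure described above maps onto rows; the column names are hypothetical, as the actual names are documented in the attached variable-description file:

    # Hypothetical column names; see the attached Word file for the real ones.
    import pandas as pd

    # Raw Likert answers for one teacher/course pair, question n = 2
    answers = pd.DataFrame({
        "teacher_id":  ["T017"] * 4,
        "course_id":   ["C101"] * 4,
        "question_no": [2] * 4,
        "likert":      [5, 4, 4, 3],  # 5 = strongly agree ... 1 = strongly disagree
    })

    # SET_score_avg(j, k, n): mean of all Likert-scale answers for the triplet
    set_score_avg = (answers
                     .groupby(["teacher_id", "course_id", "question_no"])["likert"]
                     .mean())
    print(set_score_avg)  # (T017, C101, 2) -> 4.0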

  2. Education and training

    • gov.uk
    • s3.amazonaws.com
    Updated Jul 16, 2020
    Cite
    Education and training [Dataset]. https://www.gov.uk/government/statistical-data-sets/fe-data-library-education-and-training
    Explore at:
    Dataset updated
    Jul 16, 2020
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Education
    Description

    This statistical data set includes information on education and training participation and achievements, broken down into a number of reports covering sector subject areas and participation by gender, age, ethnicity, and disability.

    It also includes data on offender learning.

    Can’t find what you’re looking for?

    If you need help finding data please refer to the table finder tool to search for specific breakdowns available for FE statistics.

    Academic year 2019 to 2020 (reported to date)

    Education and training aim participation and achievement demographics by sector subject area and local authority district: academic year 2019 to 2020 Q3 (August 2019 to April 2020): https://assets.publishing.service.gov.uk/media/5f0c1995e90e0703146d2393/201920-July_PT_ET_part_ach_demog_LAD.xlsx

    MS Excel Spreadsheet, 33 MB

    This file may not be suitable for users of assistive technology.

    Request an accessible format.

    If you use assistive technology (such as a screen reader) and need a version of this document in a more accessible format, please email alternative.formats@education.gov.uk. Please tell us what format you need. It will help us if you say what assistive technology you use.
    

  3. Data from: Distributed Anomaly Detection using 1-class SVM for Vertically...

    • catalog.data.gov
    • data.nasa.gov
    • +1more
    Updated Dec 7, 2023
    Cite
    Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned Data [Dataset]. https://catalog.data.gov/dataset/distributed-anomaly-detection-using-1-class-svm-for-vertically-partitioned-data
    Explore at:
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    Dashlink
    Description

    There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, huge amount of flight operational data is downloaded for different commercial airlines. These different types of datasets need to be analyzed for finding outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).
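    As an illustrative sketch only (not the paper's distributed algorithm), the basic building block it relies on, a 1-class SVM fitted on a small centralized sample, looks like this in scikit-learn; the sites, sizes, and data here are invented for illustration:

    # Sketch: fit a 1-class SVM on a small sample centralized from two sites.
    # Shows the building block only, not the paper's communication scheme.
    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    site_a = rng.normal(0.0, 1.0, size=(10_000, 3))  # data held at location A
    site_b = rng.normal(0.0, 1.0, size=(10_000, 3))  # data held at location B

    # Centralize only a small random sample from each location
    sample = np.vstack([site_a[rng.choice(10_000, 200, replace=False)],
                        site_b[rng.choice(10_000, 200, replace=False)]])

    detector = OneClassSVM(nu=0.01, kernel="rbf").fit(sample)
    outliers = detector.predict(site_a) == -1  # -1 = outlier, +1 = inlier
    print(outliers.mean())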

  4. Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis...

    • catalog.data.gov
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis and Summary Statistics [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-vector-analysis-and-summary-stati
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    United States
    Description

    Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and supporting user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee owned lands loaded before overlapping management designations and easements).

    The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip"), an associated item of PAD-US 3.0 Spatial Analysis and Statistics (https://doi.org/10.5066/P9KLBB5D), was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip"). Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] report) from the PAD-US Lands and Inland Water Statistics Dashboard (https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics).

    In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allows for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D).

    Note, the PAD-US inventory is now considered functionally complete, with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at http://www.protectedlands.net/data-stewards/). In addition, changes in protected area status between versions of the PAD-US may be attributed more to improving the completeness and accuracy of the spatial data than to actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data. While PAD-US is the official aggregation of protected areas (https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html), agencies are the best source of their lands data.
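    As a hedged sketch, the CSV summary tables can be inspected directly from the archive named above; the member file names inside the zip are not listed here, so the code prints them first:

    # Sketch: list and read the summary-statistics CSV tables from the archive.
    # The archive name comes from the description; member names are unknown here.
    import zipfile
    import pandas as pd

    with zipfile.ZipFile("PADUS3_0SummaryStatistics_TabularData_CSV.zip") as zf:
        members = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        print(members)  # inspect which summary tables are available
        with zf.open(members[0]) as f:
            df = pd.read_csv(f)
    print(df.head())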

  5. Data from: CADDI: An in-Class Activity Detection Dataset using IMU data from...

    • scidb.cn
    • observatorio-cientifico.ua.es
    Updated May 28, 2024
    Cite
    Luis Marquez-Carpintero; Sergio Suescun-Ferrandiz; Monica Pina-Navarro; Francisco Gomez-Donoso; Miguel Cazorla (2024). CADDI: An in-Class Activity Detection Dataset using IMU data from low-cost sensors [Dataset]. http://doi.org/10.57760/sciencedb.08377
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 28, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Luis Marquez-Carpintero; Sergio Suescun-Ferrandiz; Monica Pina-Navarro; Francisco Gomez-Donoso; Miguel Cazorla
    Description

    Data Description

    The CADDI dataset is designed to support research in in-class activity recognition using IMU data from low-cost sensors. It provides multimodal data capturing 19 different activities performed by 12 participants in a classroom environment, utilizing both IMU sensors from a Samsung Galaxy Watch 5 and synchronized stereo camera images. This dataset enables the development and validation of activity recognition models using sensor fusion techniques.

    Data Generation Procedures

    The data collection process involved recording both continuous and instantaneous activities that typically occur in a classroom setting. The activities were captured using a custom setup, which included:

    • A Samsung Galaxy Watch 5 to collect accelerometer, gyroscope, and rotation vector data at 100 Hz.
    • A ZED stereo camera capturing 1080p images at 25-30 fps.
    • A synchronized computer acting as a data hub, receiving IMU data and storing images in real time.
    • A D-Link DSR-1000AC router for wireless communication between the smartwatch and the computer.

    Participants were instructed to arrange their workspace as they would in a real classroom, including a laptop, notebook, pens, and a backpack. Data collection was performed under realistic conditions, ensuring that activities were captured naturally.

    Temporal and Spatial Scope

    • The dataset contains a total of 472.03 minutes of recorded data.
    • The IMU sensors operate at 100 Hz, while the stereo camera captures images at 25-30 Hz.
    • Data was collected from 12 participants, each performing all 19 activities multiple times.
    • Data was collected in Alicante, Spain, under controlled indoor conditions.

    Dataset Components

    The dataset is organized into JSON and PNG files, structured hierarchically:

    • IMU data, stored in JSON files, containing:
      - Samsung Linear Acceleration Sensor (X, Y, Z values, 100 Hz)
      - LSM6DSO Gyroscope (X, Y, Z values, 100 Hz)
      - Samsung Rotation Vector (X, Y, Z, W quaternion values, 100 Hz)
      - Samsung HR Sensor (heart rate, 1 Hz)
      - OPT3007 Light Sensor (ambient light levels, 5 Hz)
    • Stereo camera images: high-resolution 1920×1080 PNG files from the left and right cameras.
    • Synchronization: each IMU data record and image is timestamped for precise alignment.

    Data Structure

    The dataset is divided into continuous and instantaneous activities:

    • Continuous activities (e.g., typing, writing, drawing) were recorded for 210 seconds, with the central 200 seconds retained.
    • Instantaneous activities (e.g., raising a hand, drinking) were repeated 20 times per participant, with data captured only during execution.

    The dataset is structured as:

    /continuous/subject_id/activity_name/
      /camera_a/ → Left camera images
      /camera_b/ → Right camera images
      /sensors/ → JSON files with IMU data

    /instantaneous/subject_id/activity_name/repetition_id/
      /camera_a/
      /camera_b/
      /sensors/

    Data Quality & Missing Data

    • The smartwatch buffers 100 readings per second before sending them, ensuring minimal data loss.
    • Synchronization latency between the smartwatch and the computer is negligible.
    • Not all IMU samples have corresponding images due to the different recording rates.
    • Outliers and anomalies were handled by discarding incomplete sequences at the start and end of continuous activities.

    Error Ranges & Limitations

    • Sensor data may contain noise due to minor hand movements.
    • The heart rate sensor operates at 1 Hz, limiting its temporal resolution.
    • Camera exposure settings were automatically adjusted, which may introduce slight variations in lighting.

    File Formats & Software Compatibility

    • IMU data is stored in JSON format, readable with Python's json library.
    • Images are in PNG format, compatible with all standard image processing tools.
    • Recommended libraries for data analysis: numpy, pandas, scikit-learn, tensorflow, pytorch; for visualization: matplotlib, seaborn; for deep learning: Keras, PyTorch.

    Potential Applications

    • Development of activity recognition models in educational settings.
    • Study of student engagement based on movement patterns.
    • Investigation of sensor fusion techniques combining visual and IMU data.

    This dataset represents a unique contribution to activity recognition research, providing rich multimodal data for developing robust models in real-world educational environments.

    Citation

    If you find this project helpful for your research, please cite our work using the following bibtex entry:

    @misc{marquezcarpintero2025caddiinclassactivitydetection,
      title={CADDI: An in-Class Activity Detection Dataset using IMU data from low-cost sensors},
      author={Luis Marquez-Carpintero and Sergio Suescun-Ferrandiz and Monica Pina-Navarro and Miguel Cazorla and Francisco Gomez-Donoso},
      year={2025},
      eprint={2503.02853},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.02853},
    }
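    A minimal loading sketch for one recording, following the directory layout above; the subject and activity names, and the per-recording JSON file layout under /sensors/, are assumptions:

    # Sketch: load one continuous recording's IMU data and frame lists.
    import json
    from pathlib import Path

    root = Path("CADDI/continuous/subject_01/typing")  # hypothetical path

    sensor_files = sorted((root / "sensors").glob("*.json"))
    with open(sensor_files[0]) as f:
        imu = json.load(f)  # accelerometer/gyroscope/rotation samples

    left_frames = sorted((root / "camera_a").glob("*.png"))   # left camera
    right_frames = sorted((root / "camera_b").glob("*.png"))  # right camera
    print(len(sensor_files), len(left_frames), len(right_frames))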

  6. Data from: ImageNet-Patch: A Dataset for Benchmarking Machine Learning...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 30, 2022
    Cite
    Ambra Demontis (2022). ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6568777
    Explore at:
    Dataset updated
    Jun 30, 2022
    Dataset provided by
    Ambra Demontis
    Angelo Sotgiu
    Battista Biggio
    Fabio Roli
    Luca Demetrio
    Daniele Angioni
    Maura Pintor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Adversarial patches are optimized contiguous pixel blocks in an input image that cause a machine-learning model to misclassify it. However, their optimization is computationally demanding and requires careful hyperparameter tuning. To overcome these issues, we propose ImageNet-Patch, a dataset to benchmark machine-learning models against adversarial patches. It consists of a set of patches optimized to generalize across different models and applied to ImageNet data after preprocessing them with affine transformations. This process enables an approximate yet faster robustness evaluation, leveraging the transferability of adversarial perturbations.

    We release our dataset as a set of folders indicating the patch target label (e.g., banana), each containing 1000 subfolders as the ImageNet output classes.

    An example showing how to use the dataset is shown below.

    # code for testing robustness of a model
    import os
    import os.path

    import torch
    import torch.utils.data
    from torchvision import datasets, transforms, models


    class ImageFolderWithEmptyDirs(datasets.ImageFolder):
        """
        This is required for handling empty folders from the ImageFolder class.
        """

        def find_classes(self, directory):
            classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
            if not classes:
                raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
            class_to_idx = {cls_name: i for i, cls_name in enumerate(classes) if
                            len(os.listdir(os.path.join(directory, cls_name))) > 0}
            return classes, class_to_idx


    # extract and unzip the dataset, then write the top folder here
    dataset_folder = 'data/ImageNet-Patch'

    available_labels = {
        487: 'cellular telephone',
        513: 'cornet',
        546: 'electric guitar',
        585: 'hair spray',
        804: 'soap dispenser',
        806: 'sock',
        878: 'typewriter keyboard',
        923: 'plate',
        954: 'banana',
        968: 'cup',
    }

    # select the folder with a specific target patch
    target_label = 954
    dataset_folder = os.path.join(dataset_folder, str(target_label))

    normalizer = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                      std=[0.229, 0.224, 0.225])
    # renamed from `transforms` to avoid shadowing the torchvision module
    preprocess = transforms.Compose([transforms.ToTensor(), normalizer])

    dataset = ImageFolderWithEmptyDirs(dataset_folder, transform=preprocess)
    model = models.resnet50(pretrained=True)
    loader = torch.utils.data.DataLoader(dataset, shuffle=True, batch_size=5)
    model.eval()

    batches = 10
    correct, attack_success, total = 0, 0, 0
    with torch.no_grad():
        for batch_idx, (images, labels) in enumerate(loader):
            if batch_idx == batches:
                break
            pred = model(images).argmax(dim=1)
            correct += (pred == labels).sum().item()
            attack_success += (pred == target_label).sum().item()
            total += pred.shape[0]

    accuracy = correct / total
    attack_sr = attack_success / total

    print("Robust Accuracy: ", accuracy)
    print("Attack Success: ", attack_sr)

  7. DCASE-2023-TASK-5

    • kaggle.com
    zip
    Updated Jun 5, 2023
    Cite
    Víctor Aguado (2023). DCASE-2023-TASK-5 [Dataset]. https://www.kaggle.com/datasets/aguado/dcase-2023-task-5
    Explore at:
    zip (7712922302 bytes). Available download formats
    Dataset updated
    Jun 5, 2023
    Authors
    Víctor Aguado
    Description

    Introduction

    This task focuses on sound event detection in a few-shot learning setting for animal (mammal and bird) vocalisations. Participants will be expected to create a method that can extract information from five exemplar vocalisations (shots) of mammals or birds and detect and classify sounds in field recordings.

    For more information, please refer to the official website: https://dcase.community/challenge2023/task-few-shot-bioacoustic-event-detection

    Description

    Few-shot learning is a highly promising paradigm for sound event detection. It is also an extremely good fit to the needs of users in bioacoustics, in which increasingly large acoustic datasets commonly need to be labelled for events of an identified category (e.g. species or call-type), even though this category might not be known in other datasets or have any yet-known label. While satisfying user needs, this will also benchmark few-shot learning for the wider domain of sound event detection (SED).

    Few-shot learning describes tasks in which an algorithm must make predictions given only a few instances of each class, contrary to the standard supervised learning paradigm. The main objective is to find reliable algorithms capable of dealing with data sparsity, class imbalance, and noisy or busy environments. Few-shot learning is usually studied using N-way-K-shot classification, where N denotes the number of classes and K the number of examples for each class.
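    For concreteness, here is a toy sketch of sampling one N-way-K-shot episode; this is purely illustrative (the task itself provides five shots of a single class of interest per validation file):

    # Sketch: sample an N-way-K-shot episode from events grouped by class.
    import random

    def sample_episode(events_by_class, n_way=5, k_shot=5):
        classes = random.sample(sorted(events_by_class), n_way)
        return {c: random.sample(events_by_class[c], k_shot) for c in classes}

    # toy data: 10 classes with 100 labelled events each
    events_by_class = {f"class_{i}": list(range(100)) for i in range(10)}
    episode = sample_episode(events_by_class)
    print({c: len(v) for c, v in episode.items()})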

    Some reasons why few-shot learning has been of increasing interest:

    • Scarcity of supervised data can lead to unreliable generalisations of machine learning models.
    • Explicitly labeling a huge dataset can be costly both in time and resources.
    • Fixed ontologies or class labels used in SED and other DCASE tasks are often a poor fit to a given user's goal.

    Development Set

    The development set is pre-split into training and validation sets. The training set consists of five sub-folders, each deriving from a different source. Along with the audio files, multi-class annotations are provided for each. The validation set consists of two sub-folders, each deriving from a different source, with a single-class (class of interest) annotation file provided for each audio file.

    Training Set

    The training set contains five different sub-folders (BV, HT, JD, MT, WMW). Statistics are given overall and for each sub-folder.

    Overall statistics:

    Number of audio recordings: 174
    Total duration: 21 hours
    Total classes (excl. UNK): 47
    Total events (excl. UNK): 14229

    BV

    The BirdVox-DCASE-10h (BV for short) contains five audio files from four different autonomous recording units, each lasting two hours. These autonomous recording units are all located in Tompkins County, New York, United States. Furthermore, they follow the same hardware specification: the Recording and Observing Bird Identification Node (ROBIN) developed by the Cornell Lab of Ornithology. Andrew Farnsworth, an expert ornithologist, annotated these recordings for the presence of flight calls from migratory passerines, namely American sparrows, cardinals, thrushes, and warblers. In total, the annotator found 2,662 flight calls from 11 different species. We estimate these flight calls to have a duration of 150 milliseconds and a fundamental frequency between 2 kHz and 10 kHz.

    Statistics:

    Number of audio recordings: 5
    Total duration: 10 hours
    Total classes (excl. UNK): 11
    Total events (excl. UNK): 9026
    Ratio event/duration: 0.04
    Sampling rate: 24,000 Hz

    HT

    Spotted hyenas are a highly social species that live in "fission-fusion" groups where group members range alone or in smaller subgroups that split and merge over time. Hyenas use a variety of types of vocalizations to coordinate with one another over both short and long distances. Spotted hyena vocalization data were recorded on custom-developed audio tags designed by Mark Johnson and integrated into combined GPS / acoustic collars (Followit Sweden AB) by Frants Jensen and Mark Johnson. Collars were deployed on female hyenas of the Talek West hyena clan at the MSU-Mara Hyena Project (directed by Kay Holekamp) in the Masai Mara, Kenya as part of a multi-species study on communication and collective behavior. Field work was carried out by Kay Holekamp, Andrew Gersick, Frants Jensen, Ariana Strandburg-Peshkin, and Benson Pion; labeling was done by Kenna Lehmann and colleagues.

    Statistics:

    Number of audio recordings: 5
    Total duration: 5 hours
    Total classes (excl. UNK): 3
    Total events (excl. UNK): 611
    Ratio events/duration: 0.05
    Sampling rate: 6000 Hz

    JD

    Jackdaws are corvid songbirds which usually breed, forage and sleep in large groups, but form a pair bond with the same partner for life. They produce thousands of vocalisations per day, but many aspects of their vocal behaviour remained unexplored due to the difficulty in recording and assigning vocalisations to specific individuals, especia...

  8. Vocational qualifications dataset

    • gov.uk
    • s3.amazonaws.com
    Updated Mar 6, 2025
    Cite
    Ofqual (2025). Vocational qualifications dataset [Dataset]. https://www.gov.uk/government/statistical-data-sets/vocational-qualifications-dataset
    Explore at:
    Dataset updated
    Mar 6, 2025
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Ofqual
    Description

    This dataset covers vocational qualifications starting 2012 to present for England.

    It is updated every quarter.

    In the dataset, the number of certificates issued is rounded to the nearest 5, and values less than 5 appear as ‘Fewer than 5’ to preserve confidentiality (a 0 represents no certificates).
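    When analysing the dataset programmatically, the ‘Fewer than 5’ entries need handling before the counts can be treated as numeric; a small pandas sketch (the column name is an assumption):

    # Sketch: convert rounded certificate counts to numeric, mapping
    # "Fewer than 5" to missing. The column name is hypothetical.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"certificates": ["Fewer than 5", "25", "0", "110"]})
    df["certificates_num"] = pd.to_numeric(
        df["certificates"].replace("Fewer than 5", np.nan), errors="coerce")
    print(df)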

    Where a qualification has been owned by more than one awarding organisation at different points in time, a separate row is given for each organisation.

    Background information as well as commentary accompanying this dataset is available separately.

    For any queries contact us at data.analytics@ofqual.gov.uk.

  9. Identification and estimation of a class of household production models...

    • b2find.dkrz.de
    Updated Oct 24, 2023
    Cite
    (2023). Identification and estimation of a class of household production models (replication data) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/3a48b7a6-4b32-5ad5-9d32-5f78d65f2762
    Explore at:
    Dataset updated
    Oct 24, 2023
    Description

    We consider a class of household production models characterized by a dichotomy property. In these models the amount of time spent on household production does not depend on the household utility function, conditional on household members having a paid job. We analyse the (non-parametric) identifiability of the production function and the so-called jointness function (a function describing which part of household production time is counted as pure leisure). It is shown that the models are identified in the two-adult case, but not in the single-adult case. We present an empirical application to Swedish time-allocation data. The estimates satisfy regularity conditions that were violated in previous studies and pass various specification tests. For this data set we find that male and female home production time are q-substitutes.

  10. Data from: THE RELEVANCY OF MASSIVE HEALTH EDUCATION IN THE BRAZILIAN PRISON...

    • zenodo.org
    csv, pdf
    Updated Jul 16, 2024
    Cite
    Janaína L. R. da S. Valentim; Sara Dias-Trindade; Eloiza da S. G. Oliveira; José A. M. Moreira; Felipe Fernandes; Manoel Honorio Romão; Philippi S. G. de Morais; Alexandre R. Caitano; Aline P. Dias; Carlos A. P. Oliveira; Karilany D. Coutinho; Ricardo B. Ceccim; Ricardo A. de M. Valentim (2024). THE RELEVANCY OF MASSIVE HEALTH EDUCATION IN THE BRAZILIAN PRISON SYSTEM: THE COURSE "HEALTH CARE FOR PEOPLE DEPRIVED OF FREEDOM" AND ITS IMPACTS [Dataset]. http://doi.org/10.5281/zenodo.6499752
    Explore at:
    csv, pdf. Available download formats
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Janaína L. R. da S. Valentim; Sara Dias-Trindade; Eloiza da S. G. Oliveira; José A. M. Moreira; Felipe Fernandes; Manoel Honorio Romão; Philippi S. G. de Morais; Alexandre R. Caitano; Aline P. Dias; Carlos A. P. Oliveira; Karilany D. Coutinho; Ricardo B. Ceccim; Ricardo A. de M. Valentim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    Dataset name: asppl_dataset_v2.csv

    Version: 2.0

    Dataset period: 06/07/2018 - 01/14/2022

    Dataset Characteristics: Multivalued

    Number of Instances: 8118

    Number of Attributes: 9

    Missing Values: Yes

    Area(s): Health and education

    Sources:

    • Virtual Learning Environment of the Brazilian Health System (AVASUS) (Brasil, 2022a);

    • Brazilian Occupational Classification (CBO) (Brasil, 2022b);

    • National Registry of Health Establishments (CNES) (Brasil, 2022c);

    • Brazilian Institute of Geography and Statistics (IBGE) (Brasil, 2022e).

    Description: The data contained in the asppl_dataset_v2.csv dataset (see Table 1) originates from participants of the technology-based educational course “Health Care for People Deprived of Freedom.” The course is available on the AVASUS (Brasil, 2022a). This dataset provides elementary data for analyzing the course’s impact and reach and the profile of its participants. In addition, it brings an update of the data presented in work by Valentim et al. (2021).

    Table 1: Description of AVASUS dataset features.

    • gender (categorical): Gender of the course participant. Values: Feminino / Masculino / Não Informado (in English: Female, Male, or Uninformed).
    • course_progress (numerical): Percentage of completion of the course. Values: range from 0 to 100.
    • course_evaluation (numerical): Score given to the course by the participant. Values: 0, 1, 2, 3, 4, 5, or NaN.
    • evaluation_commentary (categorical): Comment made by the participant about the course. Values: free text or NaN.
    • region (categorical): Brazilian region in which the participant resides. Values: Brazilian regions according to IBGE: Norte, Nordeste, Centro-Oeste, Sudeste, or Sul (in English: North, Northeast, Midwest, Southeast, or South).
    • CNES (numerical): Code of the health establishment where the participant works. Values: CNES code or NaN.
    • health_care_level (categorical): Level of the health care network at which the course participant works. Values: "ATENCAO PRIMARIA", "MEDIA COMPLEXIDADE", "ALTA COMPLEXIDADE", and their possible combinations (in English: "PRIMARY HEALTH CARE", "SECONDARY HEALTH CARE", "TERTIARY HEALTH CARE").
    • year_enrollment (numerical): Year in which the course participant registered. Values: year (YYYY).
    • CBO (categorical): Participant occupation. Values: text coded according to the Brazilian Classification of Occupations, or "Indivíduo sem afiliação formal" (in English: "Individual without formal affiliation").

    Dataset name: prison_syphilis_and_population_brazil.csv

    Dataset period: 2017 - 2020

    Dataset Characteristics: Multivalued

    Number of Instances: 6

    Number of Attributes: 13

    Missing Values: No

    Source:

    • National Penitentiary Department (DEPEN) (Brasil, 2022d);

    Description: The data contained in the prison_syphilis_and_population_brazil.csv dataset (see Table 2) originate from the National Penitentiary Department Information System (SISDEPEN) (Brasil, 2022d). This dataset provides data on the population and prevalence of syphilis in the Brazilian prison system. In addition, it brings a rate that represents the normalized data for purposes of comparison between the populations of each region and Brazil.

    Table 2: Description of DEPEN dataset features.

    • Region (categorical): Brazilian region in which the participant resides, plus a row for the sum of the regions, which refers to Brazil. Values: Brazil and the Brazilian regions according to IBGE: North, Northeast, Midwest, Southeast, or South.
    • syphilis_2017, syphilis_2018, syphilis_2019, syphilis_2020 (numerical): Number of syphilis cases in the prison system in the given year.
    • syphilis_rate_2017, syphilis_rate_2018, syphilis_rate_2019, syphilis_rate_2020 (numerical): Normalized rate of syphilis cases in the given year.
    • pop_2017, pop_2018, pop_2019, pop_2020 (numerical): Prison population in the given year.
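    As a quick check, the rate columns can be recomputed from the case and population columns; this sketch assumes the normalized rate is cases relative to prison population (the exact scaling factor is not stated in the description):

    # Sketch: recompute per-year rates from the cases and population columns.
    # Assumes rate = cases / population; the published scaling may differ.
    import pandas as pd

    df = pd.read_csv("prison_syphilis_and_population_brazil.csv")
    for year in (2017, 2018, 2019, 2020):
        computed = df[f"syphilis_{year}"] / df[f"pop_{year}"]
        diff = (computed - df[f"syphilis_rate_{year}"]).abs().max()
        print(year, diff)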

    Dataset name: students_cumulative_sum.csv

    Dataset period: 2018 - 2020

    Dataset Characteristics: Multivalued

    Number of Instances: 6

    Number of Attributes: 7

    Missing Values: No

    Source:

    • Virtual Learning Environment of the Brazilian Health System (AVASUS) (Brasil, 2022a);

    • Brazilian Institute of Geography and Statistics (IBGE) (Brasil, 2022e).

    Description: The data contained in the students_cumulative_sum.csv dataset (see Table 3) originate mainly from AVASUS (Brasil, 2022a). This dataset provides data on the number of students by region and year. In addition, it brings a rate that represents the normalized data for purposes of comparison between the populations of each region and Brazil. We used population data estimated by the IBGE (Brasil, 2022e) to calculate the rate.

    Table 3: Description of Students dataset Features.

  11. gsm8k

    • huggingface.co
    Updated Aug 11, 2022
    Cite
    OpenAI (2022). gsm8k [Dataset]. https://huggingface.co/datasets/openai/gsm8k
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 11, 2022
    Dataset authored and provided by
    OpenAI (http://openai.com/)
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for GSM8K

      Dataset Summary
    

    GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.

    These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+, −, ×, ÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
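    A minimal loading sketch with the Hugging Face datasets library (using the "main" configuration; GSM8K also ships a "socratic" variant):

    # Sketch: load GSM8K and inspect one problem/solution pair.
    from datasets import load_dataset

    gsm8k = load_dataset("openai/gsm8k", "main")
    example = gsm8k["train"][0]
    print(example["question"])
    print(example["answer"])  # worked solution; the final answer follows "#### "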

  12. Soil Class - Dataset - data.govt.nz - discover and use data

    • catalogue.data.govt.nz
    Updated Oct 3, 2023
    Cite
    (2023). Soil Class - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/soil-class
    Explore at:
    Dataset updated
    Oct 3, 2023
    Description

    {{description}}

  13. Public School Characteristics - Current

    • catalog.data.gov
    • s.cnmilf.com
    • +3more
    Updated Oct 21, 2024
    Cite
    National Center for Education Statistics (NCES) (2024). Public School Characteristics - Current [Dataset]. https://catalog.data.gov/dataset/public-school-characteristics-current-340b1
    Explore at:
    Dataset updated
    Oct 21, 2024
    Dataset provided by
    National Center for Education Statistics (https://nces.ed.gov/)
    Description

    The National Center for Education Statistics' (NCES) Education Demographic and Geographic Estimate (EDGE) program develops annually updated point locations (latitude and longitude) for public elementary and secondary schools included in the NCES Common Core of Data (CCD). The CCD program annually collects administrative and fiscal data about all public schools, school districts, and state education agencies in the United States. The data are supplied by state education agency officials and include basic directory and contact information for schools and school districts, as well as characteristics about student demographics, number of teachers, school grade span, and various other administrative conditions. CCD school and agency point locations are derived from reported information about the physical location of schools and agency administrative offices. The point locations and administrative attributes in this data layer represent the most current CCD collection.

    For more information about NCES school point data, see: https://nces.ed.gov/programs/edge/Geographic/SchoolLocations. For more information about these CCD attributes, as well as additional attributes not included, see: https://nces.ed.gov/ccd/files.asp.

    Notes:

    • -1 or M indicates that the data are missing.
    • -2 or N indicates that the data are not applicable.
    • -9 indicates that the data do not meet NCES data quality standards.

    Collections are available for the following years: 2022-23, 2021-22, 2020-21, 2019-20, 2018-19, and 2017-18.

    All information contained in this file is in the public domain. Data users are advised to review NCES program documentation and feature class metadata to understand the limitations and appropriate use of these data.
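    A small sketch of mapping the sentinel codes above to missing values in pandas (the column name is illustrative, not taken from the CCD file layout):

    # Sketch: replace NCES sentinel codes (-1, -2, -9) with missing values.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"teachers": [34, -1, -2, -9, 58]})  # hypothetical column
    df["teachers"] = df["teachers"].replace({-1: np.nan, -2: np.nan, -9: np.nan})
    print(df)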

  14. U.S. Geological Survey Gap Analysis Program- Land Cover Data v2.2

    • data.wu.ac.at
    • datadiscoverystudio.org
    • +3more
    esri rest
    Updated Jun 8, 2018
    Cite
    Department of the Interior (2018). U.S. Geological Survey Gap Analysis Program- Land Cover Data v2.2 [Dataset]. https://data.wu.ac.at/schema/data_gov/MmMzYjljMzQtZmJjMy00NjUwLWE3YmMtNzRlOWRmMTFkZTVj
    Explore at:
    esri rest. Available download formats
    Dataset updated
    Jun 8, 2018
    Dataset provided by
    Department of the Interior
    Description

    This dataset combines the work of several different projects to create a seamless data set for the contiguous United States. Data from four regional Gap Analysis Projects and the LANDFIRE project were combined to make this dataset. In the northwestern United States (Idaho, Oregon, Montana, Washington and Wyoming) data in this map came from the Northwest Gap Analysis Project. In the southwestern United States (Colorado, Arizona, Nevada, New Mexico, and Utah) data used in this map came from the Southwest Gap Analysis Project. The data for Alabama, Florida, Georgia, Kentucky, North Carolina, South Carolina, Mississippi, Tennessee, and Virginia came from the Southeast Gap Analysis Project, and the California data was generated by the updated California Gap land cover project. The Hawaii Gap Analysis project provided the data for Hawaii. In areas of the country (central U.S., Northeast, Alaska) that have not yet been covered by a regional Gap Analysis Project, data from the LANDFIRE project was used.

    Similarities in the methods used by these projects made it possible to combine the data they derived into one seamless coverage. They all used multi-season satellite imagery (Landsat ETM+) from 1999-2001 in conjunction with digital elevation model (DEM) derived datasets (e.g. elevation, landform) to model natural and semi-natural vegetation. Vegetation classes were drawn from NatureServe's Ecological System Classification (Comer et al. 2003) or classes developed by the Hawaii Gap project. Additionally, all of the projects included land use classes that were employed to describe areas where natural vegetation has been altered. In many areas of the country these classes were derived from the National Land Cover Dataset (NLCD). For the majority of classes, and in most areas of the country, a decision tree classifier was used to discriminate ecological system types. In some areas of the country, more manual techniques were used to discriminate small patch systems and systems not distinguishable through topography.

    The data contains multiple levels of thematic detail. At the most detailed level, natural vegetation is represented by NatureServe's Ecological System classification (or, in Hawaii, the Hawaii GAP classification). These most detailed classifications have been crosswalked to the five highest levels of the National Vegetation Classification (NVC): Class, Subclass, Formation, Division and Macrogroup. This crosswalk allows users to display and analyze the data at different levels of thematic resolution. Developed areas, or areas dominated by introduced species, timber harvest, or water, are represented by other classes, collectively referred to as land use classes; these land use classes occur at each of the thematic levels.

    Raster data in both ArcGIS Grid and ERDAS Imagine format is available for download at http://gis1.usgs.gov/csas/gap/viewer/land_cover/Map.aspx. Six layer files are included in the download packages to assist the user in displaying the data at each of the thematic levels in ArcGIS. In addition to the raster datasets, the data is available in Web Mapping Services (WMS) format for each of the six NVC classification levels (Class, Subclass, Formation, Division, Macrogroup, Ecological System) at the following links:
    • http://gis1.usgs.gov/arcgis/rest/services/gap/GAP_Land_Cover_NVC_Class_Landuse/MapServer
    • http://gis1.usgs.gov/arcgis/rest/services/gap/GAP_Land_Cover_NVC_Subclass_Landuse/MapServer
    • http://gis1.usgs.gov/arcgis/rest/services/gap/GAP_Land_Cover_NVC_Formation_Landuse/MapServer
    • http://gis1.usgs.gov/arcgis/rest/services/gap/GAP_Land_Cover_NVC_Division_Landuse/MapServer
    • http://gis1.usgs.gov/arcgis/rest/services/gap/GAP_Land_Cover_NVC_Macrogroup_Landuse/MapServer
    • http://gis1.usgs.gov/arcgis/rest/services/gap/GAP_Land_Cover_Ecological_Systems_Landuse/MapServer
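    These URLs follow the standard ArcGIS REST MapServer interface, so a rendered image can be requested with the export operation; a hedged sketch (these are old service endpoints and may no longer be live):

    # Sketch: request a rendered map image from one of the MapServer endpoints
    # above via the standard ArcGIS REST `export` operation.
    import requests

    url = ("http://gis1.usgs.gov/arcgis/rest/services/gap/"
           "GAP_Land_Cover_NVC_Class_Landuse/MapServer/export")
    params = {"bbox": "-125,24,-66,50", "bboxSR": "4326",
              "f": "image", "format": "png", "size": "1024,512"}
    resp = requests.get(url, params=params, timeout=60)
    with open("gap_nvc_class.png", "wb") as f:
        f.write(resp.content)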

  15. Sarnet Search And Rescue Dataset

    • universe.roboflow.com
    zip
    Updated Jun 16, 2022
    Cite
    Roboflow Public (2022). Sarnet Search And Rescue Dataset [Dataset]. https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue
    Explore at:
    zip. Available download formats
    Dataset updated
    Jun 16, 2022
    Dataset provided by
    Roboflow
    Authors
    Roboflow Public
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    SaR Bounding Boxes
    Description

    Description from the SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery GitHub repository. (The "Note" below was added by the Roboflow team.)

    Satellite Imagery for Search And Rescue Dataset - ArXiv

    This is a single class dataset consisting of tiles of satellite imagery labeled with potential 'targets'. Labelers were instructed to draw boxes around anything they suspected may be a paraglider wing, missing in a remote area of Nevada. Volunteers were shown examples of similar objects already in the environment for comparison. The missing wing, as it was found after 3 weeks, is shown below.

    [Image: the missing paraglider wing (https://michaeltpublic.s3.amazonaws.com/images/anomaly_small.jpg)]

    The dataset contains the following:

    Set        Images   Annotations
    Train      1808     3048
    Validate   490      747
    Test       254      411
    Total      2552     4206

    The data is in the COCO format, and is directly compatible with Faster R-CNN as implemented in Facebook's Detectron2.
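    For example, with Detectron2 the annotations can be registered directly; a sketch, where the JSON and image paths inside the zip are assumptions rather than documented names:

    # Sketch: register SaRNet's COCO-format splits with Detectron2.
    from detectron2.data.datasets import register_coco_instances

    register_coco_instances("sarnet_train", {},
                            "sarnet/train/annotations.json", "sarnet/train")
    register_coco_instances("sarnet_val", {},
                            "sarnet/val/annotations.json", "sarnet/val")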

    Getting hold of the Data

    Download the data here: sarnet.zip

    Or follow these steps

    # download the dataset
    wget https://michaeltpublic.s3.amazonaws.com/sarnet.zip
    
    # extract the files
    unzip sarnet.zip
    

    Note: with Roboflow, you can download the data here (original, raw images, with annotations): https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue/ (download v1, original_raw-images). Download the dataset in COCO JSON format, or another format of your choice, and import it to Roboflow after unzipping the folder to get started on your project.

    Getting started

    Get started with a Faster R-CNN model pretrained on SaRNet: SaRNet_Demo.ipynb

    Source Code for Paper

    Source code for the paper is located here: SaRNet_train_test.ipynb

    Cite this dataset

    @misc{thoreau2021sarnet,
       title={SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery}, 
       author={Michael Thoreau and Frazer Wilson},
       year={2021},
       eprint={2107.12469},
       archivePrefix={arXiv},
       primaryClass={eess.IV}
    }
    

    Acknowledgment

    The source data was generously provided by Planet Labs, Airbus Defence and Space, and Maxar Technologies.

  16. RxClass

    • catalog.data.gov
    • healthdata.gov
    • +5more
    Updated Feb 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    National Library of Medicine (2025). RxClass [Dataset]. https://catalog.data.gov/dataset/rxclass-ca087
    Explore at:
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    National Library of Medicine
    Description

    The RxClass Browser is a web application for exploring and navigating through the class hierarchies to find the RxNorm drug members associated with each class. RxClass links drug classes of several drug sources including ATC, MeSH, NDF-RT and FDA/SPL to their RxNorm drug members (ingredients, precise ingredients and multiple ingredients). RxClass allows users to search by class name or identifier to find the RxNorm drug members or, conversely, search by RxNorm drug name or identifier to find the classes that the RxNorm drug is a member of.
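    Beyond the browser, the same class information is exposed through NLM's RxClass REST API; a hedged sketch of a by-name lookup (endpoint per the RxNav documentation; the response layout is not shown here):

    # Sketch: look up drug classes by name via the RxClass API (RxNav).
    import requests

    resp = requests.get(
        "https://rxnav.nlm.nih.gov/REST/rxclass/class/byName.json",
        params={"className": "beta blockers"}, timeout=30)
    print(resp.json())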

  17. Louisville Metro KY - Feature Class Containing Trip Level Data for LouVelo...

    • catalog.data.gov
    • data.louisvilleky.gov
    • +5more
    Updated Apr 13, 2023
    Cite
    Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Feature Class Containing Trip Level Data for LouVelo Bikeshare Program [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-feature-class-containing-trip-level-data-for-louvelo-bikeshare-program
    Explore at:
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Louisville/Jefferson County Information Consortium
    Area covered
    Louisville, Kentucky
    Description

    LouVelo is a docked bikeshare program owned by Louisville Metro Government and operated by Cyclehop since May of 2017. The system includes approximately 250 bikes and 25 docked stations in Louisville, plus an additional 3 stations owned and operated by the City of Jeffersonville in partnership with Cyclehop. These data will be updated on a monthly basis to show monthly trends in ridership along with general patterns of use, with pick-up and drop-off location data. These data are updated and maintained for use in the Louisville Metro Open Data Portal LouVelo Dashboard to show ridership for the entirety of the program. Some stations have been relocated since the program's founding. For up-to-date information on dock locations, please view the system map on the LouVelo website. This dashboard is maintained by Louisville Metro Public Works.

    For any questions please contact:

    James Graham
    Mobility Coordinator
    Louisville Metro Public Works, Division of Transportation
    444 S. 5th St, Suite 400
    Louisville, KY 40202
    (502) 574-6473
    james.graham@louisvilleky.gov

    For more information about the LouVelo bikeshare program please visit their website.

  18. Skills for All Course Search

    • data.gov.au
    • researchdata.edu.au
    • +1more
    xml
    Updated Aug 6, 2016
    Cite
    Department for Innovation and Skills (2016). Skills for All Course Search [Dataset]. https://data.gov.au/dataset/skills-for-all-course-search
    Explore at:
    xml. Available download formats
    Dataset updated
    Aug 6, 2016
    Dataset provided by
    Department for Innovation and Skills
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data can be used to find Skills for All subsidised, and other, courses in South Australia, as well as training providers who deliver each course and their contact information.

  19. Building Point Classification - New Zealand

    • pacificgeoportal.com
    • hub.arcgis.com
    • +2more
    Updated Sep 17, 2023
    Cite
    Eagle Technology Group Ltd (2023). Building Point Classification - New Zealand [Dataset]. https://www.pacificgeoportal.com/content/ebc54f498df94224990cf5f6598a5665
    Explore at:
    Dataset updated
    Sep 17, 2023
    Dataset authored and provided by
    Eagle Technology Group Ltd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New Zealand
    Description

    This New Zealand Point Cloud Classification Deep Learning Package will classify point clouds into building and background classes. This model is optimized to work with New Zealand aerial LiDAR data.The classification of point cloud datasets to identify Building is useful in applications such as high-quality 3D basemap creation, urban planning, and planning climate change response.Building could have a complex irregular geometrical structure that is hard to capture using traditional means. Deep learning models are highly capable of learning these complex structures and giving superior results.This model is designed to extract Building in both urban and rural area in New Zealand.The Training/Testing/Validation dataset are taken within New Zealand resulting of a high reliability to recognize the pattern of NZ common building architecture.Licensing requirementsArcGIS Desktop - ArcGIS 3D Analyst extension for ArcGIS ProUsing the modelThe model can be used in ArcGIS Pro's Classify Point Cloud Using Trained Model tool. Before using this model, ensure that the supported deep learning frameworks libraries are installed. For more details, check Deep Learning Libraries Installer for ArcGIS.Note: Deep learning is computationally intensive, and a powerful GPU is recommended to process large datasets.The model is trained with classified LiDAR that follows the The model was trained using a training dataset with the full set of points. Therefore, it is important to make the full set of points available to the neural network while predicting - allowing it to better discriminate points of 'class of interest' versus background points. It is recommended to use 'selective/target classification' and 'class preservation' functionalities during prediction to have better control over the classification and scenarios with false positives.The model was trained on airborne lidar datasets and is expected to perform best with similar datasets. Classification of terrestrial point cloud datasets may work but has not been validated. For such cases, this pre-trained model may be fine-tuned to save on cost, time, and compute resources while improving accuracy. Another example where fine-tuning this model can be useful is when the object of interest is tram wires, railway wires, etc. which are geometrically similar to electricity wires. When fine-tuning this model, the target training data characteristics such as class structure, maximum number of points per block and extra attributes should match those of the data originally used for training this model (see Training data section below).OutputThe model will classify the point cloud into the following classes with their meaning as defined by the American Society for Photogrammetry and Remote Sensing (ASPRS) described below: 0 Background 6 BuildingApplicable geographiesThe model is expected to work well in the New Zealand. It's seen to produce favorable results as shown in many regions. However, results can vary for datasets that are statistically dissimilar to training data.Training dataset - Auckland, Christchurch, Kapiti, Wellington Testing dataset - Auckland, WellingtonValidation/Evaluation dataset - Hutt City Dataset City Training Auckland, Christchurch, Kapiti, Wellington Testing Auckland, Wellington Validating HuttModel architectureThis model uses the SemanticQueryNetwork model architecture implemented in ArcGIS Pro.Accuracy metricsThe table below summarizes the accuracy of the predictions on the validation dataset. 
    Class               Precision   Recall     F1-score
    Never Classified    0.984921    0.975853   0.979762
    Building            0.951285    0.967563   0.9584

    Training data

    This model was trained on a classified dataset originally provided by OpenTopography, with less than 1% manual labelling and correction. The train/test split is 75% / 25%; this ratio was chosen based on statistics from previous training epochs, which showed a decent improvement at this split.

    The training data has the following characteristics:

    X, Y, and Z linear unit     Meter
    Z range                     -137.74 m to 410.50 m
    Number of returns           1 to 5
    Intensity                   16 to 65520
    Point spacing               0.2 ± 0.1
    Scan angle                  -17 to +17
    Maximum points per block    8192
    Block size                  50 meters
    Class structure             [0, 6]

    Sample results

    The model was used to classify the Wellington city dataset, which has a density of 23 pts/m. The model's performance is directly proportional to the dataset's point density and to how well noise is excluded from the point cloud. To learn how to use this model, see this story.
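    As a minimal illustration of driving the Classify Point Cloud Using Trained Model tool from Python, the sketch below uses arcpy; the file paths are hypothetical, the parameter values follow the recommendations above (target classes 0 and 6), and it assumes ArcGIS Pro with the 3D Analyst extension and the deep learning libraries installed.

        # Sketch: classify buildings in a LAS dataset with the pre-trained
        # package (hypothetical paths; requires 3D Analyst and the deep
        # learning libraries).
        import arcpy

        arcpy.CheckOutExtension("3D")

        las_dataset = r"C:\data\wellington.lasd"                    # hypothetical input
        model_package = r"C:\models\NZBuildingClassification.dlpk"  # hypothetical .dlpk

        # Arguments: input point cloud, trained model, model output classes
        # to apply (Building = 6), edit mode, and the existing class codes
        # allowed to be edited (0 and 6, per the class structure above).
        arcpy.ddd.ClassifyPointCloudUsingTrainedModel(
            las_dataset, model_package, "6", "EDIT_SELECTED", "0;6"
        )

        arcpy.CheckInExtension("3D")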

  20. MSL Curiosity Rover Images with Science and Engineering Classes

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Sep 17, 2020
    Cite
    Steven Lu; Kiri L. Wagstaff (2020). MSL Curiosity Rover Images with Science and Engineering Classes [Dataset]. http://doi.org/10.5281/zenodo.4033453
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 17, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Steven Lu; Kiri L. Wagstaff
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please note that the file msl-labeled-data-set-v2.1.zip below contains the latest images and labels associated with this data set.

    Data Set Description

    The data set consists of 6,820 images collected by the Mars Science Laboratory (MSL) Curiosity Rover with three instruments: (1) the Mast Camera (Mastcam) Left Eye; (2) the Mast Camera Right Eye; (3) the Mars Hand Lens Imager (MAHLI). With help from Dr. Raymond Francis, a member of the MSL operations team, we identified 19 classes of science and engineering interest (see the "Classes" section for more information), and each image is assigned one class label. We split the data set into training, validation, and test sets in order to train and evaluate machine learning algorithms. The training set contains 5,920 images (including augmented images; see the "Image Augmentation" section for more information); the validation set contains 300 images; the test set contains 600 images. The training set images were randomly sampled from sol (Martian day) range 1 - 948; validation set images were randomly sampled from sol range 949 - 1920; test set images were randomly sampled from sol range 1921 - 2224. All images are resized to 227 x 227 pixels without preserving the original height/width aspect ratio.
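    For reference, the resize step can be reproduced in a few lines of Python with Pillow; PIL's resize() stretches to the exact target size, which matches the dataset's behavior of not preserving the aspect ratio (the file names below are hypothetical).

        # Sketch: resize an image to 227 x 227 without preserving
        # the aspect ratio, as done for this data set.
        from PIL import Image

        img = Image.open("images/example.jpg")   # hypothetical file
        img.resize((227, 227)).save("example-227.jpg")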

    Directory Contents

    • images - contains all 6,820 images
    • class_map.csv - string-integer class mappings
    • train-set-v2.1.txt - label file for the training set
    • val-set-v2.1.txt - label file for the validation set
    • test-set-v2.1.txt - label file for the test set

    The label files are formatted as below:

    "Image-file-name class_in_integer_representation"

    Labeling Process

    Each image was labeled with help from three different volunteers (see the Contributor list). The final labels were determined using the following process (a small voting sketch follows the list):

    • If all three labels agree, that label is used as the final label.
    • If the three labels do not agree, we manually review the labels and decide the final label.
    • We also performed error analysis as a post-processing step to correct noisy/incorrect labels in the data set.
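    The first two rules amount to a unanimity vote with manual fallback; a minimal sketch (hypothetical helper, not the team's actual tooling):

        # Sketch: resolve three volunteer labels for one image.
        # Unanimous labels are accepted; disagreements return None
        # to flag the image for manual review.
        from typing import Optional

        def resolve_label(votes: list[int]) -> Optional[int]:
            assert len(votes) == 3
            if votes[0] == votes[1] == votes[2]:
                return votes[0]   # rule 1: all three volunteers agree
            return None           # rule 2: needs manual review

        print(resolve_label([4, 4, 4]))  # -> 4
        print(resolve_label([4, 3, 4]))  # -> None (manual review)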

    Classes

    There are 19 classes identified in this data set. In order to simplify our training and evaluation algorithms, we mapped the class names from string to integer representations. The class names, string-to-integer mappings, and distributions are shown below:

    Class name           Counts (train)   Counts (val)   Counts (test)   Integer
    Arm cover                       10              1               4         0
    Other rover part               190             11              10         1
    Artifact                       680             62             132         2
    Nearby surface                1554             74             187         3
    Close-up rock                 1422             50              84         4
    DRT                              8              4               6         5
    DRT spot                       214              1               7         6
    Distant landscape              342             14              34         7
    Drill hole                     252              5              12         8
    Night sky                       40              3               4         9
    Float                          190              5               1        10
    Layers                         182             21              17        11
    Light-toned veins               42              4              27        12
    Mastcam cal target             122             12              29        13
    Sand                           228             19              16        14
    Sun                            182              5              19        15
    Wheel                          212              5               5        16
    Wheel joint                     62              1               5        17
    Wheel tracks                    26              3               1        18

    Image Augmentation

    Only the training set contains augmented images: 3,920 of the 5,920 training images are augmented versions of the remaining 2,000 original training images. Images taken by different instruments were augmented differently. As shown below, we employed five augmentation methods: images taken by the Mastcam left- and right-eye cameras were augmented using horizontal flipping only, while images taken by the MAHLI camera were augmented using all five methods. Note that one can filter on the file names listed in train-set-v2.1.txt to obtain the set of non-augmented images (see the sketch after the list below).

    • 90 degrees clockwise rotation (file name ends with -r90.jpg)
    • 180 degrees clockwise rotation (file name ends with -r180.jpg)
    • 270 degrees clockwise rotation (file name ends with -r270.jpg)
    • Horizontal flip (file name ends with -fh.jpg)
    • Vertical flip (file name ends with -fv.jpg)
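    A filtering sketch based on these suffixes (file name as in the directory listing above):

        # Sketch: drop augmented images from the training list by
        # matching the augmentation file-name suffixes.
        AUG_SUFFIXES = ("-r90.jpg", "-r180.jpg", "-r270.jpg", "-fh.jpg", "-fv.jpg")

        with open("train-set-v2.1.txt") as f:
            names = [line.split()[0] for line in f]

        originals = [n for n in names if not n.endswith(AUG_SUFFIXES)]
        print(len(originals), "non-augmented training images")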

    Acknowledgment

    The authors would like to thank the volunteers (listed as Contributors) who provided annotations for this data set. We would also like to thank the PDS Imaging Node for its continued support of this work.
