18 datasets found
  1. iNeuron Projectathon Oct-Nov'21

    • kaggle.com
    zip
    Updated Oct 22, 2021
    Cite
    Aman Anand (2021). iNeuron Projectathon Oct-Nov'21 [Dataset]. https://www.kaggle.com/yekahaaagayeham/ineuron-projectathon-octnov21
    Explore at:
    zip (3335989 bytes)
    Dataset updated
    Oct 22, 2021
    Authors
    Aman Anand
    Description

    iNeuron-Projectathon-Oct-Nov-21

    Problem Statement:

    Design a web portal to automate the various operations performed in machine learning projects for supervised or unsupervised use cases. The web portal must have the capabilities to perform the tasks below:

    1. Extract, Transform, Load (ETL):
      a. Extract: the portal should provide the capability to configure any data source, e.g. cloud storage (AWS, Azure, GCP), databases (RDBMS, NoSQL), or real-time streaming data, and extract data into the portal. (Allow users to write a custom script if required to connect to any data source.)
      b. Transform: the portal should provide inbuilt functions/components that apply a rich set of transformations to bring the extracted data into the desired format.
      c. Load: the portal should be able to save the transformed data into any cloud storage.
      d. Allow the user to write custom Python scripts if some functionality is not present in the portal.
    2. Exploratory Data Analysis: the portal should allow users to perform exploratory data analysis.
    3. Data Preparation: data wrangling, feature extraction, and feature selection should be automated with minimal user intervention.
    4. The application must suggest the machine learning algorithm best suited to the use case and perform a best-model search to automate model development.
    5. The application should provide a feature to deploy the model to any cloud, and it should create a prediction API to predict new instances.
    6. The application should log every detail so that each activity can be audited in the future to investigate any event.
    7. A detailed report should be generated for ETL, EDA, data preparation, and model development and deployment.
    8. Create a dashboard to monitor model performance and create alert mechanisms to notify the appropriate user to take necessary precautions.
    9. Create functionality to retrain an existing model if necessary.
    10. The portal must be designed so that it can be used by multiple organizations/users, where each organization/user is isolated from the others.
    11. The portal should provide functionality to manage users, similar to the RBAC concept used in the cloud. (It is not necessary to build many roles, but design it so that new roles can be added in the future and applied to users.) An organization/user can have multiple users, and each user will have a specific role.
    12. The portal should have a scheduler to schedule training or prediction tasks, and alerts about scheduled jobs should be sent to the subscriber's configured email address.
    13. Implement watcher functionality to perform a prediction as soon as a file arrives at the input location (see the sketch after this list).
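
    A minimal, illustrative sketch of requirement 13 (the file watcher), assuming the third-party watchdog package and a hypothetical predict_file() hook; it is not part of the original brief:

    import time
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    def predict_file(path):
        # Hypothetical hook: hand the new file to the deployed model / prediction API.
        print("Triggering prediction for", path)

    class InputWatcher(FileSystemEventHandler):
        def on_created(self, event):
            # Fire a prediction as soon as a new file lands in the input location.
            if not event.is_directory:
                predict_file(event.src_path)

    if __name__ == "__main__":
        observer = Observer()
        observer.schedule(InputWatcher(), path="input/", recursive=False)  # "input/" is a placeholder
        observer.start()
        try:
            while True:
                time.sleep(1)
        except KeyboardInterrupt:
            observer.stop()
        observer.join()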

    Approach:

    1. Follow standard guidelines to write a quality solution for the web portal.
    2. Follow OOP principles to design the solution.
    3. Implement REST APIs wherever possible.
    4. Implement a CI/CD pipeline with automated testing and dockerization. (Use containers or Kubernetes to deploy your dockerized application.)
    5. The CI/CD pipeline should have different environments, e.g. ST, SST, Production. Note: feel free to use any technology to design your solution.

    Results:

    You have to build a solution that should summarize the various news articles from different reading categories.

    Project Evaluation metrics:

    Code:

    • You are supposed to write code in a modular fashion.
    • Safe: it can be used without causing harm.
    • Testable: it can be tested at the code level.
    • Maintainable: it can be maintained, even as your codebase grows.
    • Portable: it works the same in every environment (operating system).
    • You have to maintain your code on GitHub.
    • You have to keep your GitHub repo public so that anyone can check your code.
    • You have to maintain a proper readme file for any project development.
    • You should include the basic workflow and execution of the entire project in the readme file on GitHub.
    • Follow the coding standards: https://www.python.org/dev/peps/pep-0008/

    Database:

    Based on development requirements, feel free to choose any database (SQL or NoSQL) or to use multiple databases.

    Cloud:

    • You can use any cloud platform to host this entire solution, e.g. AWS, Azure, or GCP.

    API Details or User Interface:

    1. The web portal should be designed like any cloud platform.
    2. Models developed using the web portal should expose an API for testing predictions (see the sketch below).
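
    One way such an endpoint could look, assuming Flask and a hypothetical model serialized with joblib; the real portal would generate one endpoint per deployed model:

    from flask import Flask, jsonify, request
    import joblib

    app = Flask(__name__)
    model = joblib.load("model.pkl")  # hypothetical serialized model

    @app.route("/predict", methods=["POST"])
    def predict():
        # Expects a JSON payload such as {"instances": [[...feature values...]]}
        payload = request.get_json(force=True)
        predictions = model.predict(payload["instances"]).tolist()
        return jsonify({"predictions": predictions})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8000)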

    Logging:

    • Logging is a must for every action performed by your code; use the Python logging library for this.
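
    A minimal sketch using the standard library; the file name and format string are illustrative choices, not part of the brief:

    import logging

    logging.basicConfig(
        filename="portal_audit.log",
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    logger = logging.getLogger("etl")

    # Log every action with enough context to audit it later.
    logger.info("Extraction started for source %s", "s3://bucket/raw")  # placeholder source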

    DevOps Pipeline:

    Use a source version control tool to implement the CI/CD pipeline, e.g. Azure DevOps, GitHub, or CircleCI.

    Deployment:

    • You can host your application on a cloud platform using the automated CI/CD pipeline.

    Solutions Design:

    • You have to submit a complete solution design strate...

  2. Replication Package: Unboxing Default Argument Breaking Changes in 1 + 2...

    • zenodo.org
    application/gzip
    Updated Jul 15, 2024
    Cite
    João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Daniel Prates; Arthur Bonifácio; Ghizlane El Boussaidi (2024). Replication Package: Unboxing Default Argument Breaking Changes in 1 + 2 Data Science Libraries in Python [Dataset]. http://doi.org/10.5281/zenodo.11584961
    Explore at:
    application/gzip
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    João Eduardo Montandon; Luciana Lourdes Silva; Cristiano Politowski; Daniel Prates; Arthur Bonifácio; Ghizlane El Boussaidi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Replication Package

    This repository contains data and source files needed to replicate our work described in the paper "Unboxing Default Argument Breaking Changes in Scikit Learn".

    Requirements

    We recommend the following requirements to replicate our study:

    1. Internet access
    2. At least 100GB of space
    3. Docker installed
    4. Git installed

    Package Structure

    We relied on Docker containers to provide a working environment that is easier to replicate. Specifically, we configure the following containers:

    • data-analysis, an R-based Container we used to run our data analysis.
    • data-collection, a Python Container we used to collect Scikit's default arguments and detect them in client applications.
    • database, a Postgres Container we used to store clients' data, obtained from Grotov et al.
    • storage, a directory used to store the data processed in data-analysis and data-collection. This directory is shared in both containers.
    • docker-compose.yml, the Docker file that configures all containers used in the package.

    In the remainder of this document, we describe how to set up each container properly.

    Using VSCode to Set Up the Package

    We selected VSCode as the IDE of choice because its extensions allow us to implement our scripts directly inside the containers. In this package, we provide configuration parameters for both the data-analysis and data-collection containers. This way you can directly access and run each container from within VSCode without any additional configuration.

    You first need to set up the containers:

    $ cd /replication/package/folder
    $ docker-compose build
    $ docker-compose up
    # Wait for Docker to create and run all containers
    

    Then, you can open them in Visual Studio Code:

    1. Open VSCode in project root folder
    2. Access the command palette and select "Dev Container: Reopen in Container"
      1. Select either Data Collection or Data Analysis.
    3. Start working

    If you want/need a more customized organization, the remainder of this file describes it in detail.

    Longest Road: Manual Package Setup

    Database Setup

    The database container will automatically restore the dump in dump_matroskin.tar on its first launch. To set up and run the container, you should:

    Build an image:

    $ cd ./database
    $ docker build --tag 'dabc-database' .
    $ docker image ls
    REPOSITORY  TAG    IMAGE ID    CREATED     SIZE
    dabc-database latest  b6f8af99c90d  50 minutes ago  18.5GB
    

    Create and enter inside the container:

    $ docker run -it --name dabc-database-1 dabc-database
    $ docker exec -it dabc-database-1 /bin/bash
    root# psql -U postgres -h localhost -d jupyter-notebooks
    jupyter-notebooks=# \dt
           List of relations
     Schema |    Name    | Type | Owner
    --------+-------------------+-------+-------
     public | Cell       | table | root
     public | Code_cell     | table | root
     public | Md_cell      | table | root
     public | Notebook     | table | root
     public | Notebook_features | table | root
     public | Notebook_metadata | table | root
     public | repository    | table | root
    

    If you get the table list above, your database is properly set up.

    It is important to mention that this database extends the one provided by Grotov et al. Basically, we added three columns to the table Notebook_features (API_functions_calls, defined_functions_calls, and other_functions_calls) containing the function calls performed by each client in the database.
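
    As a quick check that the three added columns are present, a minimal sketch using the psycopg2 driver and the connection details shown above (a password may be required depending on the container configuration):

    import psycopg2

    conn = psycopg2.connect(host="localhost", dbname="jupyter-notebooks", user="postgres")
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT column_name
            FROM information_schema.columns
            WHERE table_name = 'Notebook_features'
              AND column_name IN ('API_functions_calls',
                                  'defined_functions_calls',
                                  'other_functions_calls');
            """
        )
        print([row[0] for row in cur.fetchall()])  # expect all three column names
    conn.close()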

    Data Collection Setup

    This container is responsible for collecting the data to answer our research questions. It has the following structure:

    • dabcs.py, extracts DABCs from Scikit Learn source code and exports them to a CSV file.
    • dabcs-clients.py, extracts function calls from clients and exports them to a CSV file. We rely on a modified version of Matroskin to leverage the function calls. You can find the tool's source code in the `matroskin` directory.
    • Makefile, commands to set up and run both dabcs.py and dabcs-clients.py
    • matroskin, the directory containing the modified version of matroskin tool. We extended the library to collect the function calls performed on the client notebooks of Grotov's dataset.
    • storage, a docker volume where the data-collection should save the exported data. This data will be used later in Data Analysis.
    • requirements.txt, Python dependencies adopted in this module.

    Note that the container will automatically configure this module for you, e.g., install dependencies, configure matroskin, download scikit learn source code, etc. For this, you must run the following commands:

    $ cd ./data-collection
    $ docker build --tag "data-collection" .
    $ docker run -it -d --name data-collection-1 -v $(pwd)/:/data-collection -v $(pwd)/../storage/:/data-collection/storage/ data-collection
    $ docker exec -it data-collection-1 /bin/bash
    $ ls
    Dockerfile Makefile config.yml dabcs-clients.py dabcs.py matroskin storage requirements.txt utils.py
    

    If you see project files, it means the container is configured accordingly.

    Data Analysis Setup

    We use this container to conduct the analysis over the data produced by the Data Collection container. It has the following structure:

    • dependencies.R, an R script containing the dependencies used in our data analysis.
    • data-analysis.Rmd, the R notebook we used to perform our data analysis
    • datasets, a docker volume pointing to the storage directory.

    Execute the following commands to run this container:

    $ cd ./data-analysis
    $ docker build --tag "data-analysis" .
    $ docker run -it -d --name data-analysis-1 -v $(pwd)/:/data-analysis -v $(pwd)/../storage/:/data-collection/datasets/ data-analysis
    $ docker exec -it data-analysis-1 /bin/bash
    $ ls
    data-analysis.Rmd datasets dependencies.R Dockerfile figures Makefile
    

    If you see project files, it means the container is configured accordingly.

    A note on storage shared folder

    As mentioned, the storage folder is mounted as a volume and shared between the data-collection and data-analysis containers. We compressed the content of this folder due to space constraints. Therefore, before starting work on Data Collection or Data Analysis, make sure you have extracted the compressed files. You can do this by running the Makefile inside the storage folder.

    $ make unzip # extract files
    $ ls
    clients-dabcs.csv clients-validation.csv dabcs.csv Makefile scikit-learn-versions.csv versions.csv
    $ make zip # compress files
    $ ls
    csv-files.tar.gz Makefile
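
    As a quick sanity check after make unzip, a short sketch that peeks at the extracted CSV files (assuming pandas is available in your environment; install it if needed):

    import pandas as pd

    for name in ["dabcs.csv", "clients-dabcs.csv", "clients-validation.csv"]:
        df = pd.read_csv(name)
        print(name, df.shape)  # file name, (rows, columns)
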
  3. Recombinant Read Extraction Pipeline Test Input File

    • figshare.com
    txt
    Updated Dec 5, 2024
    Cite
    Jillis Grubben (2024). Recombinant Read Extraction Pipeline Test Input File [Dataset]. http://doi.org/10.6084/m9.figshare.27967968.v1
    Explore at:
    txt
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jillis Grubben
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recombinant Read Extraction Pipeline with Test Input Data

    Description: This dataset showcases the Recombinant Read Extraction Pipeline, previously developed by us (https://doi.org/10.6084/m9.figshare.26582380), designed for the detection of recombination events in sequencing data. The pipeline enables the alignment of sequence reads to a reference genome, generation of SNP strings, identification of haplotypes, extraction of recombinant sequences, and comprehensive result compilation into an Excel summary for seamless analysis.

    Included in this dataset:
    • config.json: configuration file with default settings.
    • pipeline_test_reads.fa: a test FASTA file containing simulated recombination and allele replacement events, specifically:
      • two recombination events, each covered by 15 reads, transitioning between Solanum lycopersicum cv. Moneyberg and Moneymaker haplotypes;
      • one recombination event covered by 20 reads, involving a switch at the extremity of the analysed amplicon from Moneymaker to Moneyberg haplotype;
      • one allele replacement event covered by 20 reads, featuring recombination from Moneymaker to Moneyberg and back to Moneymaker;
      • wild-type Solanum lycopersicum cv. Moneyberg and Moneymaker sequences.
    • final_output.xlsx: example output summarizing read names, sequences, and read counts.

    Usage Instructions:
    1. Install dependencies: follow the installation guidelines to set up the required software and Python libraries (please refer to https://doi.org/10.6084/m9.figshare.26582380).
    2. Configure the pipeline: customize parameters in config.json as needed.
    3. Run the pipeline: execute the pipeline using the provided script to process the test input file.
    4. Review outputs: examine final_output.xlsx to verify the detection and summarization of recombinant events.

    The dataset pipeline_test_reads.fa serves as a control dataset designed to verify the functionality of the Recombinant Read Extraction Pipeline previously described (https://doi.org/10.6084/m9.figshare.26582380). This dataset contains artificially generated "reads" and does not include any genuine DNA sequencing data.

    Keywords: Genomic Data Processing, Recombinant Detection, Haplotype Analysis, Bioinformatics Pipeline, SNP Analysis
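
    A minimal, illustrative sketch for inspecting the test inputs listed above; the actual analysis is performed by the pipeline script from the linked figshare record:

    import json

    # List the settings shipped in the default configuration.
    with open("config.json") as fh:
        config = json.load(fh)
    print("Configured settings:", sorted(config))

    # Count the simulated reads in the test FASTA (every ">" header starts one record).
    with open("pipeline_test_reads.fa") as fh:
        n_reads = sum(1 for line in fh if line.startswith(">"))
    print("Reads in pipeline_test_reads.fa:", n_reads)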

  4. DataSheet_1_AgTC and AgETL: open-source tools to enhance data collection and...

    • frontiersin.figshare.com
    pdf
    Updated Feb 21, 2024
    Cite
    Luis Vargas-Rojas; To-Chia Ting; Katherine M. Rainey; Matthew Reynolds; Diane R. Wang (2024). DataSheet_1_AgTC and AgETL: open-source tools to enhance data collection and management for plant science research.pdf [Dataset]. http://doi.org/10.3389/fpls.2024.1265073.s001
    Explore at:
    pdf
    Dataset updated
    Feb 21, 2024
    Dataset provided by
    Frontiers
    Authors
    Luis Vargas-Rojas; To-Chia Ting; Katherine M. Rainey; Matthew Reynolds; Diane R. Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Advancements in phenotyping technology have enabled plant science researchers to gather large volumes of information from their experiments, especially those that evaluate multiple genotypes. To fully leverage these complex and often heterogeneous data sets (i.e. those that differ in format and structure), scientists must invest considerable time in data processing, and data management has emerged as a considerable barrier for downstream application. Here, we propose a pipeline to enhance data collection, processing, and management from plant science studies comprising two newly developed open-source programs. The first, called AgTC, is a series of programming functions that generates comma-separated values file templates to collect data in a standard format using either a lab-based computer or a mobile device. The second series of functions, AgETL, executes steps for an Extract-Transform-Load (ETL) data integration process where data are extracted from heterogeneously formatted files, transformed to meet standard criteria, and loaded into a database. There, data are stored and can be accessed for data analysis-related processes, including dynamic data visualization through web-based tools. Both AgTC and AgETL are flexible for application across plant science experiments without programming knowledge on the part of the domain scientist, and their functions are executed on Jupyter Notebook, a browser-based interactive development environment. Additionally, all parameters are easily customized from central configuration files written in the human-readable YAML format. Using three experiments from research laboratories in university and non-government organization (NGO) settings as test cases, we demonstrate the utility of AgTC and AgETL to streamline critical steps from data collection to analysis in the plant sciences.
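
    As an illustration of the configuration style described above (not the tools' actual schema), a minimal sketch that loads a hypothetical YAML parameter file with PyYAML:

    import yaml

    # "agtc_config.yaml" is a hypothetical file name; see the AgTC/AgETL repositories
    # for the real configuration files and their keys.
    with open("agtc_config.yaml") as fh:
        params = yaml.safe_load(fh)

    for key, value in params.items():
        print(key, "->", value)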

  5. Data bundle for egon-data: A transparent and reproducible data processing...

    • zenodo.org
    zip
    Updated Jun 10, 2022
    Cite
    Ilka Cußmann (2022). Data bundle for egon-data: A transparent and reproducible data processing pipeline for energy system modeling [Dataset]. http://doi.org/10.5281/zenodo.5743452
    Explore at:
    zip
    Dataset updated
    Jun 10, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ilka Cußmann
    Description

    egon-data provides a transparent and reproducible, open-data-based processing pipeline for generating data models suitable for energy system modeling. The data are customized for the requirements of the research project eGon. The research project aims to develop tools for an open and cross-sectoral planning of transmission and distribution grids. For further information, please visit the eGon project website or its GitHub repository.

    egon-data retrieves and processes data from several different external input sources. As not all data dependencies can be downloaded automatically from external sources we provide a data bundle to be downloaded by egon-data.

    The following data sets are part of the available data bundle:

    1. climate_zones_germany
      • Climate zones in Germany
      • source: Own representation based on DWD TRY climate zones
      • License: Attribution 4.0 International (CC BY 4.0)
    2. geothermal_potential
    3. household_electricity_demand_profiles
      • Annual profiles in hourly resolution of the electricity demand of private households for different household types (singles, couples, other) with varying numbers of elderly people and children.
        The profiles were created using a bottom-up load profile generator by Fraunhofer IEE developed in the Bachelor's thesis "Auswirkungen verschiedener Haushaltslastprofile auf PV-Batterie-Systeme" by Jonas Haack, Fachhochschule Flensburg, December 2012.
        The columns are named as follows: "
      • License: Attribution 4.0 International (CC BY 4.0)
    4. household_heat_demand_profiles
      • Sample heat time series including hot water and space heating for single- and multi-family houses. The profiles were created using the load profile generator by Fraunhofer IEE developed in the Master's thesis "Synthesis of a heat and electrical load profile for single and multi-family houses used for subsequent performance tests of a multi-component energy system", Simon Ruben Drauz, RWTH Aachen University, March 2016
      • License: Attribution 4.0 International (CC BY 4.0)
    5. hydrogen_storage_potential_saltstructures
      • The data are taken from figure 7.1 in Donadei, S., et al. (2020), p. 7-5.
      • Source: Flach lagernde Salze, (c) BGR Hannover, 2021.
        Datenquelle: InSpEE-Salzstrukturen, (c) BGR, Hannover, 2015. &
        Donadei, S., Horváth, B., Horváth, P.-L., Keppliner, J., Schneider, G.-S., &
        Zander-Schiebenhöfer, D. (2020). Teilprojekt Bewertungskriterien und
        Potenzialabschätzung. BGR. Informationssystem Salz: Planungsgrundlagen,
        Auswahlkriterien und Potenzialabschätzung für die Errichtung von Salzkavernen
        zur Speicherung von Erneuerbaren Energien (Wasserstoff und Druckluft) –
        Doppelsalinare und flach lagernde Salzschichten: InSpEE-DS. Sachbericht.
        Hannover: BGR.
      • License: The original data are licensed under the GeoNutzV, see https://sg.geodatenzentrum.de/web_public/gdz/lizenz/geonutzv.pdf
    6. industrial_sites
      • Information about industrial sites with DSM potential in Germany from a Master's thesis by Danielle Schmidt. The data set additionally includes the project's own information on the coordinates of every industrial site.
      • source: Schmidt, Danielle. (2019). Supplementary material to the masters thesis: NUTS-3 Regionalization of Industrial Load Shifting Potential in Germany using a Time-Resolved Model [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3613767
      • License: Attribution 4.0 International (CC BY 4.0)
    7. nep2035_version2021
      • Data extracted from the German grid development plan - power
      • source: Netzentwicklungsplan Strom 2035 (2021), erster Entwurf | Übertragungsnetzbetreiber (M) CC-BY-4.0
      • License: Attribution 4.0 International (CC BY 4.0)
    8. pipeline_classification_gas
    9. pypsa_eur_sec
      • Preliminary results from scenario generator pypsa-eur-sec
      • source: own calculation using pypsa-eur-sec fork (https://github.com/openego/pypsa-eur-sec)
      • License: Attribution 4.0 International (CC BY 4.0)
    10. regions_dynamic_line_rating
    11. WZ_definition
      • Definitions of industrial and commercial branches
      • source: Klassifikation der Wirtschaftszweige (WZ 2008)
      • Extract from Terms of Use: © Statistisches Bundesamt, Wiesbaden 2008. Reproduction and distribution, including in part, are permitted provided the source is cited.
    12. zensus_households
      • Dataset describing the number of people living in Germany by household/family type, age class, sex, and household size, at federal-state resolution.
      • source: Data retrieved from Zensus Datenbank by performing these steps:
        • Search for: "1000A-2029"
        • or choose topic: "Bevölkerung kompakt"
        • Choose table code: "1000A-2029" with title "Personen: Alter (11 Altersklassen)/Geschlecht/Größe des privaten Haushalts - Typ des privaten Haushalts (nach Familien/Lebensform)"
        • Change the setting "GEOLK1" to "Bundesländer (16)"; the higher resolution "Landkreise und kreisfreie Städte (412)" is only accessible after registration.
      • Extract from Terms of Use: © Statistische Ämter des Bundes und der Länder 2021. Reproduction and distribution, including in part, are permitted provided the source is cited.

  6. Imagedetection Dataset

    • universe.roboflow.com
    zip
    Updated Apr 1, 2023
    Cite
    custom yolov5 (2023). Imagedetection Dataset [Dataset]. https://universe.roboflow.com/custom-yolov5-fwa2b/imagedetection-kf1ww/model/2
    Explore at:
    zip
    Dataset updated
    Apr 1, 2023
    Dataset authored and provided by
    custom yolov5
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Letters Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Educational Application: This model could be used in educational applications or games designed for children learning to recognize letters or digits. It could help in providing immediate feedback to learners by identifying whether the written letter or digit is correct.

    2. Document Analysis: The model could be applied for document analysis and capturing data from written or printed material, including books, bills, notes, letters, and more. The numbers and special characters capability could be used for capturing amounts, expressions, or nuances in the text.

    3. Accessibility Software: This model could be integrated into accessibility software applications aimed at assisting visually impaired individuals. It can analyze images or real-time video to read out the identified letters, figures, and special characters.

    4. License Plate Recognition: Given its ability to recognize a wide array of symbols, the model could be useful for extracting information from license plates, aiding in security and law enforcement settings.

    5. Handwritten Forms Processing: This computer vision model could be utilized to extract and categorize data from handwritten forms or applications, aiding in the automation of data entry tasks in various organizations.

  7. Data from: Information-Driven Active Audio-Visual Source Localization

    • figshare.com
    zip
    Updated Jan 19, 2016
    Cite
    Niclas Schult; Thomas Reineking; Thorsten Kluss; Christoph Zetzsche (2016). Information-Driven Active Audio-Visual Source Localization [Dataset]. http://doi.org/10.6084/m9.figshare.1446147.v1
    Explore at:
    zip
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Niclas Schult; Thomas Reineking; Thorsten Kluss; Christoph Zetzsche
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The raw data recorded during the experiments is available in the folder "data", which includes the subfolders 'robotData-IG', 'robotData-Random', 'simulationData-IG' and 'simulationData-Random' for the data of the robot and the simulation experiments, respectively. In each step, all data needed for performance evaluations (that is, the action u selected by the system and the corresponding information gain, the sensory measurements z, the state of the system, the state estimate and the particle set itself) were serialized by Python's cPickle module. Furthermore, the subfolder 'data\extracted_data_csv' contains all the data we used in our figures in a condensed form, saved to CSV files: all relevant data (and only relevant data) were extracted from the raw data, so it is no longer necessary to load and process the binary data recorded during the experiments, and all the information you need is available in human-readable, text-based files. The Python module "InformationDriven_AV_SourceTracking_EVALUATION.py" shows how to access the data and includes all the code necessary to read and evaluate the data recorded during the experiments.

    How to build and run: in addition to a standard Python 2.7 distribution, some Python libraries are necessary to run the code:

    • numpy (http://www.numpy.org/)
    • matplotlib (http://matplotlib.org/)
    • config (https://pypi.python.org/pypi/config/0.3.7)
    • optional (see below): evaluation/csvData/error, open cv(2) for python

    [OPTIONAL] If you want to analyze the raw data (not the data saved in the CSV files), you have to build a few custom modules manually. As some of the modules used in our implementation were written in Cython (http://www.cython.org/) in order to speed up computations, it is necessary to compile these for your system by:

    >> cd src/particleFiltering
    >> python setup.py build_ext --inplace

    The next step is to manually uncomment the line "# from particleFiltering.belief_Cy import Belief" at the beginning of the file "InformationDriven_AV_SourceTracking_EVALUATION.py" in order to use the functions working on raw data. [/OPTIONAL]

    After installing the necessary libraries (and optionally compiling the Cython modules), you can start the evaluation script by:

    >> cd src
    >> python InformationDriven_AV_SourceTracking_EVALUATION.py

    in order to generate all figures shown in the "results" section of the manuscript and save them to the "src" directory. By default, they are saved to a PDF file, but you can change the file format by changing the variable 'plotFileType' at the beginning of the evaluation script to '.jpg', '.png', '.tiff' or any other file format supported by matplotlib. If you want to analyze the data yourself, all steps needed to access and evaluate the recorded data are exemplified in the module "InformationDriven_AV_SourceTracking_EVALUATION.py" and should be fairly self-explanatory.

    While the figures in our manuscript were generated using the extracted data in the CSV files (see function 'generatePlots' in "InformationDriven_AV_SourceTracking_EVALUATION.py"), we also included functions which work with the raw data ('evaluateSimulationExperiments_IG_error_raw', 'evaluateSimulationExperiments_random_error_raw', 'evaluateSimulationExperiments_IG_entropy_raw', 'evaluateSimulationExperiments_random_entropy_raw', 'evaluateRobotExperiments_IG_error_raw', 'evaluateRobotExperiments_IG_entropy_raw', 'evaluateRobotExperiments_random_error_raw' and 'evaluateRobotExperiments_random_entropy_raw'). These functions show how to access the raw data and how to generate the same curves as the ones shown in the results section, so that it is transparent how the data stored in the CSV files can be extracted from the raw data recorded in the experiments.
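
    For orientation, a minimal sketch (not part of the original package) that lists the extracted CSV files and their headers, assuming the 'data/extracted_data_csv' layout described above; the evaluation module remains the authoritative reference:

    from __future__ import print_function  # keeps the sketch Python 2.7 compatible
    import csv
    import glob
    import os

    for path in sorted(glob.glob(os.path.join("data", "extracted_data_csv", "*.csv"))):
        with open(path) as fh:
            rows = list(csv.reader(fh))
        header, body = rows[0], rows[1:]
        print(path, "columns:", header, "rows:", len(body))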

  8. Characteristics of included studies.

    • plos.figshare.com
    xls
    Updated Jun 3, 2025
    + more versions
    Cite
    Krissy Jordan; Christine Kurtz Landy; Celina Da Silva; Mahdieh Dastjerdi; Bella Grunfeld (2025). Characteristics of included studies. [Dataset]. http://doi.org/10.1371/journal.pone.0325327.t001
    Explore at:
    xls
    Dataset updated
    Jun 3, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Krissy Jordan; Christine Kurtz Landy; Celina Da Silva; Mahdieh Dastjerdi; Bella Grunfeld
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BackgroundVirtual serious games (VSGs) offer an engaging approach to women’s health education. This review examines the state of research on VSGs, focusing on intended users, design characteristics, and assessed outcomes.MethodsFollowing JBI methodology guidance for the scoping review, searches were conducted in the MEDLINE, CINAHL, EMBASE, Web of Science, and PsycINFO databases from inception to April 22, 2024. Eligible sources included participants: women or females aged 18 years and older, with no restrictions based on health condition or treatment status; concept: VSGs; context: settings where health education is provided. Sources were restricted to English language and peer-reviewed articles. Two reviewers independently screened titles, abstracts, and full texts using eligibility criteria. Data extraction was performed by one reviewer and verified by another using a custom tool. Quantitative (e.g., frequency counting) and qualitative (content analysis) methods were employed. The findings were organized into figures and tables accompanied by a narrative description.Results12 studies from 2008 to 2023, mostly in the U.S. (66.7%), explored various age groups and women’s health, focusing on breast and gynecological cancer (67%). Half (50%) of the VSGs were theory-informed; 41.7% involved users, and 58.3% had partnerships. Game types included tablet (41.7%), mobile (25%), and web (33.3%). Gameplay dosage varied from single session (50%) to self-directed (25%) and specific frequency (25%). Gameplay duration was self-directed (50%) or fixed lengths (50%). Outcomes included knowledge (50%), skills (16.7%), satisfaction (58.3%), health-related metrics (41.7%), and gameplay analysis (16.7%).ConclusionsStudies show increased interest in VSGs for women’s health education, especially regarding breast and gynecological cancer. The focus on theoretical frameworks, user involvement, and collaborations highlights a multidisciplinary approach. Varied game modalities, dosage, and assessed outcomes underscore VSG adaptability. Future research should explore long-term effects of VSGs to advance women’s health education.

  9. Document Workflow Automation AI Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Document Workflow Automation AI Market Research Report 2033 [Dataset]. https://dataintelo.com/report/document-workflow-automation-ai-market
    Explore at:
    pdf, pptx, csv
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Document Workflow Automation AI Market Outlook



    According to our latest research, the global Document Workflow Automation AI market size reached USD 3.8 billion in 2024, reflecting robust adoption across diverse industries. The market is set to expand at a CAGR of 17.2% from 2025 to 2033, with the total value forecasted to hit USD 14.2 billion by 2033. Key growth drivers include the rising demand for operational efficiency, the proliferation of advanced AI technologies, and increasing regulatory compliance needs. As organizations worldwide continue to digitize and streamline their document-centric processes, the market for Document Workflow Automation AI is poised for sustained growth and innovation.




    The primary growth driver for the Document Workflow Automation AI market is the accelerating digital transformation initiatives undertaken by enterprises globally. Organizations are increasingly recognizing the inefficiencies and costs associated with manual document handling, which often leads to errors, delays, and compliance risks. By leveraging AI-powered automation, businesses can dramatically reduce processing times, enhance accuracy, and ensure consistent compliance with regulatory standards. The integration of machine learning, natural language processing, and robotic process automation has enabled intelligent document classification, data extraction, and workflow routing, further streamlining complex business processes. This surge in adoption is particularly pronounced in sectors such as BFSI, healthcare, and legal, where document management is integral to daily operations and compliance requirements are stringent.




    Another significant factor propelling market growth is the increasing focus on data security and privacy. As organizations handle vast volumes of sensitive documents, the risk of data breaches and non-compliance with regulations such as GDPR, HIPAA, and CCPA has grown exponentially. Document Workflow Automation AI solutions offer robust security features, including encryption, access control, and automated audit trails, which not only mitigate risks but also make compliance reporting more efficient. Additionally, the ability to automate document retention and disposal policies ensures that organizations can manage information lifecycle effectively, reducing storage costs and minimizing legal exposure. The convergence of AI with cybersecurity measures is creating new opportunities for vendors to differentiate their offerings and address evolving customer concerns.




    The proliferation of cloud computing and the advent of hybrid work models have further accelerated the adoption of Document Workflow Automation AI solutions. As remote and distributed teams become the norm, organizations require seamless, scalable, and secure document management platforms accessible from anywhere. Cloud-based AI solutions enable real-time collaboration, version control, and workflow tracking, thus enhancing productivity and business agility. The scalability of cloud platforms also allows organizations of all sizes, from small and medium enterprises to large corporations, to deploy advanced automation without significant upfront investments in infrastructure. The rise of API-driven integrations and low-code/no-code platforms is also making it easier for businesses to customize and expand their workflow automation capabilities, fostering innovation and competitive advantage.




    From a regional perspective, North America continues to dominate the Document Workflow Automation AI market, driven by high technology adoption rates, significant investments in AI research, and a mature regulatory environment. Europe follows closely, with stringent data protection laws and a strong emphasis on digitalization in public and private sectors. The Asia Pacific region is emerging as a high-growth market, fueled by rapid economic development, increasing digital literacy, and government initiatives to modernize administrative processes. Latin America and the Middle East & Africa are also witnessing growing interest, particularly among multinational corporations and government agencies seeking to enhance operational efficiency and compliance. The global landscape is characterized by dynamic innovation, evolving customer expectations, and a competitive vendor ecosystem, setting the stage for sustained market expansion.



    Component Analysis



    The Document Workflow Automation AI market is segmented by component into Software and Ser

  10. Ultraconserved element phylogenomics and biogeography of the agriculturally...

    • data-staging.niaid.nih.gov
    • datasetcatalog.nlm.nih.gov
    • +4more
    zip
    Updated Aug 7, 2024
    Cite
    Michael G. Branstetter; Andreas Müller; Terry L. Griswold; Michael C. Orr; Chao-Dong Zhu (2024). Ultraconserved element phylogenomics and biogeography of the agriculturally important mason bee subgenus Osmia (Osmia) [Dataset]. http://doi.org/10.5061/dryad.7d7wm37t4
    Explore at:
    zip
    Dataset updated
    Aug 7, 2024
    Dataset provided by
    Key Laboratory of Zoological Systematics and Evolution
    ETH Zurich
    United States Department of Agriculture
    Authors
    Michael G. Branstetter; Andreas Müller; Terry L. Griswold; Michael C. Orr; Chao-Dong Zhu
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    One of the most important non‐Apis groups of bees for agriculture is the mason bee subgenus Osmia Panzer (Osmia), or Osmia s.s. (Hymenoptera: Megachilidae). Out of the 29 known species, four have been developed as managed pollinators of orchards. In addition, the group is important as a source of non‐native pollinators, given that several species have been introduced into new areas. Osmia s.s. occurs naturally throughout the northern temperate zone with greatest species richness in Europe and Asia. Here, we integrate phylogenomic data from ultraconserved elements (UCEs), near complete taxon sampling, and a diversity of analytical approaches to infer the phylogeny, divergence times, and biogeographic history of Osmia s.s. We also demonstrate how mitochondrial sequence data can be extracted from ultraconserved element data and combined with sequences from public repositories in order to test the phylogeny, examine species boundaries and identify specimen‐associated, non‐bee DNA. We resolve the phylogeny of Osmia s.s. and show strong support that Nearctic Osmia ribifloris is the sister group to the rest of the subgenus. Biogeographic analyses indicate that the group originated during the Late Miocene in the West Nearctic plus East Palearctic region following dispersal from the East Palearctic to the West Nearctic across the Bering land bridge prior to its closure 5.5–4.8 Ma. The mitochondrial DNA results reveal potential taxonomic synonymies involving Osmia yanbianensis and Osmia opima, and Osmia rufina, Osmia rufinoides, and Osmia taurus. Methods See accompanying paper for more details. UCE Molecular Data Generation To generate a genome-scale dataset of Osmia s.s we used ultra-conserved element (UCE) phylogenomics (Branstetter et al., 2017; Faircloth et al., 2012), an approach that combines the targeted enrichment of thousands of nuclear, UCE loci with multiplexed next-generation sequencing. Enrichment was performed using a recently published bait set that targets loci shared across all Hymenoptera (“hym-v2-bee-ant-specific”; Grab et al., 2019). This bait set is a subset of the principal Hymenoptera bait set (Branstetter et al., 2017) and includes probes synthesized from one ant and one bee genome only. It includes 9,068 unique baits targeting 2,545 UCE loci, plus an additional 264 baits targeting 12 exons from 7 “legacy” markers (ArgK, CAD, EF1aF1 and F2, LwRh, NaK, PolD1, and TopI) that have been used extensively in Sanger sequencing-based studies. The bait set is currently available for purchase from Arbor Biosciences as a catalog kit. For all newly sequenced specimens, we extracted DNA using either the Qiagen DNeasy Blood and Tissue kit (48 specimens), or the Zymo Quick-DNA Miniprep Plus Kit (6 specimens). All extractions were performed using mounted museum specimens that had been collected between 1971-2017. For most specimens we removed and destructively sampled one to two legs, leaving the rest of the specimen intact and able to serve as a “same specimen” DNA voucher. For a couple of smaller species, we performed whole body, non-destructive extractions, in which the specimens were removed from their pin and soaked in proteinase-K. Following the soak, the specimens were rinsed in 95% ethanol, dried, and re-mounted. 
For each extraction, we followed the manufacturer’s extraction protocol, except for the following modifications (applies to both kit types): (1) the proteinase-K digestion was performed overnight for 12-16 hours; (2) the elution buffer was warmed to 60º C prior to elution; (3) the elution buffer was allowed to incubate in the column for five minutes; and (4) DNA was eluted two times using 60-75 uL of fresh elution buffer each time. After extraction, the concentration of eluted DNA was measured using a Qubit 3.0 fluorometer (Thermo Fisher Scientific) and 1-50 ng of extracted DNA was used for library preparation and target enrichment. Each DNA sample was sheared using a Qsonica Q800R2 acoustic sonicator, with the target fragment size range being 400-600 bp. Having larger fragment sizes in the sequencing pool improves the amount of flanking DNA that is sequenced, which can improve UCE contig lengths following assembly. For recently collected, high quality samples, the sonicator was run for 90-120 seconds (shearing time) at 25% amplitude and with a 10-10 second on-off pulse. For older samples that likely had more degraded DNA, we adjusted the shearing times to between 30-60 seconds. Following sonication, fragmented DNA was purified at 3x volume using a homemade SPRI-bead substitute (Rohland & Reich, 2012). Illumina sequencing libraries were then generated using Kapa Hyper Prep kits and custom, 8 bp, dual-indexing adapters (Glenn et al., 2019). All library preparation steps were performed at quarter volume of the manufacturer’s recommendations, except for PCR, which was done at the full 50 uL volume. Limited cycle PCR was performed for 12 cycles on most samples, but for lower quality samples we increased the number of cycles to 14-16 cycles, usually as a re-PCR of the of the pre-PCR library. Each amplified library was cleaned using 1.0-1.2x SPRI beads in order to remove contaminants and select out small fragments below 200 bp, including possible adapter dimer. The concentration of the final cleaned library was measured using Qubit. Enrichment was performed using the bee-ant UCE bait set described above and by following either a standard UCE protocol (enrichment protocol v1.5 available at ultraconserved.org), based on Blumenstiel et al. (2010), or by using a modified protocol in which we followed Arbor Biosciences v3.02 protocol for enrichment day 1 (more efficient than the standard protocol), and the standard UCE protocol for day 2. In either case, we combined up to 10 samples into equimolar enrichment pools and used 500 ng of pooled DNA for enrichment. On the first day of enrichment, the pooled DNA samples were combined with biotinylated RNA-baits and a set of blocking agents that reduce non-specific binding (salmon sperm, adapter blockers, Cot-1). We used a 20k probe, custom order of the probe set (diluted 1 ul bait to 4 uL H2O) and our own blocking reagents, which included chicken Cot-1 DNA (Applied Genetics Laboratories, Inc.), rather than the human Cot-1 DNA that comes with Arbor Bioscience’s kits. We performed the enrichment incubation at 65ºC for 24 hours using strip tubes and a PCR thermal cycler. For the second day of enrichment we used 50 uL of streptavidin beads per sample and performed on-bead PCR following the three heated (65ºC) wash steps. The enriched pools were amplified for 18 cycles and the resulting products were cleaned with SPRI beads at 1x volume. 
Following enrichment, each enrichment pool was quantified using qPCR and pooled together into a final sequencing pool at equimolar concentrations. Sequencing pools were either sequenced locally on an in-lab Illumina MiniSeq (2x125) or sent to the University of Utah Genomics Core for sequencing on an Illumina HiSeq 2500 (2x125, v4 chemistry). Sequence Processing and Matrix Generation The raw sequence data were demultiplexed and converted to FASTQ format using BCL2FASTQ (Illumina). The resulting FASTQ reads were then cleaned and processed using the software package PHYLUCE v1.6 (Faircloth, 2016) and associated programs. Within the PHYLUCE environment, the raw reads were trimmed for adapter contamination using ILLUMIPROCESSOR v2.0 (Faircloth, 2013), which incorporates TRIMMOMATIC (Bolger et al., 2014). Trimmed reads were assembled de novo into contigs using both TRINITY v2013-02-25 (Grabherr et al., 2011) and SPADES (Bankevich et al., 2012) for comparative purposes. To identify and extract UCE contigs from the bulk set of contigs we used a PHYLUCE program (match_contigs_to_probes) that uses LASTZ v1.0 (Harris, 2007) to match probe sequences to contig sequences and create a relational database of hits. For this step, we adjusted the min-identity and min-coverage settings to 70 and 75, respectively, which we have found to recover the highest number of UCE loci in bees when using the hym-v2-bee-ant UCE probes and bait file. After extracting UCE contigs, we aligned each UCE locus using MAFFT v7.130b (Katoh & Standley, 2013), trying both the default algorithm setting in PHYLUCE (FFT-NS-i) and the slower but probably more accurate L-INS-i algorithm. To remove poorly aligned regions, we trimmed internal and external regions of alignments using GBLOCKS (Talavera & Castresana, 2007), with reduced stringency parameters (b1:0.5, b2:0.5, b3:12, b4:7). Using PHYLUCE, we then concatenated all loci into a supermatrix and inferred a preliminary tree using IQ-TREE v2.0-rc1 (Minh et al., 2020). The resulting tree showed that several of the older, lower quality samples had exaggerated terminal branch lengths, likely resulting from difficulties in aligning shorter sequence fragments to more complete ones, an issue that is seemingly common in species-level target enrichment datasets containing old samples. To remove these poorly aligned sequences, we used the automated trimming program SPRUCEUP (Borowiec, 2019) on the complete set of UCE alignments. This program identifies outlier alignment sequences in individual samples, rather than alignment columns, and trims the sequences based on user defined cutoffs applied to all samples or to specific samples only. In the configuration file, we set SPRUCEUP to use the Jukes-Cantor-corrected distance calculation, a window size of 20 base pairs (bp), an overlap size of 15 bp, a lognormal distribution, and a cutoff value of 0.98. Additionally, we used manual cutoffs for the samples with the most exaggerated terminal branch lengths (Osmia_maxillaris_BLX43:0.039, Osmia_melanocephala_BLX45:0.035, Osmia_emarginata_infuscata_BLX38:0.04, Osmia_mustelina_umbrosa_BLX47:0.047, Osmia_cornuta_BLX35:0.037, Osmia_cerinthidis_cerinthidis_BLX32:0.04, Osmia_cerinthidis_cerinthidis_BLX176:0.05,

  11. Fume Eliminators Report

    • promarketreports.com
    doc, pdf, ppt
    Updated Aug 25, 2025
    Cite
    Pro Market Reports (2025). Fume Eliminators Report [Dataset]. https://www.promarketreports.com/reports/fume-eliminators-135381
    Explore at:
    pdf, doc, ppt
    Dataset updated
    Aug 25, 2025
    Dataset authored and provided by
    Pro Market Reports
    License

    https://www.promarketreports.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global fume eliminator market is experiencing robust growth, driven by increasing industrialization, stringent environmental regulations, and a rising focus on worker safety. While precise market size data for 2025 isn't provided, considering the presence of major players like Nederman, BOFA, and Lincoln Electric, and referencing similar industrial equipment markets, a reasonable estimate for the 2025 market size would be approximately $2.5 billion USD. Assuming a Compound Annual Growth Rate (CAGR) of 6% (a conservative estimate considering market trends and technological advancements in fume extraction), the market is projected to reach approximately $3.7 billion by 2033. This growth is fueled by several key factors, including the increasing adoption of automation in manufacturing, the rising demand for efficient and energy-saving fume extraction systems, and a growing awareness of the health risks associated with airborne contaminants in various industrial settings. The market segmentation likely includes different types of fume eliminators (e.g., benchtop, mobile, and centralized systems), serving various industries (e.g., electronics manufacturing, welding, and automotive). Significant trends shaping the market include the development of advanced filtration technologies, the integration of smart sensors and IoT capabilities for predictive maintenance and optimized performance, and the increasing demand for customized solutions tailored to specific industrial applications. However, restraints like high initial investment costs, the need for regular maintenance, and the availability of cost-effective alternatives could potentially impede market growth to some degree. Despite these challenges, the overall outlook remains positive, with continued growth anticipated throughout the forecast period (2025-2033) driven by the long-term trends towards enhanced workplace safety and environmental responsibility. This report provides a detailed analysis of the global fume eliminators market, projecting a market value exceeding $3.5 billion by 2028. It delves into market dynamics, competitive landscapes, and future growth potential, leveraging extensive research and data analysis. This report is crucial for businesses involved in manufacturing, distribution, and utilization of fume eliminators, enabling informed strategic decision-making. Key search terms like "industrial fume extraction," "soldering fume extraction," "welding fume extraction," and "laboratory fume hoods" are incorporated throughout to enhance search engine optimization.

  12. LUMIERE dataset - Pyradiomics features based on DeepBraTumIA segmentations

    • springernature.figshare.com
    txt
    Updated Dec 13, 2022
    Cite
    Yannick Suter; Urspeter Knecht; Waldo Valenzuela; Michelle Notter; Ekkehard Hewer; Philippe Schucht; Roland Wiest; Mauricio Reyes (2022). LUMIERE dataset - Pyradiomics features based on DeepBraTumIA segmentations [Dataset]. http://doi.org/10.6084/m9.figshare.21187033.v1
    Explore at:
    txt
    Dataset updated
    Dec 13, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yannick Suter; Urspeter Knecht; Waldo Valenzuela; Michelle Notter; Ekkehard Hewer; Philippe Schucht; Roland Wiest; Mauricio Reyes
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This CSV contains the radiomic features extracted for all study dates where all four MRI sequences and automated segmentation from DeepBraTumIA are available. The features were extracted from the images resampled to atlas space. Please note that features could not be extracted for studies where a given segmentation label was not found (or too small, see the minimum ROI size in the settings column). See our GitHub repository for a script to customize the extraction.
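
    For comparison, a generic pyradiomics extraction call (a sketch only, not the authors' script; the parameter file and the image/mask paths are hypothetical):

    from radiomics import featureextractor

    # Settings such as resampling and the minimum ROI size normally live in a parameter file.
    extractor = featureextractor.RadiomicsFeatureExtractor("params.yaml")  # hypothetical settings file
    features = extractor.execute("patient_T1c_atlas.nii.gz", "deepbratumia_seg.nii.gz")  # hypothetical paths

    for name, value in features.items():
        if not name.startswith("diagnostics_"):
            print(name, value)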

  13. Data extracts.

    • plos.figshare.com
    xlsx
    Updated Jan 18, 2024
    Cite
    Camlus Otieno Odhus; Ruth Razanajafy Kapanga; Elizabeth Oele (2024). Data extracts. [Dataset]. http://doi.org/10.1371/journal.pgph.0002756.s006
    Explore at:
    xlsx
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    PLOS Global Public Health
    Authors
    Camlus Otieno Odhus; Ruth Razanajafy Kapanga; Elizabeth Oele
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The quality of health care remains generally poor across primary health care settings, especially in low- and middle-income countries where tertiary care tends to take up much of the limited resources despite primary health care being the first (and often the only) point of contact with the health system for nearly 80 per cent of people in these countries. Evidence is needed on barriers and enablers of quality improvement initiatives. This systematic review sought to answer the question: What are the enablers of and barriers to quality improvement in primary health care in low- and middle-income countries? It adopted an integrative review approach with narrative evidence synthesis, which combined qualitative and mixed methods research studies systematically. Using a customized geographic search filter for LMICs developed by the Cochrane Collaboration, Scopus, Academic Search Ultimate, MEDLINE, CINAHL, PSYCHINFO, EMBASE, ProQuest Dissertations and Overton.io (a new database for LMIC literature) were searched in January and February 2023, as were selected websites and journals. 7,077 reports were retrieved. After removing duplicates, reviewers independently screened titles, abstracts and full texts, performed quality appraisal and data extraction, followed by analysis and synthesis. 50 reports from 47 studies were included, covering 52 LMIC settings. Six themes related to barriers and enablers of quality improvement were identified and organized using the model for understanding success in quality (MUSIQ) and the consolidated framework for implementation research (CFIR). These were: microsystem of quality improvement, intervention attributes, implementing organization and team, health systems support and capacity, external environment and structural factors, and execution. Decision makers, practitioners, funders, implementers, and other stakeholders can use the evidence from this systematic review to minimize barriers and amplify enablers to better the chances that quality improvement initiatives will be successful in resource-limited settings. PROSPERO registration: CRD42023395166.

  14. Inclusion and exclusion criteria.

    • plos.figshare.com
    xls
    Updated Jan 18, 2024
    Cite
    Camlus Otieno Odhus; Ruth Razanajafy Kapanga; Elizabeth Oele (2024). Inclusion and exclusion criteria. [Dataset]. http://doi.org/10.1371/journal.pgph.0002756.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Jan 18, 2024
    Dataset provided by
    PLOS Global Public Health
    Authors
    Camlus Otieno Odhus; Ruth Razanajafy Kapanga; Elizabeth Oele
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The quality of health care remains generally poor across primary health care settings, especially in low- and middle-income countries where tertiary care tends to take up much of the limited resources despite primary health care being the first (and often the only) point of contact with the health system for nearly 80 per cent of people in these countries. Evidence is needed on barriers and enablers of quality improvement initiatives. This systematic review sought to answer the question: What are the enablers of and barriers to quality improvement in primary health care in low- and middle-income countries? It adopted an integrative review approach with narrative evidence synthesis, which combined qualitative and mixed methods research studies systematically. Using a customized geographic search filter for LMICs developed by the Cochrane Collaboration, Scopus, Academic Search Ultimate, MEDLINE, CINAHL, PSYCHINFO, EMBASE, ProQuest Dissertations and Overton.io (a new database for LMIC literature) were searched in January and February 2023, as were selected websites and journals. 7,077 reports were retrieved. After removing duplicates, reviewers independently screened titles, abstracts and full texts, performed quality appraisal and data extraction, followed by analysis and synthesis. 50 reports from 47 studies were included, covering 52 LMIC settings. Six themes related to barriers and enablers of quality improvement were identified and organized using the model for understanding success in quality (MUSIQ) and the consolidated framework for implementation research (CFIR). These were: microsystem of quality improvement, intervention attributes, implementing organization and team, health systems support and capacity, external environment and structural factors, and execution. Decision makers, practitioners, funders, implementers, and other stakeholders can use the evidence from this systematic review to minimize barriers and amplify enablers to better the chances that quality improvement initiatives will be successful in resource-limited settings. PROSPERO registration: CRD42023395166.

  15. Parameters setting of the proposed classification network.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Kai Cao; Tao Deng; Chuanlin Zhang; Limeng Lu; Lin Li (2023). Parameters setting of the proposed classification network. [Dataset]. http://doi.org/10.1371/journal.pone.0276758.t002
    Explore at:
    xls (available download formats)
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS, http://plos.org/
    Authors
    Kai Cao; Tao Deng; Chuanlin Zhang; Limeng Lu; Lin Li
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Parameters setting of the proposed classification network.

  16. LUMIERE dataset - Pyradiomics features based on HD-GLIO-AUTO segmentations

    • springernature.figshare.com
    txt
    Updated Dec 13, 2022
    Cite
    Yannick Suter; Urspeter Knecht; Waldo Valenzuela; Michelle Notter; Ekkehard Hewer; Philippe Schucht; Roland Wiest; Mauricio Reyes (2022). LUMIERE dataset - Pyradiomics features based on HD-GLIO-AUTO segmentations [Dataset]. http://doi.org/10.6084/m9.figshare.21187030.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Dec 13, 2022
    Dataset provided by
    Figshare, http://figshare.com/
    Authors
    Yannick Suter; Urspeter Knecht; Waldo Valenzuela; Michelle Notter; Ekkehard Hewer; Philippe Schucht; Roland Wiest; Mauricio Reyes
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This CSV contains the radiomic features extracted for all study dates where all four MRI sequences and automated segmentation from HD-GLIO-AUTO are available. The features were extracted from the images resampled to atlas space. Please note that features could not be extracted for studies where a given segmentation label was not found (or too small, see the minimum ROI size in the settings column). See our GitHub repository for a script to customize the extraction.
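    For readers who want to reproduce or customize the extraction without the authors' repository at hand, a rough PyRadiomics sketch is shown below. It is not the authors' script: the file names are placeholders, the minimum-ROI threshold is invented, and the settings actually used for this release are recorded in the CSV's settings column.

    from radiomics import featureextractor

    extractor = featureextractor.RadiomicsFeatureExtractor()
    # Skip labels whose ROI is too small, mirroring the behaviour described above;
    # the threshold value here is made up for illustration.
    extractor.settings["minimumROISize"] = 100

    result = extractor.execute(
        "sub-001_week-000_T1c_atlas.nii.gz",         # placeholder image in atlas space
        "sub-001_week-000_hdglio_seg_atlas.nii.gz",  # placeholder HD-GLIO-AUTO mask
        label=1,                                     # one of the segmentation labels
    )
    print(len(result), "features and diagnostics returned")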

  17. Data Sheet 1_Methods for processing and analyzing images of vascularized micro-organ and tumor systems.pdf

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated Jun 12, 2025
    Cite
    Stephanie J. Hachey; Christopher J. Hatch; Daniela Gaebler; Alexander G. Forsythe; Makena L. Ewald; Alexander L. Chopra; Zhangying Chen; Kapil Thapa; Melvin Hodanu; Jennifer S. Fang; Christopher C. W. Hughes (2025). Data Sheet 1_Methods for processing and analyzing images of vascularized micro-organ and tumor systems.pdf [Dataset]. http://doi.org/10.3389/fbioe.2025.1585003.s001
    Explore at:
    pdf (available download formats)
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    Frontiers Media, http://www.frontiersin.org/
    Authors
    Stephanie J. Hachey; Christopher J. Hatch; Daniela Gaebler; Alexander G. Forsythe; Makena L. Ewald; Alexander L. Chopra; Zhangying Chen; Kapil Thapa; Melvin Hodanu; Jennifer S. Fang; Christopher C. W. Hughes
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Our group has developed and validated an advanced microfluidic platform to improve preclinical modeling of healthy and disease states, enabling extended culture and detailed analysis of tissue-engineered miniaturized organ constructs, or “organs-on-chips.” Within this system, diverse cell types self-organize into perfused microvascular networks under dynamic flow within tissue chambers, effectively mimicking the structure and function of native tissues. This setup facilitates physiological intravascular delivery of nutrients, immune cells, and therapeutic agents, and creates a realistic microenvironment to study cellular interactions and tissue responses. Known as the vascularized micro-organ (VMO), this adaptable platform can be customized to represent various organ systems or tumors, forming a vascularized micro-tumor (VMT) for cancer studies. The VMO/VMT system closely simulates in vivo nutrient exchange and drug delivery within a 3D microenvironment, establishing a high-fidelity model for drug screening and mechanistic studies in vascular biology, cancer, and organ-specific pathologies. Furthermore, the optical transparency of the device supports high-resolution, real-time imaging of fluorescently labeled cells and molecules within the tissue construct, providing key insights into drug responses, cell interactions, and dynamic processes such as epithelial-mesenchymal transition. To manage the extensive imaging data generated, we created standardized, high-throughput workflows for image analysis. This manuscript presents our image processing and analysis pipeline, utilizing a suite of tools in Fiji/ImageJ to streamline data extraction from the VMO/VMT model, substantially reducing manual processing time. Additionally, we demonstrate how these tools can be adapted for analyzing imaging data from traditional in vitro models and microphysiological systems developed by other researchers.
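    The published workflow is built on Fiji/ImageJ, so the Python sketch below is only a rough analogue of one step such pipelines automate: segmenting a fluorescently labelled vessel network and measuring its skeleton length with scikit-image. The file name and threshold choices are placeholders, not the authors' settings.

    from skimage import filters, io, morphology

    img = io.imread("vmo_vessels.tif").astype(float)   # placeholder 2D fluorescence image

    # Otsu threshold separates vessel signal from background; small speckles are
    # removed before skeletonizing so stray pixels do not inflate the length.
    mask = img > filters.threshold_otsu(img)
    mask = morphology.remove_small_objects(mask, min_size=64)
    skeleton = morphology.skeletonize(mask)

    print("vessel area (px):", int(mask.sum()))
    print("network length (px):", int(skeleton.sum()))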

  18. Data from: Dataset distribution.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Mar 19, 2025
    + more versions
    Cite
    Md. Sabbir Hossain; Niloy Basak; Md. Aslam Mollah; Md. Nahiduzzaman; Mominul Ahsan; Julfikar Haider (2025). Dataset distribution. [Dataset]. http://doi.org/10.1371/journal.pone.0318219.t001
    Explore at:
    xls (available download formats)
    Dataset updated
    Mar 19, 2025
    Dataset provided by
    PLOS, http://plos.org/
    Authors
    Md. Sabbir Hossain; Niloy Basak; Md. Aslam Mollah; Md. Nahiduzzaman; Mominul Ahsan; Julfikar Haider
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Lung cancer (LC) is a leading cause of cancer-related fatalities worldwide, underscoring the urgency of early detection for improved patient outcomes. The main objective of this research is to harness the noble strategies of artificial intelligence for identifying and classifying lung cancers more precisely from CT scan images at the early stage. This study introduces a novel lung cancer detection method, which was mainly focused on Convolutional Neural Networks (CNN) and was later customized for binary and multiclass classification utilizing a publicly available dataset of chest CT scan images of lung cancer. The main contribution of this research lies in its use of a hybrid CNN-SVD (Singular Value Decomposition) method and the use of a robust voting ensemble approach, which results in superior accuracy and effectiveness for mitigating potential errors. By employing contrast-limited adaptive histogram equalization (CLAHE), contrast-enhanced images were generated with minimal noise and prominent distinctive features. Subsequently, a CNN-SVD-Ensemble model was implemented to extract important features and reduce dimensionality. The extracted features were then processed by a set of ML algorithms along with a voting ensemble approach. Additionally, Gradient-weighted Class Activation Mapping (Grad-CAM) was integrated as an explainable AI (XAI) technique for enhancing model transparency by highlighting key influencing regions in the CT scans, which improved interpretability and ensured reliable and trustworthy results for clinical applications. This research offered state-of-the-art results, which achieved remarkable performance metrics with an accuracy, AUC, precision, recall, F1 score, Cohen’s Kappa and Matthews Correlation Coefficient (MCC) of 99.49%, 99.73%, 100%, 99%, 99%, 99.15% and 99.16%, respectively, addressing the prior research gaps and setting a new benchmark in the field. Furthermore, in binary class classification, all the performance indicators attained a perfect score of 100%. The robustness of the suggested approach offered more reliable and impactful insights in the medical field, thus improving existing knowledge and setting the stage for future innovations.
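    As a small illustration of the CLAHE preprocessing step mentioned above (not the paper's exact configuration), the OpenCV snippet below enhances a grayscale CT slice; the clip limit, tile size, and file names are placeholders.

    import cv2

    ct_slice = cv2.imread("ct_slice.png", cv2.IMREAD_GRAYSCALE)   # placeholder input

    # Contrast-limited adaptive histogram equalization: equalizes small tiles and
    # clips each tile's histogram to limit noise amplification.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(ct_slice)

    cv2.imwrite("ct_slice_clahe.png", enhanced)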


iNeuron Projectathon Oct-Nov'21

Approach:

  1. Follow standard guidelines to write a quality solution for the web portal.
  2. Follow OOP principles to design the solution.
  3. Implement REST APIs wherever possible.
  4. Implement a CI/CD pipeline with automated testing and dockerization. (Use containers or Kubernetes to deploy your dockerized application.)
  5. The CI/CD pipeline should have different environments, for example ST, SST, and Production. Note: feel free to use any technology to design your solution.

Results:

You have to build a solution that summarizes news articles from different reading categories.

Project Evaluation metrics:

Code:
  - You are supposed to write code in a modular fashion (a minimal illustration follows below).
  - Safe: it can be used without causing harm.
  - Testable: it can be tested at the code level.
  - Maintainable: it can be maintained even as your codebase grows.
  - Portable: it works the same in every environment (operating system).
  - You have to maintain your code on GitHub.
  - Keep your GitHub repo public so that anyone can check your code.
  - Maintain a proper README file for the project.
  - Include the basic workflow and execution of the entire project in the README file on GitHub.
  - Follow the coding standards: https://www.python.org/dev/peps/pep-0008/
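A minimal, hypothetical illustration of "modular and testable" (not part of the evaluation criteria) is a small side-effect-free function in its own module plus a pytest-style unit test:

# preprocessing.py - a hypothetical module: one small, pure function
from typing import Dict, List

def drop_null_rows(rows: List[Dict], required: List[str]) -> List[Dict]:
    """Keep only rows with a non-empty value for every required column."""
    return [r for r in rows if all(r.get(col) not in (None, "") for col in required)]

# test_preprocessing.py (imports drop_null_rows from preprocessing) - runnable with pytest
def test_drop_null_rows():
    rows = [{"age": 31, "city": "Pune"}, {"age": None, "city": "Delhi"}]
    assert drop_null_rows(rows, ["age", "city"]) == [{"age": 31, "city": "Pune"}]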

Database: Based on development requirements, feel free to choose any database (SQL or NoSQL), or use multiple databases.

Cloud:

You can use any cloud platform for hosting this entire solution, such as AWS, Azure, or GCP.

API Details or User Interface:

  1. The web portal should be designed like any cloud platform.
  2. A model developed using the web portal should have functionality to expose an API for testing predictions (a sketch follows below).
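A hedged sketch of such a prediction API is shown below, using Flask; the model artifact name, input schema, route, and port are assumptions, and any equivalent web framework would do.

import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder artifact produced by the training step.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)           # e.g. {"features": [[...], [...]]}
    predictions = model.predict(payload["features"]).tolist()
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

A client would then POST a JSON body such as {"features": [[5.1, 3.5, 1.4, 0.2]]} to /predict and receive the predictions back as JSON.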

Logging:

Logging is a must for every action performed by your code; use the Python logging library for this.
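A minimal setup along these lines with the standard logging module is sketched below; the logger name, log file name, and rotation sizes are placeholders.

import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger("ml_portal")              # placeholder logger name
logger.setLevel(logging.INFO)

fmt = logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")

file_handler = RotatingFileHandler("portal.log", maxBytes=5_000_000, backupCount=3)
file_handler.setFormatter(fmt)
console_handler = logging.StreamHandler()
console_handler.setFormatter(fmt)

logger.addHandler(file_handler)
logger.addHandler(console_handler)

logger.info("ETL job started for dataset id=%s", "demo-123")   # example audit entry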

DevOps Pipeline:

Use a source version control tool to implement the CI/CD pipeline, e.g. Azure DevOps, GitHub, or CircleCI.

Deployment:

You can host your application on a cloud platform using an automated CI/CD pipeline.

Solutions Design:

You have to submit a complete solution design strate...
