23 datasets found
  1. Dataset for a tutorial dedicated to the Sankey diagram

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 18, 2022
    Cite
    Antoine Lamer; Manel Ismail; Rémi Lenain (2022). Dataset for a tutorial dedicated to the Sankey diagram [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7004010
    Explore at:
    Dataset updated
    Aug 18, 2022
    Authors
    Antoine Lamer; Manel Ismail; Rémi Lenain
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is a standard table representing steps of patient care. It contains four standard variables: a patient identifier, the label of the step, the start date of the step, and the end date of the step. One patient may have several steps. The step labels are synthetic (A, B, C, D, E, F) and may correspond to stays in a care unit, successive drug administrations, or medical procedures.

    This dataset is used for a tutorial dedicated to the Sankey diagram: https://gitlab.com/d8096/health_data_science_tutorials/-/tree/main/tutorials/sankey_diagram
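
    The tutorial itself lives at the GitLab link above; as a rough illustration of what such a step table supports, the sketch below builds a Sankey diagram of transitions between consecutive steps. It is not the tutorial's code, and the file and column names (patient_id, step, start_date, end_date) are assumptions based on the description.

    ```python
    # Minimal sketch: Sankey diagram of step-to-step transitions (assumed column names).
    import pandas as pd
    import plotly.graph_objects as go

    df = pd.read_csv("steps.csv")  # hypothetical file: patient_id, step, start_date, end_date
    df = df.sort_values(["patient_id", "start_date"])

    # Build (step -> next step) transitions within each patient's pathway.
    df["next_step"] = df.groupby("patient_id")["step"].shift(-1)
    links = df.dropna(subset=["next_step"]).groupby(["step", "next_step"]).size().reset_index(name="count")

    labels = sorted(set(links["step"]) | set(links["next_step"]))
    index = {label: i for i, label in enumerate(labels)}

    fig = go.Figure(go.Sankey(
        node=dict(label=labels),
        link=dict(
            source=[index[s] for s in links["step"]],
            target=[index[t] for t in links["next_step"]],
            value=links["count"].tolist(),
        ),
    ))
    fig.show()
    ```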

  2. Synthetic total-field magnetic anomaly data and code to perform Euler deconvolution on it

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Leonardo Uieda; Vanderlei C. Oliveira Jr.; Valeria C. F. Barbosa (2023). Synthetic total-field magnetic anomaly data and code to perform Euler deconvolution on it [Dataset]. http://doi.org/10.6084/m9.figshare.923450.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Leonardo Uieda; Vanderlei C. Oliveira Jr.; Valeria C. F. Barbosa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Synthetic data, source code, and supplementary text for the article "Euler deconvolution of potential field data" by Leonardo Uieda, Vanderlei C. Oliveira Jr., and Valéria C. F. Barbosa. This is part of a tutorial submitted to The Leading Edge (http://library.seg.org/journal/tle). Results were generated using the open-source Python package Fatiando a Terra version 0.2 (http://www.fatiando.org). This material along with the manuscript can also be found at https://github.com/pinga-lab/paper-tle-euler-tutorial

    Synthetic data and model

    Examples in the tutorial use synthetic data generated with the IPython notebook create_synthetic_data.ipynb. File synthetic_data.txt has 4 columns: x (north), y (east), z (down) and the total field magnetic anomaly. x, y, and z are in meters. The total field anomaly is in nanoTesla (nT). File metadata.json contains extra information about the data, such as inclination and declination of the inducing field (in degrees), shape of the data grid (number of points in y and x, respectively), the area containing the data (W, E, S, N, in meters), and the model boundaries (W, E, S, N, top, bottom, in meters). File model.pickle is a serialized version of the model used to generate the data. It contains a list of instances of the PolygonalPrism class of Fatiando a Terra. The serialization was done using the cPickle Python module.

    Reproducing the results in the tutorial

    The notebook euler-deconvolution-examples.ipynb runs the Euler deconvolution on the synthetic data and generates the figures for the manuscript. It also presents a more detailed explanation of the method and more tests than went into the finished manuscript.
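
    The file names and column order below come from the description above; the loading code itself is a minimal sketch, not the authors' notebook, and the metadata key name is an assumption.

    ```python
    # Minimal sketch for loading synthetic_data.txt (x, y, z, total-field anomaly) and metadata.json.
    import json
    import numpy as np

    x, y, z, tf_anomaly = np.loadtxt("synthetic_data.txt", unpack=True)  # meters, meters, meters, nT

    with open("metadata.json") as f:
        meta = json.load(f)

    ny, nx = meta["shape"]  # assumed key name; the file stores the grid shape (points in y, x)
    print(f"{x.size} observations on a {ny} x {nx} grid")
    ```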

  3. Many Models in R: A Tutorial - National Child Development Study: Age 46, Sweep 7, 2004-2005: Synthetic Data, 2023

    • datacatalogue.ukdataservice.ac.uk
    Updated Jul 31, 2023
    Cite
    Wright, L, University College London (2023). Many Models in R: A Tutorial - National Child Development Study: Age 46, Sweep 7, 2004-2005: Synthetic Data, 2023 [Dataset]. http://doi.org/10.5255/UKDA-SN-856610
    Explore at:
    Dataset updated
    Jul 31, 2023
    Authors
    Wright, L, University College London
    Time period covered
    Jan 1, 2004 - Jan 1, 2005
    Area covered
    United Kingdom
    Description

    The deposit contains a dataset created for the paper, 'Many Models in R: A Tutorial'. ncds.Rds is an R-format synthetic dataset created with the synthpop package in R using data from the National Child Development Study (NCDS), a birth cohort of individuals born in a single week of March 1958 in Britain. The dataset contains data on fourteen biomarkers collected at the age 46/47 sweep of the survey, four measures of cognitive ability from ages 11 and 16, and three covariates: sex, body mass index at age 11 and father's social class. The data is only intended to be used in the tutorial - it is not to be used for drawing statistical inferences.

    This project contains data used in the paper, "Many Models in R: A Tutorial". The data are a simplified, synthetic and imputed version of the National Child Development Study. There are variables for 14 biomarkers from the age 46/47 biomedical survey, 4 measures of cognitive ability from tests at ages 11 and 16, and 3 covariates (sex, father's socioeconomic class and BMI at age 11).

  4. Synthetic Fruit Old Dataset

    • universe.roboflow.com
    zip
    Updated Apr 14, 2020
    Cite
    Brad Dwyer (2020). Synthetic Fruit Old Dataset [Dataset]. https://universe.roboflow.com/brad-dwyer/synthetic-fruit-old/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 14, 2020
    Dataset authored and provided by
    Brad Dwyer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Fruits Bounding Boxes
    Description

    About this dataset

    This dataset contains 6,000 example images generated with the process described in Roboflow's How to Create a Synthetic Dataset tutorial.

    The images are composed of a background (randomly selected from Google's Open Images dataset) and a number of fruits (from Horea94's Fruit Classification Dataset) superimposed on top with a random orientation, scale, and color transformation. All images are 416x550 to simulate a smartphone aspect ratio.

    To generate your own images, follow our tutorial or download the code.

    Example image: https://blog.roboflow.ai/content/images/2020/04/synthetic-fruit-examples.jpg
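
    A rough sketch of the compositing process described above (background plus a randomly scaled, rotated, and color-jittered fruit cutout). It is not Roboflow's generator, and the input file names are hypothetical.

    ```python
    import random
    from PIL import Image, ImageEnhance

    background = Image.open("background.jpg").convert("RGB").resize((416, 550))  # hypothetical files
    fruit = Image.open("fruit.png").convert("RGBA")

    # Random scale and rotation, as in the description.
    scale = random.uniform(0.3, 1.0)
    fruit = fruit.resize((max(1, int(fruit.width * scale)), max(1, int(fruit.height * scale))))
    fruit = fruit.rotate(random.uniform(0, 360), expand=True)

    # Random color transformation on the RGB channels, keeping the alpha mask intact.
    r, g, b, a = fruit.split()
    rgb = ImageEnhance.Color(Image.merge("RGB", (r, g, b))).enhance(random.uniform(0.7, 1.3))
    fruit = Image.merge("RGBA", (*rgb.split(), a))

    # Paste at a random position; the fruit's alpha channel acts as the paste mask.
    x = random.randint(0, max(background.width - fruit.width, 0))
    y = random.randint(0, max(background.height - fruit.height, 0))
    background.paste(fruit, (x, y), fruit)
    background.save("synthetic_example.jpg")
    ```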

  5. synthetic-data-gemini-2.0-ComplexConfigurations

    • huggingface.co
    Updated Feb 7, 2025
    Cite
    vivek (2025). synthetic-data-gemini-2.0-ComplexConfigurations [Dataset]. https://huggingface.co/datasets/vivek-dodia/synthetic-data-gemini-2.0-ComplexConfigurations
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 7, 2025
    Authors
    vivek
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    MikroTik RouterOS Configuration Dataset

    A structured dataset containing MikroTik RouterOS configuration guides and tutorials generated using Gemini 2.0 (model_name="gemini-2.0-flash-exp").

    Dataset Details

    • 3000+ configuration examples
    • Source: synthetic data generated from the gemini-2.0-flash-exp LLM
    • Format: Parquet file with structured columns

    Columns

    • filename: Original MD file name
    • title: Configuration guide title
    • prompt: Scenario description…

    See the full description on the dataset page: https://huggingface.co/datasets/vivek-dodia/synthetic-data-gemini-2.0-ComplexConfigurations.
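
    A minimal loading sketch, assuming the Hugging Face datasets library and the repository id from the URL above; the split name is an assumption.

    ```python
    from datasets import load_dataset

    # Assumed split name; the repo id comes from the dataset URL above.
    ds = load_dataset("vivek-dodia/synthetic-data-gemini-2.0-ComplexConfigurations", split="train")
    print(ds.column_names)   # expect columns such as filename, title, prompt, ...
    print(ds[0]["title"])    # first configuration guide's title
    ```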

  6. Synthetic Lego brick dataset for object detection

    • kaggle.com
    zip
    Updated Nov 15, 2021
    Cite
    Mantas Gribulis (2021). Synthetic Lego brick dataset for object detection [Dataset]. https://www.kaggle.com/datasets/mantasgr/synthetic-lego-brick-dataset-for-object-detection/code
    Explore at:
    Available download formats: zip (54,438,517 bytes)
    Dataset updated
    Nov 15, 2021
    Authors
    Mantas Gribulis
    License

    Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This is a Lego brick image dataset annotated in PASCAL VOC format, ready for an ML object detection pipeline. Additionally, I made tutorials on how to:

    • Generate synthetic images and create bounding box annotations in Pascal VOC format using Blender.
    • Train ML models (YoloV5 and SSD) for detecting multiple objects in an image.

    The tutorials, with Blender scripts for rendering the dataset and Jupyter notebooks for training ML models, can be found here: https://github.com/mantyni/Multi-object-detection-lego

    Content

    The dataset contains:

    • Lego brick images in JPG format, 300x300 resolution
    • Annotations in PASCAL VOC format

    There are 6 Lego bricks in this dataset, each appearing approximately 600 times across the dataset: brick_2x2, brick_2x4, brick_1x6, plate_1x2, plate_2x2, plate_2x4

    Lego brick 3D models obtained from: Mecabricks - https://www.mecabricks.com/

    The first 500 images are of individual Lego bricks rendered from different angles and against different backgrounds. The images after that contain multiple bricks. Each image is rendered using different backgrounds, brick colour and shadow variations to enable Sim2Real transfer. After training the ML models (YoloV5 and SSD) on the synthetic dataset, I tested them on real images, achieving ~70% detection accuracy.
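
    Since the annotations are in PASCAL VOC XML, a small parsing sketch like the one below (standard library only, not part of the dataset's tutorials) is enough to read the bounding boxes; the annotation path is hypothetical.

    ```python
    import xml.etree.ElementTree as ET

    def read_voc_boxes(xml_path):
        """Return a list of (label, xmin, ymin, xmax, ymax) from a VOC annotation file."""
        root = ET.parse(xml_path).getroot()
        boxes = []
        for obj in root.findall("object"):
            label = obj.findtext("name")
            bb = obj.find("bndbox")
            boxes.append((
                label,
                int(float(bb.findtext("xmin"))),
                int(float(bb.findtext("ymin"))),
                int(float(bb.findtext("xmax"))),
                int(float(bb.findtext("ymax"))),
            ))
        return boxes

    print(read_voc_boxes("annotations/0001.xml"))  # hypothetical path
    ```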

    Inspiration

    The main purpose of this project is to show how to create your own realistic synthetic image datasets for training computer vision models without needing real world data.

  7. ⚙️ SQL Tutorial Exercise Data

    • kaggle.com
    zip
    Updated Oct 2, 2023
    Cite
    mexwell (2023). ⚙️ SQL Tutorial Exercise Data [Dataset]. https://www.kaggle.com/datasets/mexwell/sql-tutorial-exercise-data
    Explore at:
    Available download formats: zip (3,701,453 bytes)
    Dataset updated
    Oct 2, 2023
    Authors
    mexwell
    Description

    This dataset was created to be the base of the data.world SQL tutorial exercises. Data was generated using Synthea, a synthetic patient generator that models the medical history of synthetic patients. Their mission is to output high-quality synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. The resulting data is free from cost, privacy, and security restrictions, enabling research with Health IT data that is otherwise legally or practically unavailable. De-identified real data still presents a challenge in the medical field because there are people who excel at re-identification of these data. For that reason, the average medical center will not share its patient data. Most governmental data is at the hospital level. NHANES data is an exception.

    You can read Synthea's first academic paper here.

    Original Data

    Acknowledgement

    Photo by Rubaitul Azad on Unsplash

  8. Land Cover Fraction Mapping with FORCE - Supplemental Data

    • data.niaid.nih.gov
    Updated Jan 13, 2023
    Cite
    Schug, Franz; Frantz, David (2023). Land Cover Fraction Mapping with FORCE - Supplemental Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7529762
    Explore at:
    Dataset updated
    Jan 13, 2023
    Dataset provided by
    University of Trier
    University of Wisconsin - Madison
    Authors
    Schug, Franz; Frantz, David
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This upload contains data required to replicate a tutorial that applies regression-based unmixing of spectral-temporal metrics for sub-pixel land cover mapping with synthetically created training data. The tutorial uses the Framework for Operational Radiometric Correction for Environmental monitoring.

    This dataset contains intermediate and final results of the workflow described in that tutorial as well as auxiliary data such as parameter files.

    Please refer to the above-mentioned tutorial for more information.
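
    As a conceptual illustration of regression-based unmixing with synthetic training data (not the FORCE workflow itself), the sketch below fits a regressor on synthetically mixed spectra and predicts per-pixel class fractions; all names, shapes, and parameter values are assumptions.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n_bands, n_train = 10, 500

    # Pure "endmember" spectra for three classes and random mixing fractions.
    endmembers = rng.random((3, n_bands))
    fractions = rng.dirichlet(np.ones(3), size=n_train)           # rows sum to 1
    X_train = fractions @ endmembers + rng.normal(0, 0.01, (n_train, n_bands))

    model = RandomForestRegressor(n_estimators=100).fit(X_train, fractions)

    # Predict land cover fractions for "image" pixels (here: more synthetic spectra).
    X_pixels = rng.dirichlet(np.ones(3), size=5) @ endmembers
    print(model.predict(X_pixels))   # one fraction estimate per class, per pixel
    ```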

  9. C4 200M Grammar Error Correction dataset

    • kaggle.com
    zip
    Updated Apr 18, 2023
    Cite
    Dario Cioni (2023). C4 200M Grammar Error Correction dataset [Dataset]. https://www.kaggle.com/datasets/dariocioni/c4200m/discussion
    Explore at:
    Available download formats: zip (15,601,869,562 bytes)
    Dataset updated
    Apr 18, 2023
    Authors
    Dario Cioni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A Grammar Error Correction synthetic dataset consisting of 185 million sentence pairs, created using a Tagged Corruption model on Google's C4 dataset.

    This version of the dataset was extracted from Li Liwei's HuggingFace dataset (https://huggingface.co/datasets/liweili/c4_200m) and converted to TSV format.

    The corruption edits by Felix Stahlberg and Shankar Kumar are licensed under CC BY 4.0. The C4 dataset was released by AllenAI under the terms of ODC-BY. By using this, you are also bound by the Common Crawl terms of use in respect of the content contained in the dataset.

    Format

    This dataset is converted to Parquet format; a TSV version is available in previous versions. The reason for the conversion was poor performance when accessing each file. I'm open to requests and suggestions on how to better handle such a big dataset.

    This dataset is available in TSV format, split into 10 files of approximately 18M samples each. Each sample is a pair formed by the incorrect and the corrected sentence.

    | Incorrect | Corrected |
    | --------- | --------- |
    | Much many brands and sellers still in the market. | Many brands and sellers still in the market. |
    | She likes playing in park and come here every week | She likes playing in the park and comes here every week |
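
    A minimal loading sketch for one TSV shard; the shard file name, the absence of a header row, and the column order are assumptions based on the format described above.

    ```python
    import pandas as pd

    pairs = pd.read_csv(
        "c4_200m_shard_00.tsv",            # hypothetical shard name
        sep="\t",
        names=["incorrect", "corrected"],  # assumed column order
        quoting=3,                         # csv.QUOTE_NONE: sentences may contain quote characters
    )
    print(pairs.head())
    ```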

    Usage

    I'm planning to release a notebook where I'll show Grammar Error Correction using a seq2seq architecture based on BERT and LSTM. Until then, you can try to build your own model!

    This dataset can be used to train sequence-to-sequence models based on the encoder-decoder approach.
    The task is quite similar to the NMT task; here are some tutorials:
    • NLP from scratch: translation with a seq2seq network and attention
    • Language Translation with nn.Transformer and TorchText

    Grammar Error Correction example: https://production-media.paperswithcode.com/tasks/gec_foTfIZW.png

    Acknowledgments

    Thanks to the dataset creators Felix Stahlberg and Shankar Kumar and to Li Liwei for first giving access to the processed dataset.

  10. GenoCAD Tutorials

    • figshare.com
    zip
    Updated May 31, 2023
    Cite
    Mary Mangan; Mandy Wilson; Laura Adam; Jean Peccoud (2023). GenoCAD Tutorials [Dataset]. http://doi.org/10.6084/m9.figshare.153827.v15
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mary Mangan; Mandy Wilson; Laura Adam; Jean Peccoud
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This tutorial includes two PowerPoint presentations developed by Mary Mangan from OpenHelix. Students should start with the Introduction prior to moving on to the Advanced tutorial. The slide decks include numerous comments that will help students go through the tutorials. In order to perform the hands-on activities, students need to download the GenoCAD Training Set. This dataset includes a list of parts and a grammar used as part of the GenoCAD Introductory tutorial. In order to import this data set into GenoCAD, proceed as follows:

    1. Log into GenoCAD; create an account if you don't already have one.
    2. Click on the Parts tab.
    3. Click on the Grammars tab.
    4. Click on the Add/Import Grammar button.
    5. Using the "choose file" button, select the grammar file (.genocad) and click on import grammar.
    6. Click on "use existing icon set" and click on "continue import".

    Upon completion of this procedure you should have a new grammar with a library of 37 parts in your workspace.

    The tutorial also includes a series of additional exercises that will be used to reinforce the concepts introduced in the tutorial. Please visit the GenoCAD page for videos of the tutorials.

  11. LlamaIndex tutorial resources

    • kaggle.com
    zip
    Updated Dec 9, 2023
    Cite
    Matthias (2023). LlamaIndex tutorial resources [Dataset]. https://www.kaggle.com/datasets/hiarsl/10k-forms
    Explore at:
    Available download formats: zip (128,969,069 bytes)
    Dataset updated
    Dec 9, 2023
    Authors
    Matthias
    Description
    • The dataset contains input data (e.g., conference articles) that can be used when experimenting with fine-tuning of embeddings for RAG applications with LlamaIndex (other use cases are of course possible as well). The dataset furthermore contains synthetic queries created from the input data and (fine-tuned) embedding models trained using those synthetic queries; see the sketch after this list.
    • The form 10-K files in this dataset are used in tutorials from LlamaIndex (e.g., Fine-tuning an Adapter, Embedding fine-tuning)
    • Data is used in this public notebook: https://www.kaggle.com/code/hiarsl/fine-tuning-embeddings-with-llamaindex
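
    As a generic illustration of the embedding fine-tuning idea (using sentence-transformers directly rather than the LlamaIndex fine-tuning engines referenced above), the sketch below trains on synthetic (query, passage) pairs; the pairs file, its schema, and the base model name are assumptions.

    ```python
    import json
    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, InputExample, losses

    model = SentenceTransformer("BAAI/bge-small-en-v1.5")  # assumed base model

    # Assumed file: one JSON object per line, {"query": "...", "passage": "..."}.
    examples = []
    with open("pairs.jsonl") as f:
        for line in f:
            pair = json.loads(line)
            examples.append(InputExample(texts=[pair["query"], pair["passage"]]))

    loader = DataLoader(examples, shuffle=True, batch_size=16)
    loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives for (query, passage) pairs
    model.fit(train_objectives=[(loader, loss)], epochs=1)
    model.save("finetuned-embedding-model")
    ```
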
  12. GenoCAD Training Set I

    • figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Mary Mangan; Mandy Wilson; Laura Adam; Jean Peccoud (2023). GenoCAD Training Set I [Dataset]. http://doi.org/10.6084/m9.figshare.153827.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mary Mangan; Mandy Wilson; Laura Adam; Jean Peccoud
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This fileset includes a grammar file (.zip) and a list of parts used as part of the GenoCAD Introductory tutorial. In order to import this data set into GenoCAD, proceed as follows:

    1. Log into GenoCAD; create an account if you don't already have one.
    2. Click on the Parts tab.
    3. Click on the Grammars tab.
    4. Click on the Add/Import Grammar button.
    5. Using the "choose file" button, select the grammar file (.zip) and click on import grammar.
    6. Click on "use existing icon set" and click on "continue import".
    7. Click on "1-Parts" to return to the parts management tool.
    8. On the Libraries tab, click on the "New Library" button.
    9. Select the "Training Set E.Coli v3" grammar and give a name to the parts library, such as "Training Set Parts Library".
    10. Click on the "My Parts" tab and click on the "Import parts" button.
    11. Select the "Training Set Parts Library" parts library and the tab-delimited option. Select the Training_Set_Parts_v3.txt file and click import.

    Upon completion of this procedure you should have a new grammar with a library of 37 parts in your workspace.

  13. Synthetic data generating parameters. The table summarizes the generating...

    • plos.figshare.com
    xls
    Updated Jun 5, 2025
    Cite
    Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi (2025). Synthetic data generating parameters. The table summarizes the generating parameters for synthetic networks showing the corresponding symbol, name and range after the application of the constraints in Section e.2. [Dataset]. http://doi.org/10.1371/journal.pone.0319031.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Synthetic data generating parameters. The table summarizes the generating parameters for synthetic networks showing the corresponding symbol, name and range after the application of the constraints in Section e.2.

  14. LMS Tracking Dataset

    • kaggle.com
    zip
    Updated May 6, 2024
    Cite
    Prasad Patil (2024). LMS Tracking Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/lms-tracking-dataset
    Explore at:
    Available download formats: zip (5,419 bytes)
    Dataset updated
    May 6, 2024
    Authors
    Prasad Patil
    Description

    This dataset was collected by an edtech startup. The startup teaches entrepreneurial life skills in an animated, gamified format through its video series to kids aged 6-14 years. Through its learning management system, the company tracks the progress made by all of its subscribers on the platform. The company records platform content usage activity data and tries to follow up with parents if their child is inactive on the platform. Here's more information about the dataset:

    Dataset Information:

    • Child Name: Name of the subscriber kid
    • Email Address: Email address created by parent
    • Contact: Contact details of the parent
    • follow up: Responses received by the company employee after progress follow-up over the phone
    • response: segregating the follow-up responses into categories
    • Introduction: Tutorial 1
    • Activity:- Know your personality, a fun way: Tutorial 2
    • A Simple Quiz on the previous Video: Quiz on the Tutorial 2
    • Lets see what ‘Product’ is…: Tutorial 3
    • A Simple Quiz on the previous Video: Quiz on the Tutorial 3
    • Product that represents me: Tutorial 4
    • Let's see what 'Service' means: Tutorial 5
    • A Simple Quiz on the previous Video: Quiz on the Tutorial 5
    • Instruction for 'Product & Service' worksheet: Tutorial 6
    • Activity:- Product and Service Worksheet: Exercise on Tutorial 6
    • Instructions for Product Word Association: Tutorial 7
    • Activity:- Product Word Association: Exercise on Tutorial 7
    • Life without products??.... Impossible!: Tutorial 8
    • What Is a Need?: Tutorial 9
    • A Simple Quiz on the previous Video: Quiz on the Tutorial 9
    • Summary of Session 1: Summarizing all the learnings from the Tutorials 1-9
    • Your Feedback on Session 1: Feedback page

    There is some missing data as well. I hope it will be a good dataset for beginners practicing their NLP skills.
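
    A small starter sketch for loading the data and checking the missing values mentioned above; the file name is hypothetical and the column name is taken from the list above.

    ```python
    import pandas as pd

    df = pd.read_csv("lms_tracking.csv")                 # hypothetical file name
    print(df.isna().sum())                               # missing data per column, as noted above
    print(df["follow up"].value_counts(dropna=False))    # raw follow-up responses for NLP practice
    ```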

    Image by Steven Weirather from Pixabay

    Note: This dataset is partially synthetic, meaning the names, email and contact details mentioned are not those of the actual customers. Kindly use it for educational and research purposes.

  15. Test data in spreadsheet format

    • data-staging.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Goedhart, J. (2020). Test data in spreadsheet format [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_1003222
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    SILS - University of Amsterdam
    Authors
    Goedhart, J.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Synthetic test data for a tutorial that explains how to convert spreadsheet data to tidy data.
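
    A minimal wide-to-tidy sketch (not the tutorial's own code); the file name and the assumption that the spreadsheet has one identifier column plus one column per condition are illustrative only.

    ```python
    import pandas as pd

    # Hypothetical wide-format spreadsheet, e.g. columns: sample, control, treated
    wide = pd.read_excel("test_data.xlsx")

    # One row per observation: sample, condition, value
    tidy = wide.melt(id_vars="sample", var_name="condition", value_name="value")
    print(tidy.head())
    ```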

  16. SpikeInterface tutorial and results datasets

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Jun 14, 2023
    Cite
    Buccino, Alessio P. (2023). SpikeInterface tutorial and results datasets [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_3256070
    Explore at:
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    Center for Integrative Neuroplasticity, University of Oslo
    Authors
    Buccino, Alessio P.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Synthetic extracellular recording generated by MEArec for the SpikeInterface tutorials and results.

    Dataset descriptions

    recordings_36cells_four-tetrodes_30.0_10.0uV_20-06-2019_14_48.h5: extracellular recording on 4 tetrodes on a shank. Each tetrode is in a diamond configuration and the inter-tetrode distance is 300 um. There are 36 ground-truth neurons distributed over the 4 tetrodes. The duration is 30 seconds and the noise is uncorrelated Gaussian noise with 10 uV standard deviation. It is used in this notebook.

    recordings_50cells_SqMEA-10-15um_60.0_10.0uV_27-03-2019_13-31.h5 : extracellular recordings on a square MEA in a 10x10 configuration with 15 um inter-electrode distance. There are 50 ground-truth neurons. The duration is 60 seconds and the noise is uncorrelated Gaussian noise with 10 uV standard deviation. It is used in this notebook.
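
    A quick inspection sketch using h5py; the SpikeInterface tutorials themselves load these files with MEArec-aware extractors, so this only shows how to peek at the HDF5 contents.

    ```python
    import h5py

    path = "recordings_36cells_four-tetrodes_30.0_10.0uV_20-06-2019_14_48.h5"
    with h5py.File(path, "r") as f:
        f.visit(print)   # list the groups/datasets stored in the MEArec recording file
    ```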

  17. Data from: Palladium-catalysed hydroxylation and alkoxylation

    • resodate.org
    Updated Jun 30, 2016
    Cite
    Stephan Enthaler; Anna Company (2016). Palladium-catalysed hydroxylation and alkoxylation [Dataset]. http://doi.org/10.14279/depositonce-5359
    Explore at:
    Dataset updated
    Jun 30, 2016
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Stephan Enthaler; Anna Company
    Description

    The formation of oxygen–carbon bonds is one of the fundamental transformations in organic synthesis. In this regard, the application of palladium-based catalysts has been extensively studied in recent years. Nowadays it is an established methodology, and its success has been proven in manifold synthetic procedures. This tutorial review summarizes the advances in palladium-catalysed C–O bond formation, namely hydroxylation and alkoxylation reactions.

  18. Eye Gaze

    • kaggle.com
    zip
    Updated Jun 27, 2018
    Cite
    4Quant (2018). Eye Gaze [Dataset]. https://www.kaggle.com/4quant/eye-gaze
    Explore at:
    Available download formats: zip (4,791,808,309 bytes)
    Dataset updated
    Jun 27, 2018
    Dataset authored and provided by
    4Quant
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    The main reason for making this dataset is the publication of the paper Learning from Simulated and Unsupervised Images through Adversarial Training and the idea of the SimGAN. The dataset and kernels should make it easier to get started building SimGAN networks, testing them out, and comparing them to other approaches like KNN, GAN, InfoGAN and the like.

    Content

    gaze.csv: A full table of values produced by the UnityEyes tool for every image in the gaze.h5 file

    gaze.json: A json version of the CSV table (easier to read in pandas)

    gaze.h5: The synthetic gazes from the UnityEyes tool

    real_gaze.h5: The gaze images from MPII packed into a single hdf5
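
    A quick inspection sketch for the files listed above; it assumes gaze.json can be read directly by pandas, as the description suggests, and only lists the HDF5 contents since their internal layout is not documented here.

    ```python
    import h5py
    import pandas as pd

    gaze = pd.read_json("gaze.json")   # per-image values produced by UnityEyes (format assumed pandas-readable)
    print(gaze.columns.tolist())

    with h5py.File("gaze.h5", "r") as f:
        f.visit(print)                 # groups/datasets of synthetic gaze images
    ```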

    Acknowledgements

    The synthetic images were generated with the windows version of UnityEyes http://www.cl.cam.ac.uk/research/rainbow/projects/unityeyes/tutorial.html

    The real images were taken from https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/gaze-based-human-computer-interaction/appearance-based-gaze-estimation-in-the-wild-mpiigaze/, which can be cited like this: Appearance-based Gaze Estimation in the Wild, X. Zhang, Y. Sugano, M. Fritz and A. Bulling, Proc. of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June, p.4511-4520, (2015).

    Inspiration

    Enhancement:

    • One of the challenges (as covered in the paper) is enhancing the simulated images by using the real images. One possible approach is using the SimGAN which is implemented for reference in one of the notebooks. There are a number of other approaches (pix2pix, CycleGAN) which could have interesting results.

    Gaze Detection:

    • The synthetic dataset has the gaze information, since it was generated by UnityEyes with a predefined look-vector. The overview notebook covers what this vector means and how each component can be interpreted. It would be very useful to have a simple, quick network for automatically generating this look vector from an image.

  19. ANOVA results for the distributions in Fig 8. Four different one-way ANOVA...

    • plos.figshare.com
    xls
    Updated Jun 5, 2025
    Cite
    Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi (2025). ANOVA results for the distributions in Fig 8. Four different one-way ANOVA for each combination of and in the toy example #2. The corresponding p and F values are shown in this table. [Dataset]. http://doi.org/10.1371/journal.pone.0319031.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ANOVA results for the distributions in Fig 8. Four different one-way ANOVA for each combination of and in the toy example #2. The corresponding p and F values are shown in this table.

  20. Classification performances for the Breast Cancer Wisconsin dataset....

    • plos.figshare.com
    xls
    Updated Jun 5, 2025
    Cite
    Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi (2025). Classification performances for the Breast Cancer Wisconsin dataset. Performance indices are extracted by the confusion matrix obtained comparing the original ground truth and the labels of the Fiedler vector. [Dataset]. http://doi.org/10.1371/journal.pone.0319031.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Andrea Ranieri; Floriana Pichiorri; Emma Colamarino; Febo Cincotti; Donatella Mattia; Jlenia Toppi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classification performances for the Breast Cancer Wisconsin dataset. Performance indices are extracted from the confusion matrix obtained by comparing the original ground truth and the labels of the Fiedler vector.
