4 datasets found
  1. All NanoPUZZLES ISA-TAB-Nano datasets

    • data.europa.eu
    • nanocommons.github.io
    unknown
    Updated Dec 24, 2015
    + more versions
    Cite
    Zenodo (2015). All NanoPUZZLES ISA-TAB-Nano datasets [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-35493?locale=da
    Available download formats: unknown (58723973)
    Dataset updated: Dec 24, 2015
    Dataset authored and provided by: Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file is a ZIP archive which contains ALL publicly released ISA-TAB-Nano datasets developed within the NanoPUZZLES EU project [http://www.nanopuzzles.eu]. The (meta)data in these datasets were extracted from literature references. These datasets are also available via FigShare (see below).

    ****Any necessary updates, e.g. to correct errors not spotted during the review of the datasets within the NanoPUZZLES project prior to their being released, will be uploaded to FigShare and the changes documented in the FigShare dataset descriptions. This Zenodo entry corresponds to the original publicly released versions of these datasets.****

    *****Before working with these datasets, you are strongly advised to read the following text - especially the "Disclaimers".*****

    ISA-TAB-Nano [1,2,3] has been proposed as a nanomaterial data exchange standard. As is explained in the README file contained within each dataset, as well as the "Investigation Description" field of the Investigation file regarding dataset specific deviations, the manner in which certain data and metadata were recorded within these datasets deviates from the expectations of the generic ISA-TAB-Nano specification. Marchese Robinson et al. [3], distributed within each dataset, discusses this in more detail. However, some additional new business rules, going beyond those described in Marchese Robinson et al. [3], may also have been applied to each dataset - as documented in the README file.

    Each dataset was developed using Excel-based templates developed in the NanoPUZZLES project [4]. (N.B. The latest version of the templates, at the time of writing, was version 4 as opposed to version 3 which was described in Marchese Robinson et al. [3]. This latest version of the templates should be contained within the README file of each dataset.) Since these templates were iteratively updated, not all datasets may be perfectly consistent with the latest version - although efforts were made to minimise inconsistencies.

    The three copies of each dataset contained within each individual [DATASET ID]_all_copies.zip are as follows:
    (a) [DATASET ID].zip: the original dataset prepared within Excel
    (b) [DATASET ID]-txt_opt-N.zip: a tab-delimited text version of each dataset prepared using version 2.0 of the cited Python program [5], with the -N flag selected (designed to minimise inconsistencies with the latest version of the NanoPUZZLES templates)
    (c) [DATASET ID]-txt_opt-a_opt-c_opt-N.zip: a tab-delimited text version of each dataset prepared using version 2.0 of the cited Python program [5], with the -N, -a (truncate ontology IDs) and -c (remove Investigation file comments) flags selected, as required for submission to the nanoDMS online database system [3,6].

    The original datasets prepared in Excel were prepared via manual curation. In some cases, it was necessary to extract data from graphs. In some cases, the GSYS software program was employed to facilitate estimation of the values of numerical data points reported in graphs [7,8].

    Disclaimers:
    (1) this work has not undergone peer review
    (2) no endorsement by third parties should be inferred
    (3) You are strongly advised to read the README file and the "Investigation Description" field of the Investigation file before working with any one of these datasets. The latter field may document dataset specific caveats such as possible problems or uncertainties associated with curation from the original reference(s). Other such comments may be found in Study, Material or Assay file "Comment" fields.

    Cited references:
    [1] Thomas, D.G. et al. BMC Biotechnol. 2013, 13, 2. doi:10.1186/1472-6750-13-2
    [2] https://wiki.nci.nih.gov/display/ICR/ISA-TAB-Nano (accessed 18th of December 2015)
    [3] Marchese Robinson, R.L. et al. Beilstein J. Nanotechnol. 2015, 6, 1978–1999. doi:10.3762/bjnano.6.202
    [4] http://www.myexperiment.org/files/1356.html (accessed 18th of December 2015)
    [5] https://github.com/RichardLMR/xls2txtISA.NANO.archive (accessed 18th of December 2015)
    [6] http://biocenitc-deq.urv.cat/nanodms (accessed 18th of December 2015)
    [7] http://www.jcprg.org/gsys/2.4/ (last accessed 11th of April 2016)
    [8] R. Suzuki, "Introduction, Design and Implementation of Digitization Software GSYS", IAEA Report INDC(NDS)-0629, p. 19, IAEA, Vienna, Austria (2013)

    FigShare versions:
    https://figshare.com/articles/NanoPUZZLES_ISA_TAB_Nano_dataset_Cytotoxicity_and_some_physicochemical_data_reported_by_Wang_et_al_2014_DOI_10_3109_17435390_2013_796534_/2056140
    https://figshare.com/articles/NanoPUZZLES_ISA_TAB_Nano_dataset_Zebrafish_mortality_and_basic_nanomaterial_composition_data_extracted_from_Kovriznych_et_al_2013_doi_10_2478_intox_2013_0012_/2056137
    https://figshare.com/articles/NanoPUZZLES_ISA_TAB_Nano_dataset_Physicochemical_and_in_vitro_cytotoxicity_data_LDH_membrane_damage_extracted_from_Sayes_and_Ivanov_2010_DOI_10_1111_j_1539_6924_2010_01438_x_/2056134
    https://figshare.com/articles/NanoPUZZLES_ISA_TAB_Nano_dataset_Cytotoxicity_and_phys
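
    The naming scheme for the three copies inside each [DATASET ID]_all_copies.zip can be sketched in Python as follows. This is only an illustration: it assumes the member names follow the patterns quoted in the description literally, the function names and the human-readable labels are hypothetical, and the real archive layout has not been inspected here.

```python
import io
import zipfile

# Suffixes taken from the naming scheme in the description. The most specific
# suffix is checked first because all three names end in ".zip".
COPY_SUFFIXES = [
    ("-txt_opt-a_opt-c_opt-N.zip", "text, nanoDMS-ready (-N, -a, -c)"),
    ("-txt_opt-N.zip", "text (-N)"),
    (".zip", "original Excel"),
]


def classify_copy(member_name):
    """Return (dataset_id, copy_kind) for one archive member, or None."""
    base = member_name.rsplit("/", 1)[-1]  # ignore any folder prefix
    for suffix, kind in COPY_SUFFIXES:
        if base.endswith(suffix):
            return base[: -len(suffix)], kind
    return None  # not one of the three expected copies


def list_copies(all_copies_zip):
    """Map each copy kind to its member name inside an open ZipFile."""
    found = {}
    for name in all_copies_zip.namelist():
        result = classify_copy(name)
        if result is not None:
            found[result[1]] = name
    return found
```

    Calling list_copies on an opened zipfile.ZipFile for one [DATASET ID]_all_copies.zip would then show which of the three copies are present before any of them is extracted.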

  2. The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets

    • bridges.monash.edu
    • datasetcatalog.nlm.nih.gov
    • +1 more
    txt
    Updated Oct 15, 2024
    Cite
    Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie (2024). The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets [Dataset]. http://doi.org/10.26180/25286407.v2
    Available download formats: txt
    Dataset updated: Oct 15, 2024
    Dataset provided by
    Monash University
    Authors
    Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Companion data for the creation of a banksia plot:

    Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.

    Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.

    Results: In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

    Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.

    This collection of files allows the user to create the images used in the companion paper and amend this code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
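
    The centring and scaling step described in the Methods can be illustrated with a minimal Python sketch. It is one reading of the text above, not the authors' Stata or R code: the function name and the (estimate, lower, upper) tuple layout are assumptions made for the example.

```python
def centre_and_scale(reference, comparator):
    """Centre and scale one reference/comparator pair.

    Each argument is a tuple (point_estimate, ci_lower, ci_upper).
    The reference point estimate is shifted to zero and the reference
    confidence interval is rescaled to a width of one; the comparator
    is shifted and rescaled by the same amounts.
    """
    ref_est, ref_lo, ref_hi = reference
    width = ref_hi - ref_lo
    if width <= 0:
        raise ValueError("reference confidence interval must have positive width")

    def transform(value):
        return (value - ref_est) / width

    return (
        tuple(transform(v) for v in reference),
        tuple(transform(v) for v in comparator),
    )


# Reference: estimate 10 with CI (8, 12); comparator: estimate 11 with CI (8, 14).
ref, comp = centre_and_scale((10, 8, 12), (11, 8, 14))
print(ref)   # (0.0, -0.5, 0.5)
print(comp)  # (0.25, -0.5, 1.0)
```

    Because both analyses are divided by the same reference width, the comparator interval in this example (width 1.5) can be read directly as 1.5 times the width of the reference interval.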

  3. SciTail (Multiple-choice science exams)

    • kaggle.com
    zip
    Updated Nov 29, 2022
    Cite
    The Devastator (2022). SciTail (Multiple-choice science exams) [Dataset]. https://www.kaggle.com/datasets/thedevastator/futuristic-natural-language-inference-with-the-s
    Available download formats: zip (7959679 bytes)
    Dataset updated: Nov 29, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    SciTail (Multiple-choice science exams)

    27,026 Multiple-choice science exams and web sentences

    By Huggingface Hub [source]

    About this dataset

    The Scitail dataset is your gateway to unlocking powerful and advanced Sci-Fi Natural Language Inference (NLI) algorithms. With data sourced from popular books, movies, and TV shows in the genre, this dataset gives you the opportunity to develop and train NLI algorithms capable of understanding complex sci-fi conversations. Containing seven distinct formats including training sets for both predictor format and datagem format as well as testing sets in tsv format and SNLI format - all containing the same fields but in varied structures - this is an essential resource for any scientist looking to explore the realm of sci-fi NLI! Train your algorithm today with Scitail; unlock a future of supercharged Sci-Fi language processing!

    How to use the dataset

    This guide will explain how to use the Scitail dataset for Natural Language Inference (NLI). NLI is a machine learning task which involves making predictions about a statement’s labels, such as entailment, contradiction, or neutral. The Scitail dataset contains sci-fi samples sourced from various sources such as books, movies and TV shows that can be used to train and evaluate NLI algorithms.

    The Scitail dataset is split into seven different formats: Dataset Gem format for testing and training, Predictor format for validation and training, .TSV format for testing and validation. Each of these formats contains the same data fields in different forms, including premise, hypothesis, label (entailment/contradiction/neutral), label assigned by annotators etc.

    To get started using this dataset, we recommend downloading the datasets in whichever format you prefer from Kaggle. All files are stored as CSV files, with each row representing a single data point: a premise-hypothesis pair with a label assigned by annotators indicating whether the two statements entail one another or not.

    Once you have downloaded your preferred datasets, prepare them for training or evaluation; this includes formatting them so that algorithms can read them correctly. We suggest splitting your chosen file(s) into separate training and validation sets, each sufficiently representative of real-world language. Each set should contain examples with a positive entailing relation as well as examples where no entailing relation exists between the two statements, or where the relation is uncertain because the pair’s context provides too little evidence (i.e., a neutral relation).
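
    One way to get training and validation sets that both contain every label is a stratified split. The sketch below is an illustration only: it assumes rows have already been read into dictionaries with a "label" key (for example via csv.DictReader on one of the downloaded files), and the function name, the 20% validation fraction and the fixed seed are arbitrary choices for the example.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key="label", validation_fraction=0.2, seed=0):
    """Split rows into (train, validation), keeping label proportions similar.

    rows is a list of dicts; each dict must contain label_key.
    """
    # Group the rows by their label so each label is split separately.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, validation = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_val = round(len(group) * validation_fraction)
        validation.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, validation
```

    Splitting each label group separately avoids the case where a purely random split leaves one label under-represented in the validation set.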

    Research Ideas

    • Develop and fine-tune NLI algorithms with different levels of Sci-Fi language complexity.
    • Use the annotator labels to develop an automated human-in-the-loop approach to NLI algorithms.
    • Incorporate the hypothesis graph structure into existing models to improve accuracy and reduce error rates in identifying contextual comparisons between premises and hypotheses in Sci-Fi texts

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: dgem_format_test.csv

    | Column name                | Description                                                                        |
    |:---------------------------|:-----------------------------------------------------------------------------------|
    | premise                    | The premise of the statement (String).                                             |
    | hypothesis                 | The hypothesis of the statement (String).                                          |
    | label                      | The label of the statement – either entailment, neutral or contradiction (String). |
    | hypothesis_graph_structure | A graph structure of the hypothesis (Graph)                                        |

    File: predictor_format_validation.csv | Column name | Description ...

  4. School Enrollment, Primary (% Net)

    • kaggle.com
    zip
    Updated Dec 17, 2024
    Cite
    Hafiz Amsal (2024). School Enrollment, Primary (% Net) [Dataset]. https://www.kaggle.com/datasets/hafizamsal/school-enrollment-primary-net
    Available download formats: zip (46642 bytes)
    Dataset updated: Dec 17, 2024
    Authors
    Hafiz Amsal
    License

    https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets

    Description

    Kaggle Dataset Description

    Title: School Enrollment, Primary (% Net)
    Subtitle: Exploring global trends in access to primary education.

    Detailed Description:
    This dataset contains data on net primary school enrollment rates, sourced from the World Bank. It measures the proportion of children enrolled in primary education who belong to the official age group for that level, expressed as a percentage of the total population of that age group.

    Key Highlights:
    - Annual data for countries worldwide.
    - Metric: Net primary school enrollment (%).
    - Use cases: Analyze trends, compare regional disparities, and study relationships with socio-economic factors like GDP, literacy, and gender equality.

    4. Exploratory Data Analysis (EDA)

    Notebook Ideas

    1. Data Cleaning:

      • Handle missing or inconsistent data points.
      • Normalize data for comparison across regions.
      • Aggregate data by regions (e.g., high-income vs. low-income countries).
    2. Visualizations:

      • Line Graph: Trends in net enrollment rates over time for selected countries.
      • Heatmap: Net enrollment rates by region and year.
      • Scatterplot: Correlation between net enrollment and GDP, literacy rates, or gender equality.
      • Bar Chart: Top and bottom countries by net enrollment for a specific year.
    3. Descriptive Analysis:

      • Highlight regions with near-universal enrollment.
      • Identify countries with significant improvements or declines in enrollment rates.
      • Analyze trends in gender disparities (if available).
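
    As a rough sketch of two of the ideas above (skipping missing values, then ranking countries for one year), the following Python function may help. The record layout with "country", "year" and "net_enrollment" keys is hypothetical; the actual column names in the downloaded file are not given in this description and should be checked first.

```python
def rank_countries(records, year, n=3):
    """Return (top, bottom) lists of country names for a given year.

    records is an iterable of dicts with the hypothetical keys
    "country", "year" and "net_enrollment" (None marks a missing value).
    """
    # Data cleaning: keep only rows for the requested year with a value present.
    valid = [
        r for r in records
        if r["year"] == year and r["net_enrollment"] is not None
    ]
    # Highest enrollment first.
    ordered = sorted(valid, key=lambda r: r["net_enrollment"], reverse=True)
    top = [r["country"] for r in ordered[:n]]
    # Lowest enrollment first.
    bottom = [r["country"] for r in ordered[-n:][::-1]]
    return top, bottom
```

    The two returned lists could feed directly into a bar chart of top and bottom countries for the chosen year.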

    5. Predictive Analysis (Optional)

    • Use time-series forecasting (e.g., ARIMA or Prophet) to predict future enrollment rates for specific regions or countries.
    • Apply clustering algorithms to group countries with similar educational trends.

    6. Kaggle Notebook

    Create a Kaggle notebook with:
    1. Data Cleaning: Show how missing or inconsistent values are handled.
    2. EDA: Include visualizations like heatmaps, scatterplots, and line graphs.
    3. Insights: Highlight findings such as regions with the highest net enrollment or disparities over time.
    4. Optional Predictive Modeling: Use forecasting models to predict future enrollment trends.

    7. Call to Action

    For GitHub:

    • Share the GitHub repository link on LinkedIn, Twitter, and relevant forums.
    • Invite collaboration:
      • "Fork this repository and contribute by adding insights, analyses, or visualizations!"

    GitHub Link: https://github.com/AmsalAli/Primary_School_Enrollment_Trends

    For Kaggle:

    • Encourage upvotes:
      • "If this dataset is helpful, please upvote to make it more visible to the Kaggle community!"
    • Engage users with questions:
      • "Which countries have achieved universal primary school enrollment?"
      • "How does GDP or literacy impact primary school enrollment rates?"

    Kaggle Link: https://www.kaggle.com/datasets/yourusername/primary-school-enrollment

    8. LinkedIn Post

    Post Title:
    📚 Global Trends in Primary School Enrollment 🌍

    Post Body:
    Excited to share my latest dataset on net primary school enrollment rates, sourced from the World Bank. This dataset measures the proportion of children enrolled in primary education who belong to the official age group for that level, offering key insights into education access globally.

    📂 Explore the Dataset:
    - GitHub Repository: https://github.com/AmsalAli/Primary_School_Enrollment_Trends
    - Kaggle Dataset: https://www.kaggle.com/datasets/yourusername/primary-school-enrollment

    Why It Matters:

    Education is fundamental to global development. This dataset is ideal for:
    - Trend Analysis: Analyze primary school enrollment across countries and regions.
    - Regional Comparisons: Explore disparities in education access.
    - Correlations: Study relationships between enrollment rates, GDP, gender equality, and literacy.

    📈 Get Involved:
    - Use this dataset for analysis and visualizations.
    - Share your insights and findings.
    - Upvote on Kaggle if you find it useful to help others discover it!

    ❓ What trends or correlations do you see?
    - Which countries have achieved near-universal primary school enrollment?
    - What factors drive improvements in education access?

    Let me know your thoughts, and feel free to share this resource with your network! 🌟

    DataScience #Education #SchoolEnrollment #GlobalD...
