Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file is a ZIP archive which contains ALL publicly released ISA-TAB-Nano datasets developed within the NanoPUZZLES EU project [http://www.nanopuzzles.eu]. The (meta)data in these datasets were extracted from literature references. These datasets are also available via FigShare (see below). ****Any necessary updates, e.g. to correct errors not spotted during the review of the datasets within the NanoPUZZLES project prior to their being released, will be uploaded to FigShare and the changes documented in the FigShare dataset descriptions. This Zenodo entry corresponds to the original publicly released versions of these datasets.**** *****Before working with these datasets, you are strongly advised to read the following text - especially the "Disclaimers".***** ISA-TAB-Nano [1,2,3] has been proposed as a nanomaterial data exchange standard. As is explained in the README file contained within each dataset, as well as the "Investigation Description" field of the Investigation file regarding dataset specific deviations, the manner in which certain data and metadata were recorded within these datasets deviates from the expectations of the generic ISA-TAB-Nano specification. Marchese Robinson et al. [3], distributed within each dataset, discusses this in more detail. However, some additional new business rules, going beyond those described in Marchese Robinson et al. [3], may also have been applied to each dataset - as documented in the README file. Each dataset was developed using Excel-based templates developed in the NanoPUZZLES project [4]. (N.B. The latest version of the templates, at the time of writing, was version 4 as opposed to version 3 which was described in Marchese Robinson et al. [3]. This latest version of the templates should be contained within the README file of each dataset.) Since these templates were iteratively updated, not all datasets may be perfectly consistent with the latest version - although efforts were made to minimise inconsistencies. 
The three copies of each dataset contained within each individual [DATASET ID]_all_copies.zip are as follows:
(a) [DATASET ID].zip: the original dataset prepared within Excel
(b) [DATASET ID]-txt_opt-N.zip: a tab-delimited text version of each dataset prepared using version 2.0 of the cited Python program [5], with the -N flag selected (designed to minimise inconsistencies with the latest version of the NanoPUZZLES templates)
(c) [DATASET ID]-txt_opt-a_opt-c_opt-N.zip: a tab-delimited text version of each dataset prepared using version 2.0 of the cited Python program [5], with the -N, -a (truncate ontology IDs) and -c (remove Investigation file comments) flags selected, as required for submission to the nanoDMS online database system [3,6].
The original datasets prepared in Excel were prepared via manual curation. In some cases, it was necessary to extract data from graphs. In some cases, the GSYS software program was employed to facilitate estimation of the values of numerical data points reported in graphs [7,8].
Disclaimers:
(1) this work has not undergone peer review
(2) no endorsement by third parties should be inferred
(3) You are strongly advised to read the README file and the "Investigation Description" field of the Investigation file before working with any one of these datasets. The latter field may document dataset specific caveats such as possible problems or uncertainties associated with curation from the original reference(s). Other such comments may be found in Study, Material or Assay file "Comment" fields.
Cited references:
[1] Thomas, D.G. et al. BMC Biotechnol. 2013, 13, 2. doi:10.1186/1472-6750-13-2
[2] https://wiki.nci.nih.gov/display/ICR/ISA-TAB-Nano (accessed 18th of December 2015)
[3] Marchese Robinson, R.L. et al. Beilstein J. Nanotechnol. 2015, 6, 1978–1999. doi:10.3762/bjnano.6.202
[4] http://www.myexperiment.org/files/1356.html (accessed 18th of December 2015)
[5] https://github.com/RichardLMR/xls2txtISA.NANO.archive (accessed 18th of December 2015)
[6] http://biocenitc-deq.urv.cat/nanodms (accessed 18th of December 2015)
[7] http://www.jcprg.org/gsys/2.4/ (last accessed 11th of April 2016)
[8] R. Suzuki, "Introduction, Design and Implementation of Digitization Software GSYS", IAEA Report INDC(NDS)-0629, p. 19, IAEA, Vienna, Austria (2013)
FigShare versions:
https://figshare.com/articles/NanoPUZZLES_ISA_TAB_Nano_dataset_Cytotoxicity_and_some_physicochemical_data_reported_by_Wang_et_al_2014_DOI_10_3109_17435390_2013_796534_/2056140
https://figshare.com/articles/NanoPUZZLES_ISA_TAB_Nano_dataset_Zebrafish_mortality_and_basic_nanomaterial_composition_data_extracted_from_Kovriznych_et_al_2013_doi_10_2478_intox_2013_0012_/2056137
https://figshare.com/articles/NanoPUZZLES_ISA_TAB_Nano_dataset_Physicochemical_and_in_vitro_cytotoxicity_data_LDH_membrane_damage_extracted_from_Sayes_and_Ivanov_2010_DOI_10_1111_j_1539_6924_2010_01438_x_/2056134
https://figshare.com/articles/NanoPUZZLES_ISA_TAB_Nano_dataset_Cytotoxicity_and_phys
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Companion data for the creation of a banksia plot:
Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.
Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.
Results: In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.
Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.
This collection of files allows the user to create the images used in the companion paper and amend this code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
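The centring and scaling step described in the Methods can be illustrated with a short sketch. This is not the Stata or R code distributed with the collection; it is a minimal Python illustration, and the names `Interval` and `centre_and_scale` are hypothetical. It assumes that "adjusted by the same amounts" means subtracting the reference point estimate and dividing by the reference CI width.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """A point estimate with its confidence interval limits."""
    estimate: float
    lower: float
    upper: float


def centre_and_scale(reference: Interval, comparator: Interval):
    """Centre on the reference estimate and scale by the reference CI width.

    Assumption: both analyses are shifted by the reference point estimate
    and divided by the reference CI width, so the reference estimate maps
    to zero and the reference CI spans a range of one.
    """
    width = reference.upper - reference.lower
    if width <= 0:
        raise ValueError("reference CI must have positive width")

    def transform(interval: Interval) -> Interval:
        return Interval(
            estimate=(interval.estimate - reference.estimate) / width,
            lower=(interval.lower - reference.estimate) / width,
            upper=(interval.upper - reference.estimate) / width,
        )

    return transform(reference), transform(comparator)


# Illustrative values only, not taken from the companion paper.
ref, comp = centre_and_scale(
    Interval(estimate=2.0, lower=1.0, upper=3.0),
    Interval(estimate=2.5, lower=1.0, upper=4.0),
)
print(ref)
print(comp)
```

After the transformation the comparator's offset from zero and its CI width can be read directly relative to the reference, whatever the original outcome scale was.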
https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
The SciTail dataset is a resource for developing and training Natural Language Inference (NLI) algorithms in the science domain. Its premise-hypothesis pairs were built from multiple-choice science exam questions and sentences retrieved from the web. The release contains seven files, including training sets in predictor format and DGEM format and test sets in TSV format and SNLI format. All of them contain the same fields in different structures.
This guide explains how to use the SciTail dataset for Natural Language Inference (NLI). NLI is the machine learning task of predicting the relation between a premise and a hypothesis, using labels such as entailment, contradiction, or neutral. SciTail contains science-domain premise-hypothesis pairs that can be used to train and evaluate NLI algorithms.
The dataset is distributed as seven files in different formats: DGEM format for testing and training, predictor format for validation and training, and TSV format for testing and validation. Each format contains the same data fields in a different form, including premise, hypothesis, label (entailment/contradiction/neutral), and the label assigned by annotators.
To get started, download the dataset from Kaggle in whichever format you prefer. All files are stored as CSVs, with each row representing a single data point: a premise-hypothesis pair with a label assigned by annotators that indicates whether the premise entails the hypothesis.
Once you have downloaded your preferred files, prepare them for training or evaluation. We suggest splitting the chosen file(s) into separate training and validation sets so that each set is representative. Each set should include examples where the premise entails the hypothesis as well as neutral examples, where no entailment holds or the premise gives too little evidence to decide.
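The split suggested above can be sketched with the Python standard library. This is only an illustration: the helper names `load_rows` and `stratified_split` are hypothetical, the column name `label` is taken from the column table for dgem_format_test.csv, and the 80/20 ratio is an assumption rather than a recommendation from the dataset authors. Splitting per label keeps entailing and neutral examples in both sets.

```python
import csv
import random
from collections import defaultdict


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def stratified_split(rows, label_key="label", validation_fraction=0.2, seed=0):
    """Split rows into (train, validation), preserving label proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, validation = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_validation = round(len(group) * validation_fraction)
        validation.extend(group[:n_validation])
        train.extend(group[n_validation:])
    return train, validation


# Synthetic rows standing in for a downloaded file.
example_rows = [
    {"premise": f"premise {i}", "hypothesis": f"hypothesis {i}",
     "label": "entailment" if i % 2 == 0 else "neutral"}
    for i in range(20)
]
train_rows, validation_rows = stratified_split(example_rows)
print(len(train_rows), len(validation_rows))
```

With a real file, `load_rows("dgem_format_test.csv")` would replace the synthetic rows, assuming the file sits in the working directory.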
- Develop and fine-tune NLI algorithms on science-domain text of varying language complexity.
- Use the annotator labels to develop an automated human-in-the-loop approach to NLI algorithms.
- Incorporate the hypothesis graph structure into existing models to improve accuracy and reduce error rates in identifying contextual comparisons between premises and hypotheses in science texts.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: dgem_format_test.csv

| Column name | Description |
|:---------------------------|:-----------------------------------------------------------------------------------|
| premise | The premise of the statement (String). |
| hypothesis | The hypothesis of the statement (String). |
| label | The label of the statement: either entailment, neutral or contradiction (String). |
| hypothesis_graph_structure | A graph structure of the hypothesis (Graph) |

File: predictor_format_validation.csv | Column name | Description ...
https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
Title: School Enrollment, Primary (% Net)
Subtitle: Exploring global trends in access to primary education.
Detailed Description:
This dataset contains net primary school enrollment rates sourced from the World Bank. The net rate is the number of children of official primary school age who are enrolled in primary education, expressed as a percentage of the total population of that age group.
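The definition above amounts to a simple ratio. The sketch below shows the arithmetic under that definition; the function name and the input figures are hypothetical and are not taken from the dataset.

```python
def net_enrollment_rate(enrolled_official_age: float,
                        population_official_age: float) -> float:
    """Net enrollment (%) = enrolled children of official age / population of that age * 100."""
    if population_official_age <= 0:
        raise ValueError("population of the official age group must be positive")
    return 100.0 * enrolled_official_age / population_official_age


# Hypothetical figures: 900 enrolled children out of 1,000 of official age.
print(net_enrollment_rate(900, 1000))
```

Because only children of the official age group are counted in the numerator, over-age and under-age pupils do not raise the net rate.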
Key Highlights:
- Annual data for countries worldwide.
- Metric: Net primary school enrollment (%).
- Use cases: Analyze trends, compare regional disparities, and study relationships with socio-economic factors like GDP, literacy, and gender equality.
Data Cleaning:
Visualizations:
Descriptive Analysis:
Create a Kaggle notebook with:
1. Data Cleaning: Show how missing or inconsistent values are handled.
2. EDA: Include visualizations like heatmaps, scatterplots, and line graphs.
3. Insights: Highlight findings such as regions with the highest net enrollment or disparities over time.
4. Optional Predictive Modeling: Use forecasting models to predict future enrollment trends.
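As a starting point for the data cleaning and insights steps listed above, the following sketch handles missing or inconsistent values and picks the most recent usable observation per country. Everything about the input layout is an assumption: the records are hypothetical (country, year, raw value) tuples, the missing-value markers are guesses, and treating values outside 0 to 100 as inconsistent is a choice made for this example, not a rule stated by the dataset.

```python
# Assumed markers for missing values; adjust to match the actual file.
MISSING_MARKERS = {"", "..", "NA"}


def parse_rate(raw: str):
    """Return the rate as a float, or None if it is missing or inconsistent."""
    text = raw.strip()
    if text in MISSING_MARKERS:
        return None
    value = float(text)
    # Assumption: a net rate is a share of the age group, so it should lie in 0-100.
    if not 0.0 <= value <= 100.0:
        return None
    return value


def latest_rates(records):
    """Map each country to (year, rate) for its most recent usable observation."""
    latest = {}
    for country, year, raw in records:
        rate = parse_rate(raw)
        if rate is None:
            continue
        if country not in latest or year > latest[country][0]:
            latest[country] = (year, rate)
    return latest


# Hypothetical records, not real World Bank values.
sample = [
    ("Country A", 2018, "91.5"),
    ("Country A", 2019, ".."),
    ("Country B", 2019, "88.0"),
    ("Country B", 2020, "120"),
]
print(latest_rates(sample))
```

Dropping unusable values rather than imputing them keeps the example simple; a notebook could instead interpolate between years and should say so explicitly.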
GitHub Link: https://github.com/AmsalAli/Primary_School_Enrollment_Trends
Kaggle Link: https://www.kaggle.com/datasets/yourusername/primary-school-enrollment
Post Title:
Global Trends in Primary School Enrollment
Post Body:
Excited to share my latest dataset on net primary school enrollment rates, sourced from the World Bank. The net rate is the share of children of official primary school age who are enrolled in primary education, offering key insights into education access globally.
Explore the Dataset:
- GitHub Repository: https://github.com/AmsalAli/Primary_School_Enrollment_Trends
- Kaggle Dataset: https://www.kaggle.com/datasets/yourusername/primary-school-enrollment
Education is fundamental to global development. This dataset is ideal for:
- Trend Analysis: Analyze primary school enrollment across countries and regions.
- Regional Comparisons: Explore disparities in education access.
- Correlations: Study relationships between enrollment rates, GDP, gender equality, and literacy.
Get Involved:
- Use this dataset for analysis and visualizations.
- Share your insights and findings.
- Upvote on Kaggle if you find it useful to help others discover it!
What trends or correlations do you see?
- Which countries have achieved near-universal primary school enrollment?
- What factors drive improvements in education access?
Let me know your thoughts, and feel free to share this resource with your network!