License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Raw data outputs 1-18
Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis.
Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis.
Raw data output 3. Common differentially expressed genes between training and test set samples of the microarray dataset. This data was generated based on the results of AML microarray data analysis.
Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study.
Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones.
Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs, or the uniquely expressed genes in either class of CSCs.
Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 13. The protein-protein interactions of AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis.
Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers' (stem) cells as well as hematopoietic stem cells. These data were generated based on PubMed database-based literature mining.
Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell.
Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.
This dataset was created by Pinky Verma.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
Sample data for exercises in Further Adventures in Data Cleaning.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Complete annotations for the tabular data are presented below.
Tab Fig 1: (A) The heatmap data of G protein family members in the hippocampal tissue of 6-month-old Wildtype (n = 6) and 5xFAD (n = 6) mice; (B) The heatmap data of G protein family members in the cortical tissue of 6-month-old Wildtype (n = 6) and 5xFAD (n = 6) mice; (C) The data in the overlapping part of the Venn diagram (132 elements); (D) The data used to create the volcano plot; (E) The data used to create the heatmap of GPCR-related DEGs; (F) Expression of Gnb5 in the large sample dataset GSE44772; Control, n = 303; AD, n = 387; (H) Statistical analysis of Gnb5 protein levels from panel G; Wildtype, n = 4; 5xFAD, n = 4; (J) Statistical analysis of Gnb5 protein levels from panel I; Wildtype, n = 4; 5xFAD, n = 4; (L) Quantitative analysis of Gnb5 fluorescence intensity in 5xFAD and Wildtype groups; Wildtype, n = 4; 5xFAD, n = 4.
Tab Fig 2: (D) qPCR data of Gnb5 knockout in hippocampal tissue; Gnb5F/F, n = 6; Gnb5-CCKO, n = 6; (E–I, L–N) Animal behavioral tests in mice, Gnb5F/F, n = 22; Gnb5-CCKO, n = 16; (E) Total distance traveled in the open field experiment; (F) Training curve in the Morris water maze (MWM); (F-day6) Data from the sixth day of MWM training; (G) Percentage of time spent by the mouse in the target quadrant in the MWM; (H) Statistical analysis of the number of times the mouse traverses the target quadrant in the MWM; (I) Latency to first reach the target quadrant in the MWM; (L) Baseline freezing percentage of mice in an identical testing context; (M) Percentage of freezing time of mice during the Context phase; (N) Percentage of freezing time of mice during the Cue phase.
Tab Fig 3: (D–F, H) MWM tests in mice; Wildtype+AAV-GFP, n = 20; Wildtype+AAV-Gnb5-GFP, n = 23; 5xFAD + AAV-GFP, n = 23; 5xFAD + AAV-Gnb5-GFP, n = 26; (D) Training curve in the MWM; (D-day6) Data from the sixth day of MWM training; (E) Percentage of time spent in the target quadrant in the MWM; (F) Statistical analysis of the number of entries in the target quadrant in the MWM; (H) Movement speed of mice in the MWM; (I–K) The contextual fear conditioning test in mice; 5xFAD + AAV-GFP, n = 23; 5xFAD + AAV-Gnb5-GFP, n = 26; (I) Baseline freezing percentage of mice in an identical testing context; (J) Percentage of freezing time of mice during the Context phase; (K) Percentage of freezing time of mice during the Cue phase; (L) Total distance traveled in the open field test; (M) Percentage of time spent in the center area during the open field test.
Tab Fig 4: (B, C) Quantification of Aβ plaques in the hippocampus sections from Wildtype and 5xFAD mice injected with either AAV-Gnb5 or AAV-GFP; Wildtype+AAV-GFP, n = 4; Wildtype+AAV-Gnb5-GFP, n = 4; 5xFAD + AAV-GFP, n = 4; 5xFAD + AAV-Gnb5-GFP, n = 4; (B) Quantification of Aβ plaque number; (C) Quantification of Aβ plaque size; (F, G) Quantification of Aβ plaques from the indicated mouse lines; WT&Gnb5F/F&CamKIIa-CreERT+Vehicle, n = 4; 5xFAD&Gnb5F/F&CamKIIa-CreERT+Vehicle, n = 4; 5xFAD&Gnb5F/F&CamKIIa-CreERT+Tamoxifen, n = 4; (F) Quantification of Aβ plaque size; (G) Quantification of Aβ plaque number.
Tab Fig 5: (B) Overexpression of Gnb5-AAV in 5xFAD mice affects the expression of proteins related to APP cleavage (BACE1, β-CTF, Nicastrin and APP); statistical analysis of protein levels; n = 4, respectively; (D) Tamoxifen-induced Gnb5 knockdown in 5xFAD mice affects APP-cleaving proteins; statistical analysis of protein levels; n = 4, respectively; (F) Gnb5-CCKO mice show altered expression of APP-cleaving proteins; statistical analysis of protein levels; n = 6, respectively.
Tab Fig 7: (C, D) Quantification of Aβ plaques in 5xFAD mice injected with AAV expressing full-length Gnb5, truncated fragments, or a mutant truncated fragment; n = 4, respectively; (C) Quantification of Aβ plaque size; (D) Quantification of Aβ plaque number; (F) Effect of overexpressing full-length Gnb5, truncated fragment, and mutant truncated fragment viruses on the expression of proteins related to the APP cleavage process in 5xFAD mice; statistical analysis of protein levels; n = 3, respectively. (XLSX)
License: custom license, see https://borealisdata.ca/api/datasets/:persistentId/versions/2.1/customlicense?persistentId=doi:10.5683/SP3/RQTHKB
The National Survey of Information Technology Occupations (NSITO), conducted in 2002 on behalf of the Software Human Resource Council (SHRC), is the first to shed light on the IT labour market in both the public and private sectors. IT employers and employees were surveyed separately, but simultaneously. The employer survey consisted of questions on occupation profile, hiring and recruitment, employee retention, and training and development. The employee survey had questions on the occupational history of IT employees, salary, education, training, and skills. The target population consisted of private-sector locations with at least six employees and at least one employee working in IT, as well as public-sector divisions with at least one IT employee. The NSITO is a three-stage survey. In stage 1, a sample of employers in both the private and public sectors is selected; the questions asked in this stage are essentially about the IT workforce. In stage 2, a maximum of two occupations (out of 25) are selected per employer; the questions asked in this stage deal with hiring, training and retaining employees in the selected occupations. In stage 3, a maximum of 10 employees are sampled for each occupation selected in stage 2. Among the subjects that employees are asked about are training, previous employment and demographic characteristics. For National Survey of Information Technology Occupations data, refer to Statistics Canada.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This Excel sheet contains the sample data set used for training the ML models; it can be provided to the reader.
License: GNU Lesser General Public License v3.0 (LGPL-3.0), http://www.gnu.org/licenses/lgpl-3.0.html
On the official website the dataset is available over SQL Server (localhost) and as CSVs to be used via Power BI Desktop running on the Virtual Lab (virtual machine). The first two data-import steps were executed in the virtual lab and the resulting Power BI tables were then copied into CSVs. Records up to the year 2022 have been added as required.
This dataset is helpful if you want to work offline with Adventure Works data in Power BI Desktop in order to carry out the lab instructions in the training material on the official website, for example the Power BI Desktop Sales Analysis example from the Microsoft PL-300 learning path.
Download the CSV file(s) and import them into Power BI Desktop as tables. The CSVs are named after the tables created in the first two data-import steps described in the PL-300 Microsoft Power BI Data Analyst exam lab.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
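As an illustration of the recommended presentation style (not the authors' Excel templates), here is a minimal matplotlib sketch that plots every observation of two hypothetical small groups as a univariate scatterplot with group means:

```python
# Illustrative univariate scatterplot for small-sample continuous data
# (hypothetical values; the authors' own tool is a set of Excel templates).
import matplotlib.pyplot as plt
import numpy as np

groups = {"Control": [4.1, 5.3, 4.8, 6.0, 5.1, 4.6],
          "Treated": [6.2, 7.1, 5.9, 8.0, 6.8, 7.4]}

fig, ax = plt.subplots(figsize=(4, 4))
for i, (name, values) in enumerate(groups.items()):
    x = np.random.normal(i, 0.04, size=len(values))   # horizontal jitter so points stay visible
    ax.plot(x, values, "o", alpha=0.7)                 # every individual observation
    ax.hlines(np.mean(values), i - 0.2, i + 0.2, colors="black")  # group mean

ax.set_xticks(range(len(groups)))
ax.set_xticklabels(groups.keys())
ax.set_ylabel("Measured value (arbitrary units)")
plt.tight_layout()
plt.show()
```

Unlike a bar graph, such a plot exposes the full distribution of each small group alongside its summary statistic.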
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Finding a good data source is the first step toward creating a database. Cardiovascular diseases (CVDs) are the major cause of death worldwide. CVDs include coronary heart disease, cerebrovascular disease, rheumatic heart disease, and other heart and blood vessel problems. According to the World Health Organization, 17.9 million people die from CVDs each year. Heart attacks and strokes account for more than four out of every five CVD deaths, with one-third of these deaths occurring before the age of 70.
A comprehensive database of factors that contribute to a heart attack has been constructed. The main purpose here is to collect characteristics of heart attacks, or factors that contribute to them. To accomplish this, a form was created in Microsoft Excel. Figure 1 depicts the form, which has nine fields: eight input fields and one output field. Age, gender, heart rate, systolic BP, diastolic BP, blood sugar, CK-MB, and troponin test are the input fields, while the output field records the presence of a heart attack, divided into two categories (negative and positive): negative refers to the absence of a heart attack, while positive refers to its presence. Table 1 shows detailed information and the minimum and maximum attribute values for the 1319 cases in the whole database. To confirm the validity of these data, we examined the patient files in the hospital archive and compared them with the data stored in the laboratory system; we also interviewed the patients and specialized doctors. Table 2 shows a sample of 44 cases from the whole database and the factors that lead to a heart attack.
After collecting the data, we checked whether it contained null (invalid) values or errors introduced during data collection. A value is null if it is unknown; null values necessitate special treatment, as they indicate that the target is not a valid data element. When trying to retrieve data that is not present, you can come across the keyword null during processing, and arithmetic operations on a numeric column with one or more null values yield null. An example of null value processing is shown in Figure 2.
The data used in this investigation were scaled between 0 and 1 to guarantee that all inputs and outputs receive equal attention and to eliminate their dimensionality. Prior to the use of AI models, data normalization has two major advantages: first, it prevents attributes in larger numeric ranges from overshadowing those in smaller numeric ranges; second, it avoids numerical problems during processing. After completing the normalization, we split the data set into two parts, training and test sets, using 1060 cases for training and 259 for testing. Modeling was then implemented using the input and output variables.
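A minimal sketch of the preprocessing described above (null-value check, 0-1 scaling, 1060/259 train-test split); the file and column names are assumptions, not the dataset's actual layout:

```python
# Sketch of the preprocessing described above: null-value check, 0-1 scaling,
# and a 1060/259 train-test split. File and column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("heart_attack_dataset.csv")   # hypothetical file name
print(df.isnull().sum())                       # locate null (invalid) values
df = df.dropna()                               # or apply the special treatment chosen in the study

X = df.drop(columns=["Result"])                # assumed name of the output field (negative/positive)
y = df["Result"]
X_scaled = MinMaxScaler().fit_transform(X)     # scale all inputs to the 0-1 range

# 259 of the 1319 records are held out for testing, leaving 1060 for training.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=259, random_state=0)
print(len(X_train), len(X_test))
```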
The Adventure Works dataset is a comprehensive and widely used sample database provided by Microsoft for educational and testing purposes. It's designed to represent a fictional company, Adventure Works Cycles, which is a global manufacturer of bicycles and related products. The dataset is often used for learning and practicing various data management, analysis, and reporting skills.
1. Company Overview:
   - Industry: Bicycle manufacturing
   - Operations: Global presence with various departments such as sales, production, and human resources.
2. Data Structure:
   - Tables: The dataset includes a variety of tables, typically organized into categories such as:
     - Sales: Information about sales orders, products, and customer details.
     - Production: Data on manufacturing processes, inventory, and product specifications.
     - Human Resources: Employee details, departments, and job roles.
     - Purchasing: Vendor information and purchase orders.
3. Sample Tables:
   - Sales.SalesOrderHeader: Contains information about sales orders, including order dates, customer IDs, and total amounts.
   - Sales.SalesOrderDetail: Details of individual items within each sales order, such as product ID, quantity, and unit price.
   - Production.Product: Information about the products being manufactured, including product names, categories, and prices.
   - Production.ProductCategory: Data on product categories, such as bicycles and accessories.
   - Person.Person: Contains personal information about employees and contacts, including names and addresses.
   - Purchasing.Vendor: Information on vendors that supply the company with materials.
4. Usage:
   - Training and Education: It's widely used for teaching SQL, data analysis, and database management.
   - Testing and Demonstrations: Useful for testing software features and demonstrating data-related functionalities.
5. Tools:
   - The dataset is often used with Microsoft SQL Server, but it's also compatible with other relational database systems.
The Adventure Works dataset provides a rich and realistic environment for practicing a range of data-related tasks, from querying and reporting to data modeling and analysis.
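As a hedged illustration of how the tables listed above are typically queried, the sketch below joins Sales.SalesOrderHeader and Sales.SalesOrderDetail through pandas; the connection string is a placeholder for a local SQL Server instance:

```python
# Illustrative join over the Adventure Works tables listed above, run through pandas.
# The connection string is a placeholder for a local SQL Server instance.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "mssql+pyodbc://localhost/AdventureWorks?driver=ODBC+Driver+17+for+SQL+Server")

query = """
SELECT h.SalesOrderID, h.OrderDate, d.ProductID, d.OrderQty, d.UnitPrice
FROM Sales.SalesOrderHeader AS h
JOIN Sales.SalesOrderDetail AS d
  ON h.SalesOrderID = d.SalesOrderID
"""

orders = pd.read_sql(query, engine)   # one row per order line item
print(orders.head())
```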
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This Excel file contains a set of data from a scoping review on methods for fire evacuation training in buildings. The review follows the PRISMA approach (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) and systematically identifies 73 sources among the scientific literature published between 1997 and 2022. The dataset contains information excerpted through a custom template on the employed training methods and technology, study information, participants, and the contents of the discussion for the 73 sources of evidence identified in the systematic review process.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Trainees' knowledge and skill transfer into practice in their working facility, northeast Ethiopia, October/2019 (N = 107).
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This zip file contains data files for 3 activities described in the accompanying PPT slides:
1. An Excel spreadsheet for analysing gain scores in a 2-group, 2-time data array. This activity requires access to https://campbellcollaboration.org/research-resources/effect-size-calculator.html to calculate effect size.
2. An AMOS path model and SPSS data set for an autoregressive, bivariate path model with cross-lagging. This activity is related to the following article: Brown, G. T. L., & Marshall, J. C. (2012). The impact of training students how to write introductions for academic essays: An exploratory, longitudinal study. Assessment & Evaluation in Higher Education, 37(6), 653-670. doi:10.1080/02602938.2011.563277
3. An AMOS latent curve model and SPSS data set for a 3-time latent factor model with an interaction mixed model that uses GPA as a predictor of the LCM start and slope or change factors. This activity makes use of data reported previously and a published data analysis case: Peterson, E. R., Brown, G. T. L., & Jun, M. C. (2015). Achievement emotions in higher education: A diary study exploring emotions across an assessment event. Contemporary Educational Psychology, 42, 82-96. doi:10.1016/j.cedpsych.2015.05.002 and Brown, G. T. L., & Peterson, E. R. (2018). Evaluating repeated diary study responses: Latent curve modeling. In SAGE Research Methods Cases Part 2. Retrieved from http://methods.sagepub.com/case/evaluating-repeated-diary-study-responses-latent-curve-modeling doi:10.4135/9781526431592
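For activity 1, a minimal sketch of computing gain scores in a 2-group, 2-time array, together with a pooled-SD standardized mean difference on the gains (the activity itself uses the Campbell Collaboration online calculator); all numbers are hypothetical:

```python
# Sketch of activity 1: gain scores in a 2-group, 2-time array, plus a pooled-SD
# standardized mean difference on the gains. All numbers are hypothetical.
import numpy as np

pre_a, post_a = np.array([10, 12, 11, 14, 13]), np.array([15, 16, 14, 18, 17])
pre_b, post_b = np.array([11, 10, 12, 13, 12]), np.array([12, 12, 13, 14, 13])

gain_a = post_a - pre_a   # gain scores, group A
gain_b = post_b - pre_b   # gain scores, group B

n_a, n_b = len(gain_a), len(gain_b)
pooled_sd = np.sqrt(((n_a - 1) * gain_a.var(ddof=1) + (n_b - 1) * gain_b.var(ddof=1))
                    / (n_a + n_b - 2))
d = (gain_a.mean() - gain_b.mean()) / pooled_sd   # Cohen's d on the gain scores
print(f"Mean gains: A = {gain_a.mean():.2f}, B = {gain_b.mean():.2f}, d = {d:.2f}")
```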
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Dataset on the Adoption of CAD/CAM Technology in Dominican Dental Practices
Description: This dataset contains raw, anonymized data collected from a cross-sectional survey conducted between September and October 2024 among practicing dentists in the Dominican Republic. The objective of the study was to evaluate the adoption, knowledge, and use of Computer-Aided Design and Computer-Aided Manufacturing (CAD/CAM) technology in clinical practice.
Contents
Sample size: 370 respondents (general dentists and specialists).
Data type: Survey responses (Excel/CSV).
Variables included:
Demographics (age, gender, years in practice, type of practice).
Professional background (specialty, postgraduate training, work setting).
Knowledge and training in CAD/CAM.
Availability of CAD/CAM equipment.
Clinical use of CAD/CAM (indications, frequency, workflow).
Perceived barriers to adoption (cost, training, access to materials).
Attitudes and willingness to adopt digital technologies.
Format: Data are provided in Microsoft Excel (.xlsx) with variables in columns and anonymized responses in rows.
Ethics: All participants provided informed consent. No personal identifiers are included in the dataset.
License: Attribution-NonCommercial 3.0 (CC BY-NC 3.0), https://creativecommons.org/licenses/by-nc/3.0/
All raw data, processed data, and statistical outputs on which the study's conclusions are based. Three types of data are present: questionnaire, performance, and physiological. Processed performance and questionnaire data, as well as the associated statistical outputs, are in .sav and .spv format (SPSS). Physiological data and the associated statistical outputs can be opened using Excel.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Corpus for the ICDAR2019 Competition on Post-OCR Text Correction (October 2019)
Christophe Rigaud, Antoine Doucet, Mickael Coustaty, Jean-Philippe Moreux
http://l3i.univ-larochelle.fr/ICDAR2019PostOCR
These are the supplementary materials for the ICDAR 2019 paper "ICDAR 2019 Competition on Post-OCR Text Correction". Please use the following citation:
@inproceedings{rigaud2019pocr,
title="ICDAR 2019 Competition on Post-OCR Text Correction",
author={Rigaud, Christophe and Doucet, Antoine and Coustaty, Mickael and Moreux, Jean-Philippe},
year={2019},
booktitle={Proceedings of the 15th International Conference on Document Analysis and Recognition (2019)}}
Description: The corpus accounts for 22M OCRed characters along with the corresponding Gold Standard (GS). The documents come from different digital collections available, among others, at the National Library of France (BnF) and the British Library (BL). The corresponding GS comes both from BnF's internal projects and external initiatives such as Europeana Newspapers, IMPACT, Project Gutenberg, Perseus and Wikisource.
Repartition of the dataset:
- ICDAR2019_Post_OCR_correction_training_18M.zip: 80% of the full dataset, provided to train participants' methods.
- ICDAR2019_Post_OCR_correction_evaluation_4M: 20% of the full dataset used for the evaluation (with the Gold Standard made public after the competition).
- ICDAR2019_Post_OCR_correction_full_22M: full dataset made publicly available after the competition.
Special case for the Finnish language: material from the National Library of Finland (Finnish dataset FI > FI1) is not allowed to be re-shared on other websites. Please follow these guidelines to get and format the data from the original website:
1. Go to https://digi.kansalliskirjasto.fi/opendata/submit?set_language=en;
2. Download OCR Ground Truth Pages (Finnish Fraktur) [v1] (4.8GB) from the Digitalia (2015-17) package;
3. Convert the Excel file "~/metadata/nlf_ocr_gt_tescomb5_2017.xlsx" to Comma Separated Values format (.csv) using the save-as function of a spreadsheet software (e.g. Excel, Calc) and copy it into "FI/FI1/HOWTO_get_data/input/";
4. Go to "FI/FI1/HOWTO_get_data/" and run "script_1.py" to generate the full "FI1" dataset in "output/full/";
5. Run "script_2.py" to split the "output/full/" dataset into "output/training/" and "output/evaluation/" subsets.
At the end of the process, you should have a "training", "evaluation" and "full" folder with 1579528, 380817 and 1960345 characters respectively.
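As an alternative to the spreadsheet save-as in step 3, the conversion can also be scripted; this is a sketch assuming pandas with openpyxl installed and the paths shown in the steps above:

```python
# Scripted alternative to the spreadsheet "save as" in step 3 above
# (requires pandas with openpyxl installed; paths follow the steps above).
import pandas as pd

xlsx_path = "metadata/nlf_ocr_gt_tescomb5_2017.xlsx"
csv_path = "FI/FI1/HOWTO_get_data/input/nlf_ocr_gt_tescomb5_2017.csv"
pd.read_excel(xlsx_path).to_csv(csv_path, index=False)
```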
Licenses: free to use for non-commercial uses, according to sources, in detail:
- BG1: IMPACT - National Library of Bulgaria: CC BY NC ND
- CZ1: IMPACT - National Library of the Czech Republic: CC BY NC SA
- DE1: Front pages of Swiss newspaper NZZ: Creative Commons Attribution 4.0 International (https://zenodo.org/record/3333627)
- DE2: IMPACT - German National Library: CC BY NC ND
- DE3: GT4Hist-dta19 dataset: CC-BY-SA 4.0 (https://zenodo.org/record/1344132)
- DE4: GT4Hist - EarlyModernLatin: CC-BY-SA 4.0 (https://zenodo.org/record/1344132)
- DE5: GT4Hist - Kallimachos: CC-BY-SA 4.0 (https://zenodo.org/record/1344132)
- DE6: GT4Hist - RefCorpus-ENHG-Incunabula: CC-BY-SA 4.0 (https://zenodo.org/record/1344132)
- DE7: GT4Hist - RIDGES-Fraktur: CC-BY-SA 4.0 (https://zenodo.org/record/1344132)
- EN1: IMPACT - British Library: CC BY NC SA 3.0
- ES1: IMPACT - National Library of Spain: CC BY NC SA
- FI1: National Library of Finland: no re-sharing allowed, follow the section above to get the data. (https://digi.kansalliskirjasto.fi/opendata)
- FR1: HIMANIS Project: CC0 (https://www.himanis.org)
- FR2: IMPACT - National Library of France: CC BY NC SA 3.0
- FR3: RECEIPT dataset: CC0 (http://findit.univ-lr.fr)
- NL1: IMPACT - National library of the Netherlands: CC BY
- PL1: IMPACT - National Library of Poland: CC BY
- SL1: IMPACT - Slovak National Library: CC BY NC
Text post-processing such as cleaning and alignment has been applied to the resources mentioned above, so the Gold Standard and the OCRs provided are not necessarily identical to the originals.
Structure:
- Content [./lang_type/sub_folder/#.txt]
  - "[OCR_toInput] " => Raw OCRed text to be de-noised.
  - "[OCR_aligned] " => Aligned OCRed text.
  - "[ GS_aligned] " => Aligned Gold Standard text.
The aligned OCRed/GS texts are provided for training and test purposes. The alignment was made at the character level using "@" symbols. "#" symbols correspond to the absence of GS, either related to alignment uncertainties or to unreadable characters in the source document. For a better view of the alignment, make sure to disable the "word wrap" option in your text editor.
The error rate and the quality of the alignment vary according to the nature and the state of degradation of the source documents. Periodicals (mostly historical newspapers), for example, have been reported to be especially challenging due to their complex layout and their original fonts. In addition, it should be mentioned that the quality of the Gold Standard also varies, as the dataset aggregates resources from different projects that have their own annotation procedures, and it obviously contains some errors.
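A minimal sketch of reading one aligned file in the format described above; the sample path is hypothetical and the parsing assumes each file holds exactly the three prefixed lines:

```python
# Minimal reader for the aligned files described above. Each file is assumed to
# hold three lines prefixed with [OCR_toInput], [OCR_aligned] and [ GS_aligned];
# '@' marks alignment padding and '#' marks positions without Gold Standard.
def read_aligned_file(path):
    fields = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("["):
                key, _, text = line.partition("] ")
                fields[key.strip("[ ")] = text.rstrip("\n")
    return fields

fields = read_aligned_file("EN1/EN1_sample/1.txt")   # hypothetical sample path
ocr, gs = fields["OCR_aligned"], fields["GS_aligned"]
plain_gs = gs.replace("@", "")                       # strip padding to recover running text

# Character positions where OCR and GS disagree, ignoring unknown-GS positions.
errors = sum(1 for o, g in zip(ocr, gs) if g != "#" and o != g)
print(f"{errors} differing aligned characters out of {len(gs)}")
```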
ICDAR2019 competition: information related to the tasks, formats and the evaluation metrics is detailed at https://sites.google.com/view/icdar2019-postcorrectionocr/evaluation
References:
- IMPACT, European Commission's 7th Framework Program, grant agreement 215064
- Uwe Springmann, Christian Reul, Stefanie Dipper, Johannes Baiter (2018). Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin.
- https://digi.nationallibrary.fi , Wiipuri, 31.12.1904, Digital Collections of National Library of Finland
- EU Horizon 2020 research and innovation programme grant agreement No 770299
Contact:
- christophe.rigaud(at)univ-lr.fr
- antoine.doucet(at)univ-lr.fr
- mickael.coustaty(at)univ-lr.fr
- jean-philippe.moreux(at)bnf.fr
L3i - University of la Rochelle, http://l3i.univ-larochelle.fr
BnF - French National Library, http://www.bnf.fr
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Leaf Area Index (LAI) is a fundamental vegetation structural variable that drives energy and mass exchanges between the plant and the atmosphere. Moderate-resolution (300m – 7km) global LAI data products have been widely applied to track global vegetation changes, drive Earth system models, monitor crop growth and productivity, etc. Yet, cutting-edge applications in climate adaptation, hydrology, and sustainable agriculture require LAI information at higher spatial resolution (< 100m) to model and understand heterogeneous landscapes.
This dataset was built to assist a machine-learning-based approach for mapping LAI from 30m-resolution Landsat images across the contiguous US (CONUS). The data was derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) Version 6 LAI/FPAR, Landsat Collection 1 surface reflectance, and NLCD Land Cover datasets over 2006 – 2018 using Google Earth Engine. Each record/sample/row includes a MODIS LAI value, corresponding Landsat surface reflectance in the green, red, NIR, and SWIR1 bands, a land cover (biome) type, geographic location, and other auxiliary information. Each sample represents a MODIS LAI pixel (500m) within which a single biome type dominates 90% of the area. The spatial homogeneity of the samples was further controlled by a screening process based on the coefficient of variation of the Landsat surface reflectance. In total, there are approximately 1.6 million samples, stratified by biome, Landsat sensor, and saturation status from the MODIS LAI algorithm. This dataset can be used to train machine learning models and generate LAI maps for Landsat 5, 7, 8 surface reflectance images within CONUS. Detailed information on the sample generation and quality control can be found in the related journal article.
Resources in this dataset:
Resource Title: README. File Name: LAI_train_samples_CONUS_README.txt. Resource Description: Description and metadata of the main dataset. Resource Software Recommended: Notepad, url: https://www.microsoft.com/en-us/p/windows-notepad/9msmlrh6lzf3?activetab=pivot:overviewtab
Resource Title: LAI_training_samples_CONUS. File Name: LAI_train_samples_CONUS_v0.1.1.csv. Resource Description: This CSV file consists of the training samples for estimating Leaf Area Index based on Landsat surface reflectance images (Collection 1 Tier 1). Each sample has a MODIS LAI value and corresponding surface reflectance derived from Landsat pixels within the MODIS pixel.
Contact: Yanghui Kang (kangyanghui@gmail.com)
Column description
UID: Unique identifier. Format: LATITUDE_LONGITUDE_SENSOR_PATHROW_DATE
Landsat_ID: Landsat image ID
Date: Landsat image date in "YYYYMMDD"
Latitude: Latitude (WGS84) of the MODIS LAI pixel center
Longitude: Longitude (WGS84) of the MODIS LAI pixel center
MODIS_LAI: MODIS LAI value in "m2/m2"
MODIS_LAI_std: MODIS LAI standard deviation in "m2/m2"
MODIS_LAI_sat: 0 - MODIS Main (RT) method used no saturation; 1 - MODIS Main (RT) method with saturation
NLCD_class: Majority class code from the National Land Cover Dataset (NLCD)
NLCD_frequency: Percentage of the area cover by the majority class from NLCD
Biome: Biome type code mapped from NLCD (see below for more information)
Blue: Landsat surface reflectance in the blue band
Green: Landsat surface reflectance in the green band
Red: Landsat surface reflectance in the red band
Nir: Landsat surface reflectance in the near infrared band
Swir1: Landsat surface reflectance in the shortwave infrared 1 band
Swir2: Landsat surface reflectance in the shortwave infrared 2 band
Sun_zenith: Solar zenith angle from the Landsat image metadata. This is a scene-level value.
Sun_azimuth: Solar azimuth angle from the Landsat image metadata. This is a scene-level value.
NDVI: Normalized Difference Vegetation Index computed from Landsat surface reflectance
EVI: Enhanced Vegetation Index computed from Landsat surface reflectance
NDWI: Normalized Difference Water Index computed from Landsat surface reflectance
GCI: Green Chlorophyll Index = Nir/Green - 1
Biome code
1 - Deciduous Forest
2 - Evergreen Forest
3 - Mixed Forest
4 - Shrubland
5 - Grassland/Pasture
6 - Cropland
7 - Woody Wetland
8 - Herbaceous Wetland
Reference Datasets (all data was accessed through Google Earth Engine):
- Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment.
- MODIS Version 6 Leaf Area Index/FPAR 4-day L5 Global 500m: Myneni, R., Y. Knyazikhin, T. Park. MOD15A2H MODIS/Terra Leaf Area Index/FPAR 8-Day L4 Global 500m SIN Grid V006. 2015, distributed by NASA EOSDIS Land Processes DAAC, https://doi.org/10.5067/MODIS/MOD15A2H.006
- Landsat 5/7/8 Collection 1 Surface Reflectance: Landsat Level-2 Surface Reflectance Science Product courtesy of the U.S. Geological Survey. Masek, J.G., Vermote, E.F., Saleous N.E., Wolfe, R., Hall, F.G., Huemmrich, K.F., Gao, F., Kutler, J., and Lim, T-K. (2006). A Landsat surface reflectance dataset for North America, 1990–2000. IEEE Geoscience and Remote Sensing Letters 3(1):68-72. http://dx.doi.org/10.1109/LGRS.2005.857030. Vermote, E., Justice, C., Claverie, M., & Franch, B. (2016). Preliminary analysis of the performance of the Landsat 8/OLI land surface reflectance product. Remote Sensing of Environment. http://dx.doi.org/10.1016/j.rse.2016.04.008.
- National Land Cover Dataset (NLCD): Yang, Limin, Jin, Suming, Danielson, Patrick, Homer, Collin G., Gass, L., Bender, S.M., Case, Adam, Costello, C., Dewitz, Jon A., Fry, Joyce A., Funk, M., Granneman, Brian J., Liknes, G.C., Rigge, Matthew B., Xian, George, A new generation of the United States National Land Cover Database—Requirements, research priorities, design, and implementation strategies: ISPRS Journal of Photogrammetry and Remote Sensing, v. 146, p. 108–123, at https://doi.org/10.1016/j.isprsjprs.2018.09.006
Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel
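A short sketch of loading the training CSV with the column names listed above and spot-checking the GCI definition (Nir/Green - 1); the filtering choices here are purely illustrative:

```python
# Load the training CSV and spot-check the GCI column against its definition
# (GCI = Nir/Green - 1), using the column names listed above. The filtering
# below (unsaturated retrievals over cropland, biome code 6) is illustrative.
import pandas as pd

df = pd.read_csv("LAI_train_samples_CONUS_v0.1.1.csv")
crops = df[(df["MODIS_LAI_sat"] == 0) & (df["Biome"] == 6)]

gci_check = crops["Nir"] / crops["Green"] - 1
print("Max |GCI difference|:", (gci_check - crops["GCI"]).abs().max())
print(crops[["MODIS_LAI", "NDVI", "EVI"]].describe())
```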
The data produced by this project will be behavioural and physiological measures of reactions shown by hens living in different environments. These data will be thermal images, logged physiological and physical condition data, behavioural data recorded on digital video, and results from preference and cognitive bias tests. In line with the BBSRC Statement on Safeguarding Good Scientific Practice, the above data will be retained for a period of ten years after completion of the project.
Secondary use: We cannot identify immediate opportunities for secondary use, although it is possible that the data could be used in work by others examining the ability of humans to assess animal welfare, or in training welfare assessors. We would be happy to provide data for these purposes.
Methods and timeframe for data sharing: Following publication of our results, or within 3 years of the termination of the grant, whichever is sooner, our data will be made available on request to bona fide researchers. All data will be labelled using systematic filenames for data identification. Behavioural, physiological and physical condition data will be supplied in a standard electronic spreadsheet format (Microsoft Excel files) and will be supplied with accompanying metadata describing the history of the birds and the details of the tasks from which the data were obtained. Video footage of behavioural tests will be available electronically in a standard movie format (e.g. MPG files). Thermal image files will be stored in a standard photograph format (e.g. JPEG), suitable for export into most thermal image software packages (e.g. ThermaCam Reporter). Data extracted from thermal image files for our studies will be made available in a standard electronic spreadsheet format (Microsoft Excel).
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The dataset contains two sub-folders: 1. Load sharing; 2. Function approximation.
1. Load sharing ("data.zip"): Each folder corresponds to a subject. The monopolar, single differential and double differential data are saved in the corresponding sub-folders 'mono', 'sd' and 'dd', respectively. In each subfolder, the data is saved as '30.mat', '50.mat', or '70.mat', corresponding to 30%, 50% or 70% MVC isometric flexion-extension. The recording protocol can be found in the Word file 'report.doc' in this folder, in the subsection "Experimental recording".
Structure of the '.mat' files: they all have the same structure:
Raw_Torque: the measured torque, in ADC numbers.
Structure 'TAB_ARV': the EMG envelopes for 'BB', 'BR', 'TM', 'TL' (read the report for the methods and acronyms).
2. Function approximation ("fun_approx.zip"): Multiple benchmark examples, including a piecewise single-variable function, five nonlinear dynamic plants with various nonlinear structures, the chaotic Mackey-Glass time series (with different signal-to-noise ratios (SNR) and various chaotic degrees) and the real-world Box-Jenkins gas furnace system, are considered to verify the effectiveness of the proposed FJWNN model. The description ("info.pdf") and the entire simulated data, as well as the results of our method on the training and test sets (in Excel files), are provided.
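A minimal sketch of loading one load-sharing recording with SciPy; the path is hypothetical and the internal layout of TAB_ARV is an assumption based on the description above:

```python
# Sketch of loading one load-sharing recording with SciPy. The path is
# hypothetical; the variable names Raw_Torque and TAB_ARV come from the
# description above, but their internal layout may differ.
from scipy.io import loadmat

mat = loadmat("data/subject01/mono/30.mat", squeeze_me=True)

torque = mat["Raw_Torque"]   # measured torque, in ADC numbers
envelopes = mat["TAB_ARV"]   # EMG envelopes for BB, BR, TM, TL

print("Torque samples:", getattr(torque, "shape", None))
print("TAB_ARV entry:", type(envelopes))
```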
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Electrocardiography (ECG) is a key diagnostic tool to assess the cardiac condition of a patient. Automatic ECG interpretation algorithms as diagnosis support systems promise large reliefs for the medical personnel - only on the basis of the number of ECGs that are routinely taken. However, the development of such algorithms requires large training datasets and clear benchmark procedures. In our opinion, both aspects are not covered satisfactorily by existing freely accessible ECG datasets.
The PTB-XL ECG dataset is a large dataset of 21799 clinical 12-lead ECGs from 18869 patients of 10 second length. The raw waveform data was annotated by up to two cardiologists, who assigned potentially multiple ECG statements to each record. The in total 71 different ECG statements conform to the SCP-ECG standard and cover diagnostic, form, and rhythm statements. To ensure comparability of machine learning algorithms trained on the dataset, we provide recommended splits into training and test sets. In combination with the extensive annotation, this turns the dataset into a rich resource for the training and the evaluation of automatic ECG interpretation algorithms. The dataset is complemented by extensive metadata on demographics, infarction characteristics, likelihoods for diagnostic ECG statements as well as annotated signal properties.
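A sketch of applying the recommended splits, assuming the PhysioNet distribution of PTB-XL (a ptbxl_database.csv metadata file with a strat_fold column and per-record waveform paths readable with the wfdb package); fold assignments and column names here should be checked against the dataset's own documentation:

```python
# Sketch of using the recommended splits, assuming the PhysioNet distribution of
# PTB-XL (ptbxl_database.csv with a 'strat_fold' column and per-record waveform
# paths readable with the wfdb package). Check fold conventions against the
# dataset documentation; folds 9 and 10 are commonly reserved for validation/test.
import pandas as pd
import wfdb

meta = pd.read_csv("ptbxl_database.csv", index_col="ecg_id")

train = meta[meta.strat_fold <= 8]
val   = meta[meta.strat_fold == 9]
test  = meta[meta.strat_fold == 10]
print(len(train), len(val), len(test))

# Read one 12-lead record (100 Hz version) referenced by the metadata.
signal, fields = wfdb.rdsamp(train.iloc[0]["filename_lr"])
print(signal.shape, fields["sig_name"])
```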