Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
df_force_kin_filtered.csv is the data sheet used by the DATA3 Python notebook to analyse kinematics and dynamics combined. It contains the footfalls that have data for both kinematics and dynamics. To see how this file is generated, read the first half of the Jupyter notebook.
This dataset was created by Balirwa Alvin Daniel.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset compares fixed-line broadband internet speeds for five cities:
- Melbourne, AU
- Bangkok, TH
- Shanghai, CN
- Los Angeles, US
- Alice Springs, AU
ERRATA: 1. Data is for Q3 2020, but some files are incorrectly labelled as 02-20 or June 20. They should all read Sept 20 (09-20), i.e. Q3 20 rather than Q2. Will rename and reload. Amended in v7.
* Lines of data for each geojson file; a line equates to a 600m^2 location, including total tests, devices used, and average upload and download speed:
- MEL: 16181 locations/lines => 0.85M speedtests (16.7 tests per 100 people)
- SHG: 31745 lines => 0.65M speedtests (2.5/100pp)
- BKK: 29296 lines => 1.5M speedtests (14.3/100pp)
- LAX: 15899 lines => 1.3M speedtests (10.4/100pp)
- ALC: 76 lines => 500 speedtests (2/100pp)
Geojsons of these 2° by 2° extracts for MEL, BKK and SHG are now added; LAX was added in v6 and Alice Springs in v15.
This dataset unpacks, geospatially, the data summaries provided in the Speedtest Global Index (linked below). See the Jupyter Notebook (*.ipynb) to interrogate the geo data, and see the link below to install Jupyter.
** To Do
Will add Google Map versions so everyone can see the data without installing Jupyter.
- Link to Google Map (BKK) added below. Key: Green > 100 Mbps (Superfast); Black > 500 Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook.
- Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melbourne has 20% of locations at or above 100 Mbps. Suggest plotting the top 20% on a map for the community. Google Map link now added (and tweet).
** Python
melb = au_tiles.cx[144:146, -39:-37]   # Lat/Lon extract
shg = tiles.cx[120:122, 30:32]         # Lat/Lon extract
bkk = tiles.cx[100:102, 13:15]         # Lat/Lon extract
lax = tiles.cx[-118:-120, 33:35]       # Lat/Lon extract
alc = tiles.cx[132:134, -22:-24]       # Lat/Lon extract
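For context, a minimal sketch of how such an extract can be produced with GeoPandas, assuming the Ookla/Speedtest open-data tiles have already been saved as a GeoJSON layer (the file name here is hypothetical; avg_d_kbps is the download-speed column in the Ookla open-data schema):

import geopandas as gpd

# Load a tile layer previously extracted from the Speedtest open data (hypothetical file name).
tiles = gpd.read_file("speedtest_tiles_q3_2020.geojson")

# .cx slices a GeoDataFrame by bounding box: [lon_min:lon_max, lat_min:lat_max].
melb = tiles.cx[144:146, -39:-37]

# Average download speed in Mbps (the Ookla tiles store avg_d_kbps in kbps).
print((melb["avg_d_kbps"] / 1000).describe())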
Histograms (v9) and data visualisations (v3, 5, 9, 11) will be provided. Data sourced from: this is an extract of the Speedtest Open Data available at Amazon Web Services (link below; opendata.aws).
** VERSIONS
v24. Add tweet and Google Map of top 20% (over 100 Mbps locations) in MEL Q3 22. Add v1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below).
v23. Add graph of 2022 broadband distribution, and compare 2020 - 2022. Updated v1.4 Jupyter notebook.
v22. Add import ipynb; workflow-import-4cities.
v21. Add Q3 2022 data; five cities inc ALC. Geojson files. (2020: 4.3M tests; 2022: 2.9M tests)
v20. Speedtest - Five Cities inc ALC.
v19. Add ALC2.ipynb.
v18. Add ALC line graph.
v17. Added ipynb for ALC. Added ALC to title.
v16. Load Alice Springs data Q2 21 - csv. Added Google Map link of ALC.
v15. Load Melb Q1 2021 data - csv.
v14. Added Melb Q1 2021 data - geojson.
v13. Added Twitter link to pics.
v12. Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb).
v11. Add Line-Compare pic, plotting four cities on a graph.
v10. Add four histograms in one pic.
v9. Add histogram for four cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook).
v8. Renamed LAX file to Q3, rather than 03.
v7. Amended file names of BKK files to correctly label as Q3, not Q2 or 06.
v6. Added LAX file.
v5. Add screenshot of BKK Google Map.
v4. Add BKK Google Map (link below), and BKK csv mapping files.
v3. Replaced MEL map with big-key version. Previous key was very tiny in top right corner.
v2. Uploaded MEL, SHG, BKK data and Jupyter Notebook.
v1. Metadata record.
** LICENCE
The AWS data licence on the Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be:
- non-commercial (NC)
- share-alike (SA): reuse must carry the same licence.
This is more restrictive than the standard CC BY Figshare licence.
** Other uses of Speedtest Open Data - see the Speedtest link below.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This JavaScript code has been developed to retrieve NDSI_Snow_Cover from MODIS version 6 for SNOTEL sites using the Google Earth Engine platform. To successfully run the code, you need a Google Earth Engine account. An input file, NWM_grid_Western_US_polygons_SNOTEL_ID.zip, is required to run the code; it contains the 1 km grid cells of the NWM that contain SNOTEL sites. You need to upload this input file to the Assets tab in the Google Earth Engine code editor. You also need to import the MOD10A1.006 Terra Snow Cover Daily Global 500m collection into the code editor, which you can do by searching for the product name in the editor's search bar.
The JavaScript works for a specified time range. We found that the best period is one month, which is the maximum allowable time range for computing all SNOTEL sites on Google Earth Engine. The script consists of two main loops: the first retrieves data from the first day of a month up to day 28, in five periods; the second retrieves data from day 28 to the beginning of the next month. The results are shown as graphs on the right-hand side of the Google Earth Engine code editor, under the Console tab. To save results as CSV files, open each time series by clicking the button in the top right corner of each graph; from the new web page, click the Download CSV button at the top.
Here is the link to the script path: https://code.earthengine.google.com/?scriptPath=users%2Figarousi%2Fppr2-modis%3AMODIS-monthly
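For orientation, a minimal sketch of the same retrieval idea using the Earth Engine Python API rather than the JavaScript editor (the asset path is a placeholder; the band name NDSI_Snow_Cover and collection ID MODIS/006/MOD10A1 match the product above, but this is not the script itself):

import ee

ee.Initialize()

# Placeholder asset path for the uploaded NWM grid-cell polygons containing SNOTEL sites.
snotel = ee.FeatureCollection('users/your_account/NWM_grid_Western_US_polygons_SNOTEL_ID')

snow = (ee.ImageCollection('MODIS/006/MOD10A1')
        .select('NDSI_Snow_Cover')
        .filterDate('2019-01-01', '2019-02-01'))   # one month at a time, as recommended above

def snow_cover_at_sites(image):
    # Average NDSI_Snow_Cover over each SNOTEL grid-cell polygon for this day.
    stats = image.reduceRegions(collection=snotel, reducer=ee.Reducer.mean(), scale=500)
    return stats.map(lambda f: f.set('date', image.date().format('YYYY-MM-dd')))

results = snow.map(snow_cover_at_sites).flatten()
print(results.limit(5).getInfo())   # for CSV output, export the table with ee.batch.Export.table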
Then, run the Jupyter Notebook (merge_downloaded_csv_files.ipynb) to merge the downloaded CSV files, stored for example in a folder called output/from_GEE, into a single CSV file, merged.csv. The Jupyter Notebook then applies some preprocessing steps, and the final output is NDSI_FSCA_MODIS_C6.csv.
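A minimal sketch of the merge step, assuming the monthly CSVs downloaded from the GEE Console sit in the folder named above (the notebook's exact preprocessing is not reproduced here):

import glob
import pandas as pd

# Concatenate all monthly CSV files downloaded from the GEE Console.
files = sorted(glob.glob("output/from_GEE/*.csv"))
merged = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
merged.to_csv("merged.csv", index=False)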
Following the procedure in the Jupyter notebook, users can create SUMMA input using *.csv files. If users want to create new SUMMA input, they can prepare it in CSV format. After that, users can run SUMMA with PySUMMA and plot the SUMMA output in various ways.
The steps of this notebook are:
1. Creating SUMMA input from *.csv files
2. Running the SUMMA model using PySUMMA
3. Plotting the SUMMA output
- Time-series plotting
- 2D plotting (heatmap, Hovmöller)
- Calculating water balance variables and plotting
- Spatial plotting with a shapefile
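As a rough sketch of step 2, and only as an assumption about the PySUMMA API (a Simulation built from a SUMMA executable and a file manager path, with a run('local') method; the notebook defines the actual calls and paths):

import pysumma as ps

# Paths are hypothetical; the notebook builds the file manager from the *.csv inputs.
executable = "summa.exe"
file_manager = "./settings/file_manager.txt"

sim = ps.Simulation(executable, file_manager)   # assumed constructor: (executable, file manager path)
sim.run("local")                                # run SUMMA locally (assumed run mode)
print(sim.output)                               # simulation output, assumed to open as an xarray dataset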
Nowadays, there is a growing tendency to use Python and R in the analytics world for physical/statistical modeling and data visualization. As scientists, analysts, or statisticians, we often choose the tool that allows us to perform the task in the quickest and most accurate way possible. For some, that means Python. For others, that means R. For many, that means a combination of the two. However, it can take considerable time to switch between these two languages, passing data and models through .csv files or database systems. There is a solution that allows researchers to quickly and easily interface R and Python together in a single Jupyter Notebook. Here we provide a Jupyter Notebook that serves as a tutorial showing how to interface R and Python in a Jupyter Notebook on CUAHSI JupyterHub. The tutorial walks you through the installation of the rpy2 library and shows simple examples illustrating this interface.
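A minimal sketch of the kind of interface the tutorial covers, using rpy2's IPython extension in two notebook cells (the data frame here is illustrative, not from the tutorial):

Cell 1 (Python):
%load_ext rpy2.ipython
import pandas as pd
df = pd.DataFrame({"x": [1, 2, 3], "y": [2.0, 4.1, 6.2]})

Cell 2 (R, via the rpy2 cell magic; -i passes the Python object into R):
%%R -i df
summary(lm(y ~ x, data = df))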
ArcGIS Survey123 utilizes CSV data in several workflows, including external choice lists, the search() appearance, and pulldata() calculations. When you need to periodically update the CSV content used in a survey, a useful method is to upload the CSV files to your ArcGIS organization and link the CSV items to your survey. Once linked, any updates to the CSV items will automatically pull through to your survey without the need to republish the survey. To learn more about linking items to a survey, see Linked content. This notebook demonstrates how to automate updating a CSV item in your ArcGIS organization. Note: It is recommended to run this notebook on your computer in Jupyter Notebook or ArcGIS Pro, as that will provide the best experience when reading locally stored CSV files. If you intend to schedule this notebook in ArcGIS Online or ArcGIS Notebook Server, additional configuration may be required to read CSV files from online file storage, such as Microsoft OneDrive or Google Drive.
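A minimal sketch of the core update call with the ArcGIS API for Python, assuming the linked CSV item already exists in your organization (the credentials, item ID, and file path are placeholders, not values from this notebook):

from arcgis.gis import GIS

gis = GIS("https://www.arcgis.com", "username", "password")   # or GIS("home") when run inside ArcGIS Notebooks

csv_item = gis.content.get("<csv-item-id>")        # the CSV item linked to the survey
csv_item.update(data="C:/data/choices.csv")        # overwrite the item with the updated local CSV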
Objective: Daily COVID-19 data reported by the World Health Organization (WHO) may provide the basis for political ad hoc decisions, including travel restrictions. Data reported by countries, however, are heterogeneous, and metrics to evaluate their quality are scarce. In this work, we analyzed COVID-19 case counts provided by WHO and developed tools to evaluate country-specific reporting behaviors.
Methods: In this retrospective cross-sectional study, COVID-19 data reported daily to WHO from 3rd January 2020 until 14th June 2021 were analyzed. We proposed the concepts of binary reporting rate and relative reporting behavior and performed descriptive analyses for all countries with these metrics. We developed a score to evaluate the consistency of incidence and binary reporting rates. Further, we performed spectral clustering of the binary reporting rate and relative reporting behavior to identify salient patterns in these metrics.
Results: Our final analysis included 222 countries and regions [...]
Data collection: COVID-19 data were downloaded from WHO. Using a public repository, we added the countries' full names to the WHO data set, merging the two data sets on the two-letter abbreviation for each country. The provided COVID-19 data cover January 2020 until June 2021. We uploaded the final data set used for the analyses of this paper.
Data processing: We processed data using a Jupyter Notebook with a Python kernel and publicly available external libraries. This upload contains the required Jupyter Notebook (reporting_behavior.ipynb) with all analyses and some additional work, a README, and the conda environment yml (env.yml).
Any text editor, including Microsoft Excel and its free alternatives, can open the uploaded CSV file. Any web browser and some code editors (such as the freely available Visual Studio Code) can show the uploaded Jupyter Notebook if the required Python environment is set up correctly.
The file task_data.csv contains an example data set that has been artificially generated. The set consists of 400 samples where for each sample there are 10 different sensor readings available. The samples have been divided into two classes where the class label is either 1 or -1. The class labels define to what particular class a particular sample belongs.
Your task is to rank the sensors according to their importance/predictive power with respect to the class labels of the samples. Your solution should be a Python script or a Jupyter notebook file that generates a ranking of the sensors from the provided CSV file. The ranking should be in decreasing order where the first sensor is the most important one.
Additionally, please include an analysis of your method and results, with possible topics including:
Hint: There are many reasonable solutions to our task. We are looking for good, insightful ones that are the least arbitrary.
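One possible (and by no means the only) approach is to rank the sensors by the feature importances of a tree ensemble. A minimal sketch, assuming the columns are named Sensor0 through Sensor9 for the readings and class_label for the target (as described in the companion copy of this dataset elsewhere in this record):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("task_data.csv")
feature_cols = [c for c in df.columns if c.startswith("Sensor")]
X, y = df[feature_cols], df["class_label"]

model = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

# Rank sensors by impurity-based feature importance, most important first.
ranking = pd.Series(model.feature_importances_, index=feature_cols).sort_values(ascending=False)
print(ranking)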
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an enriched version of the Code4ML dataset, a large-scale corpus of annotated Python code snippets, competition summaries, and data descriptions sourced from Kaggle. The initial release includes approximately 2.5 million snippets of machine learning code extracted from around 100,000 Jupyter notebooks. A portion of these snippets has been manually annotated by human assessors through a custom-built, user-friendly interface designed for this task.
The original dataset is organized into multiple CSV files, each containing structured data on different entities:
Table 1. code_blocks.csv structure
| Column | Description |
| code_blocks_index | Global index linking code blocks to markup_data.csv. |
| kernel_id | Identifier for the Kaggle Jupyter notebook from which the code block was extracted. |
| code_block_id | Position of the code block within the notebook. |
| code_block | The actual machine learning code snippet. |
Table 2. kernels_meta.csv structure
| Column | Description |
| kernel_id | Identifier for the Kaggle Jupyter notebook. |
| kaggle_score | Performance metric of the notebook. |
| kaggle_comments | Number of comments on the notebook. |
| kaggle_upvotes | Number of upvotes the notebook received. |
| kernel_link | URL to the notebook. |
| comp_name | Name of the associated Kaggle competition. |
Table 3. competitions_meta.csv structure
| Column | Description |
| comp_name | Name of the Kaggle competition. |
| description | Overview of the competition task. |
| data_type | Type of data used in the competition. |
| comp_type | Classification of the competition. |
| subtitle | Short description of the task. |
| EvaluationAlgorithmAbbreviation | Metric used for assessing competition submissions. |
| data_sources | Links to datasets used. |
| metric type | Class label for the assessment metric. |
Table 4. markup_data.csv structure
| Column | Description |
| code_block | Machine learning code block. |
| too_long | Flag indicating whether the block spans multiple semantic types. |
| marks | Confidence level of the annotation. |
| graph_vertex_id | ID of the semantic type. |
The dataset allows mapping between these tables. For example, code blocks in code_blocks.csv can be linked to their notebooks in kernels_meta.csv via the kernel_id column, and notebooks to competitions in competitions_meta.csv via comp_name. To maintain quality, kernels_meta.csv includes only notebooks with available Kaggle scores. In addition, data_with_preds.csv contains automatically classified code blocks, with a mapping back to code_blocks.csv via the code_blocks_index column.
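A minimal sketch of such a join with pandas, using the file and column names listed in the tables above:

import pandas as pd

code_blocks = pd.read_csv("code_blocks.csv")
kernels_meta = pd.read_csv("kernels_meta.csv")
competitions_meta = pd.read_csv("competitions_meta.csv")

# Attach notebook metadata to each code block, then competition metadata to each notebook.
blocks = code_blocks.merge(kernels_meta, on="kernel_id", how="left")
blocks = blocks.merge(competitions_meta, on="comp_name", how="left")
print(blocks[["kernel_id", "comp_name", "kaggle_score"]].head())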
The updated Code4ML 2.0 corpus introduces kernels extracted from Meta Kaggle Code. These kernels correspond to Kaggle competitions launched since 2020. The natural-language descriptions of the competitions are retrieved with the help of an LLM.
Notebooks in kernels_meta2.csv may not have a Kaggle score but include a leaderboard ranking (rank), providing additional context for evaluation.
competitions_meta_2.csv is enriched with data_cards describing the data used in the competitions.
The Code4ML 2.0 corpus is a versatile resource, enabling training and evaluation of models in areas such as:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this archive, you can find all the data used in the paper "ReSplit: Improving the Structure of Jupyter Notebooks by Re-Splitting Their Cells".
sklearn_full_cells.csv is the dataset from the paper of Pimentel et al., filtered to keep only Data Science notebooks.
complete.csv is the dataset obtained after the full run of ReSplit on the dataset: both merging and splitting.
split.csv is the dataset obtained after running only the splitting part of ReSplit.
merged.csv is the dataset obtained after running only the merging part of ReSplit.
duplicates_id.csv contains the IDs of the duplicate notebooks for deduplication.
changes.csv contains the IDs of the notebooks, as well as their length before and after running ReSplit.
survey.csv is the table with the results of the survey.
In the dataset CSVs, each line is a cell that has a unique identifier and an identifier of the corresponding notebook.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was originally curated by Software Carpentry, a branch of The Carpentries non-profit organization, and is based on data from the Gapminder Foundation. It consists of six tabular CSV files containing GDP data for various countries across different years. The dataset was initially prepared for the Software Carpentry tutorial "Plotting and Programming in Python" and is also reused in the Galaxy Training Network (GTN) tutorial "Use Jupyter Notebooks in Galaxy."
This GTN tutorial provides an introduction to launching a Jupyter Notebook in Galaxy, installing dependencies, and importing and exporting data. It serves as a setup guide for a Jupyter Notebook environment that can be used to follow the Software Carpentry tutorial "Plotting and Programming in Python."
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data from: Rates of Compact Object Coalescence
Brief overview: This Zenodo entry contains the data that have been used to make the figures for the living review "Rates of Compact Object Coalescence" by Ilya Mandel & Floor Broekgaarden (2021). To reproduce the figures, download all the *.csv files and run the Jupyter notebook created to reproduce the results in the publicly available GitHub repository https://github.com/FloorBroekgaarden/Rates_of_Compact_Object_Coalescence (the exact Jupyter notebook can be found here).
For any suggestions, questions or inquiry, please email one, or both, of the authors:
Ilya Mandel: ilya.mandel@monash.edu
Floor Broekgaarden: floor.broekgaarden@cfa.harvard.edu
We very much welcome suggestions for additional/missing literature with rate predictions or measurements.
Extra figures: Additional figures that can be used are available here:
Vertical figures: https://docs.google.com/presentation/d/1GqJ0k2zpnxBGwIYNeQ0BfsLSU7H2942gspL-PN_iaJY/edit?usp=sharing
The authors are currently working on an interactive tool for plotting the rates, which will be available soon. In the meantime, feel free to send requests for plots/figures to the authors.
Reference: If you use this data/code for a publication, please cite both the paper, Mandel & Broekgaarden (2021) (https://ui.adsabs.harvard.edu/abs/2021arXiv210714239M/abstract), and the dataset on Zenodo through its DOI (see the tabs on the right of this Zenodo entry).
Details of the data files:
The PDF COC_rates_supplementary_material.pdf attached (and in the GitHub repository) describes how each of the rates in the data files of this Zenodo entry is retrieved. The other 26 files are .csv files, where each csv file contains the rates for one specific double compact object type (NS-NS, NS-BH or BH-BH) and one specific rate group (isolated binary evolution, gravitational-wave observations, etc.). The files in this entry are:
Data_Mandel_and_Broekgaarden_2021.zip # all the files below conveniently in one zip file, so that you only have to do one download
COC_rates_supplementary_material.pdf # PDF document describing how the rates are retrieved and quoted from each study
BH-BH_rates_CHE.csv # BH-BH rates for chemically homogeneous evolution
BH-BH_rates_flybys.csv # BH-BH rates for formation from wide isolated binaries with dynamical interactions from flybys
BH-BH_rates_globular-clusters.csv # BH-BH rates for dynamical formation in globular clusters
BH-BH_rates_isolated-binary-evolution.csv # BH-BH rates for isolated binary evolution
BH-BH_rates_nuclear-clusters.csv # BH-BH rates for (dynamical) formation in (active) nuclear star clusters
BH-BH_rates_observations-GWs.csv # BH-BH rates for observations from gravitational waves
BH-BH_rates_population-III.csv # BH-BH rates for population-III stars
BH-BH_rates_primordial.csv # BH-BH rates for primordial formation
BH-BH_rates_triples.csv # BH-BH rates for formation in (hierarchical) triples
BH-BH_rates_young-stellar-clusters.csv # BH-BH rates for dynamical formation in young/open star clusters
NS-BH_rates_CHE.csv # NS-BH rates for chemically homogeneous evolution
NS-BH_rates_flybys.csv # NS-BH rates for formation from wide isolated binaries with dynamical interactions from flybys
NS-BH_rates_globular-clusters.csv # NS-BH rates for dynamical formation in globular clusters
NS-BH_rates_isolated-binary-evolution.csv # NS-BH rates for isolated binary evolution
NS-BH_rates_nuclear-clusters.csv # NS-BH rates for (dynamical) formation in (active) nuclear star clusters
NS-BH_rates_observations-GWs.csv # NS-BH rates for observations from gravitational waves
NS-BH_rates_population-III.csv # NS-BH rates for population-III stars
NS-BH_rates_triples.csv # NS-BH rates for formation in (hierarchical) triples
NS-BH_rates_young-stellar-clusters.csv # NS-BH rates for dynamical formation in young/open star clusters
NS-NS_rates_globular-clusters.csv # NS-NS rates for dynamical formation in globular clusters
NS-NS_rates_isolated-binary-evolution.csv # NS-NS rates for isolated binary evolution
NS-NS_rates_nuclear-clusters.csv # NS-NS rates for (dynamical) formation in (active) nuclear star clusters
NS-NS_rates_observations-GWs.csv # NS-NS rates for observations from gravitational waves
NS-NS_rates_observations-kilonovae.csv # NS-NS rates for observations from kilonovae
NS-NS_rates_observations-pulsars.csv # NS-NS rates for observations from Galactic pulsars
NS-NS_rates_observations-sGRBs.csv # NS-NS rates for observations from short gamma-ray bursts
NS-NS_rates_triples.csv # NS-NS rates for formation in (hierarchical) triples
NS-NS_rates_young-stellar-clusters.csv # NS-NS rates for dynamical formation in young/open star clusters
Each csv file contains the following header columns:
ADS year # year of the paper in the ADS entry
ADS month # month of the paper in the ADS entry
ADS abstract link # link to the ADS abstract
ArXiv link # link to the ArXiv version of the paper
First Author # name of the first author
label string # label of the study, corresponding to the label in the figure
code (optional) # name of the code used in this study
type of limit (for plotting, see jupyter notebook for a dictionary) # integer used to map to a certain limit visualization in the plot (e.g. scatter points vs upper limit)
Each entry takes two columns in the csv files: one for the rates (quoted under the header 'rate [Gpc^-3 yr^-1]') and one for "notes", where we sometimes added notes about the rates (such as whether it is an upper or lower limit).
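A minimal sketch of loading one of the rate files with pandas, assuming the column headers quoted above (the exact header strings may differ slightly in the files, so inspect them first):

import pandas as pd

rates = pd.read_csv("BH-BH_rates_isolated-binary-evolution.csv")
print(rates.columns.tolist())                                   # inspect the exact header names
print(rates[["First Author", "rate [Gpc^-3 yr^-1]"]].head())    # assumed column names, per the description above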
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides electricity consumption data collected from the building management system of GreEn-ER. This building, located in Grenoble, hosts the Grenoble-INP Ense³ Engineering School and the G2ELab (Grenoble Electrical Engineering Laboratory). It brings together in one place the teaching and research actors around new energy technologies. The electricity consumption of the building is closely monitored, with more than 300 meters. The data from each meter is available in one csv file containing two columns: one with the timestamp and the other with the electricity consumption in kWh. The sampling rate for all data is 10 min. Data are available for 2017 and 2018. The dataset also contains data on the external temperature for 2017 and 2018. The files are structured as follows:
- The main folder, called "Data", contains 2 sub-folders, each corresponding to one year (2017 and 2018).
- Each sub-folder contains 3 other sub-folders, each corresponding to a sector of the building.
- The main folder "Data" also contains the csv files with the electricity consumption data of the whole building and a file called "Temp.csv" with the temperature data.
- The separator used in the csv files is ";".
- The sampling rate is 10 min and the unit of consumption is kWh: each sample corresponds to the energy consumed in those 10 minutes. So if the user wants to retrieve the mean power over the period corresponding to a sample, the value must be multiplied by 6.
- Four Jupyter Notebook files, a format that allows combining text, graphics and Python code, are also available. These files allow exploring all the data within the dataset.
- These Jupyter notebook files contain all the metadata necessary for understanding the system, such as drawings of the system design, of the building, etc.
- Each file is named by the number of its meter. These numbers can be retrieved in tables and drawings available in the Jupyter Notebooks.
- A few csv files with the system design are also available. They are called "TGBT1_n.csv", "TGBT2_n.csv" and "PREDIS-MHI_n.csv".
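A minimal sketch of loading one meter file and converting the 10-minute energy samples to mean power, following the description above (the file name and column labels are placeholders; check the actual headers and, if needed, the decimal separator):

import pandas as pd

# ';'-separated file with a timestamp column and an energy column in kWh (names are placeholders).
meter = pd.read_csv("Data/2017/some_meter.csv", sep=";", parse_dates=[0])
meter.columns = ["timestamp", "energy_kWh"]

# Each sample is the energy consumed over 10 minutes, so mean power in kW = energy * 6.
meter["power_kW"] = meter["energy_kWh"] * 6
print(meter.head())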
T1DiabetesGranada
A longitudinal multi-modal dataset of type 1 diabetes mellitus
Documented by:
Rodriguez-Leon, C., Aviles-Perez, M. D., Banos, O., Quesada-Charneco, M., Lopez-Ibarra, P. J., Villalonga, C., & Munoz-Torres, M. (2023). T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus. Scientific Data, 10(1), 916. https://doi.org/10.1038/s41597-023-02737-4
Background
Type 1 diabetes mellitus (T1D) patients face daily difficulties in keeping their blood glucose levels within appropriate ranges. Several techniques and devices, such as flash glucose meters, have been developed to help T1D patients improve their quality of life. Most recently, the data collected via these devices is being used to train advanced artificial intelligence models to characterize the evolution of the disease and support its management. The main problem for the generation of these models is the scarcity of data, as most published works use private or artificially generated datasets. For this reason, this work presents T1DiabetesGranada, a longitudinal dataset, open under specific permission, that not only provides continuous glucose levels, but also patient demographic and clinical information. The dataset includes 257780 days of measurements over four years from 736 T1D patients from the province of Granada, Spain. This dataset progresses significantly beyond the state of the art as one of the longest and largest open datasets of continuous glucose measurements, thus boosting the development of new artificial intelligence models for glucose level characterization and prediction.
Data Records
The data are stored in four comma-separated values (CSV) files which are available in T1DiabetesGranada.zip. These files are described in detail below.
Patient_info.csv
Patient_info.csv is the file containing information about the patients, such as demographic data, start and end dates of blood glucose level measurements and biochemical parameters, number of biochemical parameters or number of diagnostics. This file is composed of 736 records, one for each patient in the dataset, and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Sex – Sex of the patient. Values: F (for female), M (for male).
Birth_year – Year of birth of the patient. Format: YYYY.
Initial_measurement_date – Date of the first blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.
Final_measurement_date – Date of the last blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.
Number_of_days_with_measures – Number of days with blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 8 to 1463.
Number_of_measurements – Number of blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 400 to 137292.
Initial_biochemical_parameters_date – Date of the first biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.
Final_biochemical_parameters_date – Date of the last biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.
Number_of_biochemical_parameters – Number of biochemical parameters measured on the patient, extracted from the Biochemical_parameters.csv file. Values: ranging from 4 to 846.
Number_of_diagnostics – Number of diagnoses realized to the patient, extracted from the Diagnostics.csv file. Values: ranging from 1 to 24.
Glucose_measurements.csv
Glucose_measurements.csv is the file containing the continuous blood glucose level measurements of the patients. The file is composed of more than 22.6 million records that constitute the time series of continuous blood glucose level measurements. It includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Measurement_date – Date of the blood glucose level measurement. Format: YYYY-MM-DD.
Measurement_time – Time of the blood glucose level measurement. Format: HH:MM:SS.
Measurement – Value of the blood glucose level measurement in mg/dL. Values: ranging from 40 to 500.
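A minimal sketch of loading the glucose time series and rebuilding a single timestamp from the two date/time columns, using the variable names listed above (the file path assumes the zip has been extracted; the patient ID shown is a placeholder):

import pandas as pd

glucose = pd.read_csv("T1DiabetesGranada/Glucose_measurements.csv")
glucose["Timestamp"] = pd.to_datetime(glucose["Measurement_date"] + " " + glucose["Measurement_time"])

# Example: daily mean glucose (mg/dL) for one patient (placeholder Patient_ID).
one_patient = glucose[glucose["Patient_ID"] == "LIB190001"]
print(one_patient.set_index("Timestamp")["Measurement"].resample("D").mean().head())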
Biochemical_parameters.csv
Biochemical_parameters.csv is the file containing data of the biochemical tests performed on patients to measure their biochemical parameters. This file is composed of 87482 records and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Reception_date – Date of receipt in the laboratory of the sample to measure the biochemical parameter. Format: YYYY-MM-DD.
Name – Name of the measured biochemical parameter. Values: 'Potassium', 'HDL cholesterol', 'Gammaglutamyl Transferase (GGT)', 'Creatinine', 'Glucose', 'Uric acid', 'Triglycerides', 'Alanine transaminase (GPT)', 'Chlorine', 'Thyrotropin (TSH)', 'Sodium', 'Glycated hemoglobin (Ac)', 'Total cholesterol', 'Albumin (urine)', 'Creatinine (urine)', 'Insulin', 'IA ANTIBODIES'.
Value – Value of the biochemical parameter. Values: ranging from -4.0 to 6446.74.
Diagnostics.csv
Diagnostics.csv is the file containing diagnoses of diabetes mellitus complications or other diseases that patients have in addition to type 1 diabetes mellitus. This file is composed of 1757 records and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Code – ICD-9-CM diagnosis code. Values: subset of 594 of the ICD-9-CM codes (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).
Description – ICD-9-CM long description. Values: subset of 594 of the ICD-9-CM long description (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).
Technical Validation
Blood glucose level measurements are collected using FreeStyle Libre devices, which are widely used for healthcare in patients with T1D. Abbott Diabetes Care, Inc., Alameda, CA, USA, the manufacturer, has conducted validation studies of these devices, concluding that the measurements made by their sensors compare well to those of YSI analyzer devices (Xylem Inc.), the gold standard, with results within zones A and B of the consensus error grid 99.9% of the time. In addition, other studies external to the company concluded that the accuracy of the measurements is adequate.
Moreover, it was also checked that, in most cases, the blood glucose level measurements per patient were continuous (i.e. a sample at least every 15 minutes) in the Glucose_measurements.csv file, as they should be.
Usage Notes
For data downloading, it is necessary to be authenticated on the Zenodo platform, accept the Data Usage Agreement and send a request specifying full name, email, and the justification of the data use. This request will be processed by the Secretary of the Department of Computer Engineering, Automatics, and Robotics of the University of Granada and access to the dataset will be granted.
The files that compose the dataset are CSV type files delimited by commas and are available in T1DiabetesGranada.zip. A Jupyter Notebook (Python v. 3.8) with code that may help to a better understanding of the dataset, with graphics and statistics, is available in UsageNotes.zip.
Graphs_and_stats.ipynb
The Jupyter Notebook generates tables, graphs and statistics for a better understanding of the dataset. It has four main sections, one dedicated to each file in the dataset. In addition, it has useful functions such as calculating the patient age, deleting a patient list from a dataset file and leaving only a patient list in a dataset file.
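For illustration only, a minimal sketch of the kind of helper the notebook provides, removing a list of patients from a dataset file (the function name and file paths are illustrative, not the notebook's actual code):

import pandas as pd

def drop_patients(csv_path, patient_ids, out_path):
    """Remove all rows belonging to the given Patient_IDs and save the result."""
    df = pd.read_csv(csv_path)
    df[~df["Patient_ID"].isin(patient_ids)].to_csv(out_path, index=False)

drop_patients("T1DiabetesGranada/Diagnostics.csv", ["LIB190001"], "Diagnostics_filtered.csv")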
Code Availability
The dataset was generated using some custom code located in CodeAvailability.zip. The code is provided as Jupyter Notebooks created with Python v. 3.8. The code was used to conduct tasks such as data curation and transformation, and variables extraction.
Original_patient_info_curation.ipynb
This Jupyter Notebook preprocesses the original file with patient data. Mainly, irrelevant rows and columns are removed, and the sex variable is recoded.
Glucose_measurements_curation.ipynb
This Jupyter Notebook preprocesses the original file with the continuous glucose level measurements of the patients. Principally, rows without information or duplicated rows are removed, and the variable with the timestamp is split into two new variables, measurement date and measurement time.
Biochemical_parameters_curation.ipynb
This Jupyter Notebook preprocesses the original file with data from the biochemical tests performed on patients to measure their biochemical parameters. Mainly, irrelevant rows and columns are removed, and the variable with the name of the measured biochemical parameter is translated.
Diagnostic_curation.ipynb
This Jupyter Notebook preprocesses the original file with patient data on the diagnoses of diabetes mellitus complications or other diseases that patients have in addition to T1D.
Get_patient_info_variables.ipynb
This Jupyter Notebook implements the feature extraction process from the files Glucose_measurements.csv, Biochemical_parameters.csv and Diagnostics.csv to complete the file Patient_info.csv. It is divided into six sections: the first three extract the features from each of the mentioned files, and the next three add the extracted features to the resulting new file.
Data Usage Agreement
The conditions for use are as follows:
You confirm that you will not attempt to re-identify research participants for any reason, including for re-identification theory research.
You commit to keeping the T1DiabetesGranada dataset confidential and secure and will not redistribute data or Zenodo account credentials.
You will require
The task_data.csv contains an example data set that has been artificially generated.
The set consists of 400 samples where for each sample there are 10 different sensor readings available.
The samples have been divided into two classes where the class label is either 1 or -1.
The class labels define to what particular class a particular sample belongs.
There are 10 sensor columns, Sensor0 through Sensor9, a target column, class_label, and a Sample index column.
Your task, should you choose to accept it, is to rank the sensors according to their importance/predictive power with respect to the class labels of the samples. Your solution should be a Python script or a Jupyter notebook file that generates a ranking of the sensors from the provided CSV file. The ranking should be in decreasing order, where the first sensor is the most important one.
This notebook has been developed to download specific variables at specific sites from National Water Model V2.0 (NWM) retrospective run results in Google Cloud. It has been set up to retrieve data at SNOTEL sites. An input file, SNOTEL_indices_at_NWM.csv, maps SNOTEL site identifiers to NWM X and Y indices (Xindex and Yindex). A shell script (gget.sh) uses Google utilities (gsutil) to retrieve NWM grid file results for a fixed (limited) block of time. A Python function then reads a set of designated variables at a set of designated sites from the NWM grid files into CSV files for further analysis.
The input file SNOTEL_indices_at_NWM.csv is generated using Garousi-Nejad and Tarboton (2021).
Reference: Garousi-Nejad, I., D. Tarboton (2021). Notebook to get the indices of National Water Model V2.0 grid cells containing SNOTL sites, HydroShare, http://www.hydroshare.org/resource/7839e3f3b4f54940bd3591b24803cacf
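A minimal sketch of the extraction idea, reading one retrieved NWM land-output file with xarray and pulling values at the stored grid indices (the file name, the variable SNEQV, and the dimension names x/y are assumptions for illustration; the notebook defines the actual variables, and each file is assumed to hold a single time step):

import pandas as pd
import xarray as xr

sites = pd.read_csv("SNOTEL_indices_at_NWM.csv")        # contains Xindex and Yindex per SNOTEL site

ds = xr.open_dataset("201901010000.LDASOUT_DOMAIN1")    # one NWM grid file fetched by gget.sh (name assumed)
values = ds["SNEQV"].isel(                              # SNEQV (snow water equivalent) used as an example variable
    x=xr.DataArray(sites["Xindex"].values, dims="site"),
    y=xr.DataArray(sites["Yindex"].values, dims="site"),
)

out = sites.copy()
out["SNEQV"] = values.values.squeeze()
out.to_csv("nwm_sneqv.csv", index=False)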
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains source code of Modelica models of Glucose-Insulin regulation using different techniques.
The accompanying Jupyter notebook is a demo of system analysis (parameter estimation) on artificial data, matching the model simulation to it, and can be used in a teaching class.
Thanks to the MyBinder service, the Jupyter notebook can be viewed and executed online.
conda install -c conda-forge pyfmi matplotlib
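For orientation, a minimal PyFMI sketch of loading and simulating an FMU exported from such a Modelica model (the FMU file name and the variable name 'glucose' are placeholders; the notebook's parameter-estimation workflow goes beyond this):

from pyfmi import load_fmu
import matplotlib.pyplot as plt

model = load_fmu("GlucoseInsulin.fmu")        # FMU exported from the Modelica model (placeholder name)
res = model.simulate(final_time=100.0)        # simulate and collect results

plt.plot(res["time"], res["glucose"])         # 'glucose' is an assumed variable name in the model
plt.xlabel("time [s]")
plt.ylabel("glucose")
plt.show()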
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The present repository contains data and code related to our manuscript "Structural comparison of protein-RNA homologous interfaces reveals widespread overall conservation contrasted with versatility in polar contacts". In the manuscript, we analyze the evolution of protein-RNA interfaces by building a dataset of protein-RNA interologs (homologous interfaces) and exploring how interface contacts are conserved between homologous interfaces, as well as possible explanations for non-conserved contacts.
This repository contains the following files:
DataAnalysisNotebook.ipynb is a Jupyter notebook to reproduce contact conservation analysis and all figures from our manuscript, and to explore data
env.yaml is an environment file in order to build a Conda/Mamba environment to run the Jupyter notebook
2022-02-21-PDB.csv contains data from the PDB about 3D structures of complexes containing interacting protein and RNA chains (PDB structure identifier, chain identifiers, experimental technique and resolution)
2022-02-21-PDB_proteinchainscontactingRNAchains.groupbp.tsv contains more detailed information about interacting protein and RNA chains from these complexes (PDB and chain identifiers, protein and RNA size, interface size and number of contacts)
2022-02-21-PDB_proteinchainscontactingRNAchains.groupbp.txt.selectXE_2.50_p30_r10_pi5_ri5_rep_bc-100.out_RNAcl_0.99.tsv contains the same detailed information, restricted to the filtered dataset used as a starting point in our interolog search pipeline
PDBinterfaceAlign.csv contains information about the structural alignment of pairs of protein-RNA interactions (structural alignment TM-scores, sequence identity and coverage)
DataInterologsParam.tsv contains information about a pre-filtered set of 2587 potential interologs (including interface RMSD, sequence identity and coverage and interface size)
DataInterologsContactsFixedSASA.tsv contains detailed information about conserved and non-conserved contacts in the final set of 2022 interologs (atomic contacts, apolar contacts, hydrogen bonds, salt bridges and stacking information for aminoacid-nucleotide pairs, as well as information about whether each belongs to the interface, secondary structures, and the aminoacid surface accessibility and evolutionary conservation metrics) - compared to version 1, the calculation of solvent accessibility was fixed for a number of interolog pairs
DataCons.csv contains precomputed contact conservation metrics for each of the 2022 interolog pairs, for fast reproduction of manuscript figures
DataInterologsContactsResampledMaintainStructSeqId.tsv, DataInterologsContactsShuffled.tsv and DataInterologsShuffled.tsv relate to baselines computed for contact conservation assessment
clan.txt, clan_membership.txt, ecod.latest.domains.uniq.txt, rfam_interfaces_977.txt, DataGroupsECOD.tsv, DataGroupesRFAM.tsv, DataGroupsRFAMClan.tsv, DataInterfaceGroupsECOD.tsv and DataInterfaceGroupsRFAM.tsv relate to the ECOD (respectively Rfam) classification of protein domains (respectively RNA) in protein-RNA interfaces from our dataset
ListeIntraHbonds.pkl and ListeIntraSaltBridges.pkl are pickle-format data files containing intra-molecular hydrogen bonds and salt bridges (respectively) that are used to analyse scenarios of compensation for non-conserved polar contacts.
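A minimal sketch of loading the tabular and pickled files for exploration (the environment from env.yaml is assumed to be active; the column contents are described above and in the notebook):

import pickle
import pandas as pd

contacts = pd.read_csv("DataInterologsContactsFixedSASA.tsv", sep="\t")
cons = pd.read_csv("DataCons.csv")

with open("ListeIntraHbonds.pkl", "rb") as f:
    intra_hbonds = pickle.load(f)

print(contacts.shape, cons.shape, type(intra_hbonds))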