Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
df_force_kin_filtered.csv is the data sheet used by the DATA3 Python notebook to analyse kinematics and dynamics combined. It contains the footfalls that have data for both kinematics and dynamics. To see how this file is generated, read the first half of the Jupyter notebook.
This dataset was created by Balirwa Alvin Daniel.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset compares fixed-line broadband internet speeds for five cities:
- Melbourne, AU
- Bangkok, TH
- Shanghai, CN
- Los Angeles, US
- Alice Springs, AU
ERRATA: 1. Data is for Q3 2020, but some files are incorrectly labelled as 02-20 or June 20. They should all read Sept 20 (09-20), i.e. Q3 20 rather than Q2. Will rename and reload. Amended in v7.
* Lines of data for each geojson file; a line equates to a 600m^2 location, including total tests, devices used, and average upload and download speed:
- MEL: 16181 locations/lines => 0.85M speedtests (16.7 tests per 100 people)
- SHG: 31745 lines => 0.65M speedtests (2.5/100pp)
- BKK: 29296 lines => 1.5M speedtests (14.3/100pp)
- LAX: 15899 lines => 1.3M speedtests (10.4/100pp)
- ALC: 76 lines => 500 speedtests (2/100pp)
Geojsons of these 2° by 2° extracts for MEL, BKK and SHG are now added; LAX was added in v6 and Alice Springs in v15.
This dataset unpacks, geospatially, the data summaries provided in the Speedtest Global Index (linked below). See the Jupyter Notebook (*.ipynb) to interrogate the geo data, and see the link below to install Jupyter.
** To Do
Will add Google Map versions so everyone can see the data without installing Jupyter.
- Link to Google Map (BKK) added below. Key: Green > 100 Mbps (Superfast); Black > 500 Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook.
- Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melbourne has 20% of locations at or above 100 Mbps. Suggest plotting the top 20% on a map for the community. Google Map link now added (and tweet).
** Python
melb = au_tiles.cx[144:146, -39:-37]   # Lat/Lon extract
shg = tiles.cx[120:122, 30:32]         # Lat/Lon extract
bkk = tiles.cx[100:102, 13:15]         # Lat/Lon extract
lax = tiles.cx[-118:-120, 33:35]       # Lat/Lon extract
alc = tiles.cx[132:134, -22:-24]       # Lat/Lon extract
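For context, a minimal sketch of how such an extract can be produced with GeoPandas, assuming the Ookla/Speedtest open-data tiles have already been saved as a GeoJSON layer (the file name here is hypothetical; avg_d_kbps is the download-speed column in the Ookla open-data schema):

import geopandas as gpd

# Load a tile layer previously extracted from the Speedtest open data (hypothetical file name).
tiles = gpd.read_file("speedtest_tiles_q3_2020.geojson")

# .cx slices a GeoDataFrame by bounding box: [lon_min:lon_max, lat_min:lat_max].
melb = tiles.cx[144:146, -39:-37]

# Average download speed in Mbps (the Ookla tiles store avg_d_kbps in kbps).
print((melb["avg_d_kbps"] / 1000).describe())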
Histograms (v9) and data visualisations (v3, 5, 9, 11) will be provided. Data sourced from: this is an extract of the Speedtest Open Data available at Amazon Web Services (link below; opendata.aws).
** VERSIONS
v24. Add tweet and Google Map of top 20% (over 100 Mbps locations) in MEL Q3 22. Add v1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below).
v23. Add graph of 2022 broadband distribution, and compare 2020 - 2022. Updated v1.4 Jupyter notebook.
v22. Add import ipynb; workflow-import-4cities.
v21. Add Q3 2022 data; five cities inc ALC. Geojson files. (2020: 4.3M tests; 2022: 2.9M tests)
v20. Speedtest - Five Cities inc ALC.
v19. Add ALC2.ipynb.
v18. Add ALC line graph.
v17. Added ipynb for ALC. Added ALC to title.
v16. Load Alice Springs data Q2 21 - csv. Added Google Map link of ALC.
v15. Load Melb Q1 2021 data - csv.
v14. Added Melb Q1 2021 data - geojson.
v13. Added Twitter link to pics.
v12. Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb).
v11. Add Line-Compare pic, plotting four cities on a graph.
v10. Add four histograms in one pic.
v9. Add histogram for four cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook).
v8. Renamed LAX file to Q3, rather than 03.
v7. Amended file names of BKK files to correctly label as Q3, not Q2 or 06.
v6. Added LAX file.
v5. Add screenshot of BKK Google Map.
v4. Add BKK Google Map (link below), and BKK csv mapping files.
v3. Replaced MEL map with big-key version. Previous key was very tiny in top right corner.
v2. Uploaded MEL, SHG, BKK data and Jupyter Notebook.
v1. Metadata record.
** LICENCE
The AWS data licence on the Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be:
- non-commercial (NC)
- share-alike (SA): reuse must carry the same licence.
This is more restrictive than the standard CC BY Figshare licence.
** Other uses of Speedtest Open Data - see the Speedtest link below.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This JavaScript code has been developed to retrieve NDSI_Snow_Cover from MODIS version 6 for SNOTEL sites using the Google Earth Engine platform. To successfully run the code, you need a Google Earth Engine account. An input file, NWM_grid_Western_US_polygons_SNOTEL_ID.zip, is required to run the code; it contains the 1 km grid cells of the NWM that contain SNOTEL sites. You need to upload this input file to the Assets tab in the Google Earth Engine code editor. You also need to import the MOD10A1.006 Terra Snow Cover Daily Global 500m collection into the code editor, which you can do by searching for the product name in the editor's search bar.
The JavaScript works for a specified time range. We found that the best period is one month, which is the maximum allowable time range for computing all SNOTEL sites on Google Earth Engine. The script consists of two main loops: the first retrieves data from the first day of a month up to day 28, in five periods; the second retrieves data from day 28 to the beginning of the next month. The results are shown as graphs on the right-hand side of the Google Earth Engine code editor, under the Console tab. To save results as CSV files, open each time series by clicking the button in the top right corner of each graph; from the new web page, click the Download CSV button at the top.
Here is the link to the script path: https://code.earthengine.google.com/?scriptPath=users%2Figarousi%2Fppr2-modis%3AMODIS-monthly
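For orientation, a minimal sketch of the same retrieval idea using the Earth Engine Python API rather than the JavaScript editor (the asset path is a placeholder; the band name NDSI_Snow_Cover and collection ID MODIS/006/MOD10A1 match the product above, but this is not the script itself):

import ee

ee.Initialize()

# Placeholder asset path for the uploaded NWM grid-cell polygons containing SNOTEL sites.
snotel = ee.FeatureCollection('users/your_account/NWM_grid_Western_US_polygons_SNOTEL_ID')

snow = (ee.ImageCollection('MODIS/006/MOD10A1')
        .select('NDSI_Snow_Cover')
        .filterDate('2019-01-01', '2019-02-01'))   # one month at a time, as recommended above

def snow_cover_at_sites(image):
    # Average NDSI_Snow_Cover over each SNOTEL grid-cell polygon for this day.
    stats = image.reduceRegions(collection=snotel, reducer=ee.Reducer.mean(), scale=500)
    return stats.map(lambda f: f.set('date', image.date().format('YYYY-MM-dd')))

results = snow.map(snow_cover_at_sites).flatten()
print(results.limit(5).getInfo())   # for CSV output, export the table with ee.batch.Export.table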
Then, run the Jupyter Notebook (merge_downloaded_csv_files.ipynb) to merge the downloaded CSV files, stored for example in a folder called output/from_GEE, into a single CSV file, merged.csv. The Jupyter Notebook then applies some preprocessing steps, and the final output is NDSI_FSCA_MODIS_C6.csv.
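A minimal sketch of the merge step, assuming the monthly CSVs downloaded from the GEE Console sit in the folder named above (the notebook's exact preprocessing is not reproduced here):

import glob
import pandas as pd

# Concatenate all monthly CSV files downloaded from the GEE Console.
files = sorted(glob.glob("output/from_GEE/*.csv"))
merged = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
merged.to_csv("merged.csv", index=False)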
Following the procedure in the Jupyter notebook, users can create SUMMA input using *.csv files. If users want to create new SUMMA input, they can prepare it in CSV format. After that, users can run SUMMA with PySUMMA and plot the SUMMA output in various ways.
The steps of this notebook are:
1. Creating SUMMA input from *.csv files
2. Running the SUMMA model using PySUMMA
3. Plotting the SUMMA output
- Time-series plotting
- 2D plotting (heatmap, Hovmöller)
- Calculating water balance variables and plotting
- Spatial plotting with a shapefile
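As a rough sketch of step 2, and only as an assumption about the PySUMMA API (a Simulation built from a SUMMA executable and a file manager path, with a run('local') method; the notebook defines the actual calls and paths):

import pysumma as ps

# Paths are hypothetical; the notebook builds the file manager from the *.csv inputs.
executable = "summa.exe"
file_manager = "./settings/file_manager.txt"

sim = ps.Simulation(executable, file_manager)   # assumed constructor: (executable, file manager path)
sim.run("local")                                # run SUMMA locally (assumed run mode)
print(sim.output)                               # simulation output, assumed to open as an xarray dataset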
Nowadays, there is a growing tendency to use Python and R in the analytics world for physical/statistical modeling and data visualization. As scientists, analysts, or statisticians, we often choose the tool that allows us to perform the task in the quickest and most accurate way possible. For some, that means Python. For others, that means R. For many, that means a combination of the two. However, it can take considerable time to switch between these two languages, passing data and models through .csv files or database systems. There is a solution that allows researchers to quickly and easily interface R and Python together in a single Jupyter Notebook. Here we provide a Jupyter Notebook that serves as a tutorial showing how to interface R and Python in a Jupyter Notebook on CUAHSI JupyterHub. The tutorial walks you through the installation of the rpy2 library and shows simple examples illustrating this interface.
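A minimal sketch of the kind of interface the tutorial covers, using rpy2's IPython extension in two notebook cells (the data frame here is illustrative, not from the tutorial):

Cell 1 (Python):
%load_ext rpy2.ipython
import pandas as pd
df = pd.DataFrame({"x": [1, 2, 3], "y": [2.0, 4.1, 6.2]})

Cell 2 (R, via the rpy2 cell magic; -i passes the Python object into R):
%%R -i df
summary(lm(y ~ x, data = df))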
ArcGIS Survey123 utilizes CSV data in several workflows, including external choice lists, the search() appearance, and pulldata() calculations. When you need to periodically update the CSV content used in a survey, a useful method is to upload the CSV files to your ArcGIS organization and link the CSV items to your survey. Once linked, any updates to the CSV items will automatically pull through to your survey without the need to republish the survey. To learn more about linking items to a survey, see Linked content. This notebook demonstrates how to automate updating a CSV item in your ArcGIS organization. Note: It is recommended to run this notebook on your computer in Jupyter Notebook or ArcGIS Pro, as that will provide the best experience when reading locally stored CSV files. If you intend to schedule this notebook in ArcGIS Online or ArcGIS Notebook Server, additional configuration may be required to read CSV files from online file storage, such as Microsoft OneDrive or Google Drive.
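A minimal sketch of the core update call with the ArcGIS API for Python, assuming the linked CSV item already exists in your organization (the credentials, item ID, and file path are placeholders, not values from this notebook):

from arcgis.gis import GIS

gis = GIS("https://www.arcgis.com", "username", "password")   # or GIS("home") when run inside ArcGIS Notebooks

csv_item = gis.content.get("<csv-item-id>")        # the CSV item linked to the survey
csv_item.update(data="C:/data/choices.csv")        # overwrite the item with the updated local CSV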
Objective: Daily COVID-19 data reported by the World Health Organization (WHO) may provide the basis for political ad hoc decisions, including travel restrictions. Data reported by countries, however, are heterogeneous, and metrics to evaluate their quality are scarce. In this work, we analyzed COVID-19 case counts provided by WHO and developed tools to evaluate country-specific reporting behaviors.
Methods: In this retrospective cross-sectional study, COVID-19 data reported daily to WHO from 3rd January 2020 until 14th June 2021 were analyzed. We proposed the concepts of binary reporting rate and relative reporting behavior and performed descriptive analyses for all countries with these metrics. We developed a score to evaluate the consistency of incidence and binary reporting rates. Further, we performed spectral clustering of the binary reporting rate and relative reporting behavior to identify salient patterns in these metrics.
Results: Our final analysis included 222 countries and regions [...]
Data collection: COVID-19 data were downloaded from WHO. Using a public repository, we added the countries' full names to the WHO data set, merging the two data sets on the two-letter abbreviation for each country. The provided COVID-19 data cover January 2020 until June 2021. We uploaded the final data set used for the analyses of this paper.
Data processing: We processed data using a Jupyter Notebook with a Python kernel and publicly available external libraries. This upload contains the required Jupyter Notebook (reporting_behavior.ipynb) with all analyses and some additional work, a README, and the conda environment yml (env.yml).
Any text editor, including Microsoft Excel and its free alternatives, can open the uploaded CSV file. Any web browser and some code editors (such as the freely available Visual Studio Code) can show the uploaded Jupyter Notebook if the required Python environment is set up correctly.
The file task_data.csv contains an example data set that has been artificially generated. The set consists of 400 samples where for each sample there are 10 different sensor readings available. The samples have been divided into two classes where the class label is either 1 or -1. The class labels define to what particular class a particular sample belongs.
Your task is to rank the sensors according to their importance/predictive power with respect to the class labels of the samples. Your solution should be a Python script or a Jupyter notebook file that generates a ranking of the sensors from the provided CSV file. The ranking should be in decreasing order where the first sensor is the most important one.
Additionally, please include an analysis of your method and results, with possible topics including:
Hint: There are many reasonable solutions to our task. We are looking for good, insightful ones that are the least arbitrary.
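One possible (and by no means the only) approach is to rank the sensors by the feature importances of a tree ensemble. A minimal sketch, assuming the columns are named Sensor0 through Sensor9 for the readings and class_label for the target (as described in the companion copy of this dataset elsewhere in this record):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("task_data.csv")
feature_cols = [c for c in df.columns if c.startswith("Sensor")]
X, y = df[feature_cols], df["class_label"]

model = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

# Rank sensors by impurity-based feature importance, most important first.
ranking = pd.Series(model.feature_importances_, index=feature_cols).sort_values(ascending=False)
print(ranking)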
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an enriched version of the Code4ML dataset, a large-scale corpus of annotated Python code snippets, competition summaries, and data descriptions sourced from Kaggle. The initial release includes approximately 2.5 million snippets of machine learning code extracted from around 100,000 Jupyter notebooks. A portion of these snippets has been manually annotated by human assessors through a custom-built, user-friendly interface designed for this task.
The original dataset is organized into multiple CSV files, each containing structured data on different entities:
Table 1. code_blocks.csv structure
| Column | Description |
| code_blocks_index | Global index linking code blocks to markup_data.csv. |
| kernel_id | Identifier for the Kaggle Jupyter notebook from which the code block was extracted. |
| code_block_id | Position of the code block within the notebook. |
| code_block | The actual machine learning code snippet. |
Table 2. kernels_meta.csv structure
| Column | Description |
| kernel_id | Identifier for the Kaggle Jupyter notebook. |
| kaggle_score | Performance metric of the notebook. |
| kaggle_comments | Number of comments on the notebook. |
| kaggle_upvotes | Number of upvotes the notebook received. |
| kernel_link | URL to the notebook. |
| comp_name | Name of the associated Kaggle competition. |
Table 3. competitions_meta.csv structure
| Column | Description |
| comp_name | Name of the Kaggle competition. |
| description | Overview of the competition task. |
| data_type | Type of data used in the competition. |
| comp_type | Classification of the competition. |
| subtitle | Short description of the task. |
| EvaluationAlgorithmAbbreviation | Metric used for assessing competition submissions. |
| data_sources | Links to datasets used. |
| metric type | Class label for the assessment metric. |
Table 4. markup_data.csv structure
| Column | Description |
| code_block | Machine learning code block. |
| too_long | Flag indicating whether the block spans multiple semantic types. |
| marks | Confidence level of the annotation. |
| graph_vertex_id | ID of the semantic type. |
The dataset allows mapping between these tables. For example, code blocks in code_blocks.csv can be linked to their notebooks in kernels_meta.csv via the kernel_id column, and notebooks to competitions in competitions_meta.csv via comp_name. To maintain quality, kernels_meta.csv includes only notebooks with available Kaggle scores. In addition, data_with_preds.csv contains automatically classified code blocks, with a mapping back to code_blocks.csv via the code_blocks_index column.
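A minimal sketch of such a join with pandas, using the file and column names listed in the tables above:

import pandas as pd

code_blocks = pd.read_csv("code_blocks.csv")
kernels_meta = pd.read_csv("kernels_meta.csv")
competitions_meta = pd.read_csv("competitions_meta.csv")

# Attach notebook metadata to each code block, then competition metadata to each notebook.
blocks = code_blocks.merge(kernels_meta, on="kernel_id", how="left")
blocks = blocks.merge(competitions_meta, on="comp_name", how="left")
print(blocks[["kernel_id", "comp_name", "kaggle_score"]].head())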
The updated Code4ML 2.0 corpus introduces kernels extracted from Meta Kaggle Code. These kernels correspond to Kaggle competitions launched since 2020. The natural-language descriptions of the competitions are retrieved with the help of an LLM.
Notebooks in kernels_meta2.csv may not have a Kaggle score but include a leaderboard ranking (rank), providing additional context for evaluation.
competitions_meta_2.csv is enriched with data_cards describing the data used in the competitions.
The Code4ML 2.0 corpus is a versatile resource, enabling training and evaluation of models in areas such as:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this archive, you can find all the data used in the paper "ReSplit: Improving the Structure of Jupyter Notebooks by Re-Splitting Their Cells".
sklearn_full_cells.csv is the dataset from the paper of Pimentel et al., filtered to keep only Data Science notebooks.
complete.csv is the dataset obtained after the full run of ReSplit on the dataset: both merging and splitting.
split.csv is the dataset obtained after running only the splitting part of ReSplit.
merged.csv is the dataset obtained after running only the merging part of ReSplit.
duplicates_id.csv contains the IDs of the duplicate notebooks for deduplication.
changes.csv contains the IDs of the notebooks, as well as their length before and after running ReSplit.
survey.csv is the table with the results of the survey.
In the dataset CSVs, each line is a cell that has a unique identifier and an identifier of the corresponding notebook.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was originally curated by Software Carpentry, a branch of The Carpentries non-profit organization, and is based on data from the Gapminder Foundation. It consists of six tabular CSV files containing GDP data for various countries across different years. The dataset was initially prepared for the Software Carpentry tutorial "Plotting and Programming in Python" and is also reused in the Galaxy Training Network (GTN) tutorial "Use Jupyter Notebooks in Galaxy."
This GTN tutorial provides an introduction to launching a Jupyter Notebook in Galaxy, installing dependencies, and importing and exporting data. It serves as a setup guide for a Jupyter Notebook environment that can be used to follow the Software Carpentry tutorial "Plotting and Programming in Python."
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data from: Rates of Compact Object Coalescence
Brief overview: This Zenodo entry contains the data that have been used to make the figures for the living review "Rates of Compact Object Coalescence" by Ilya Mandel & Floor Broekgaarden (2021). To reproduce the figures, download all the *.csv files and run the Jupyter notebook created to reproduce the results in the publicly available GitHub repository https://github.com/FloorBroekgaarden/Rates_of_Compact_Object_Coalescence (the exact Jupyter notebook can be found here).
For any suggestions, questions or inquiry, please email one, or both, of the authors:
Ilya Mandel: ilya.mandel@monash.edu
Floor Broekgaarden: floor.broekgaarden@cfa.harvard.edu
We very much welcome suggestions for additional/missing literature with rate predictions or measurements.
Extra figures: Additional figures that can be used are available here:
Vertical figures: https://docs.google.com/presentation/d/1GqJ0k2zpnxBGwIYNeQ0BfsLSU7H2942gspL-PN_iaJY/edit?usp=sharing
The authors are currently working on an interactive tool for plotting the rates, which will be available soon. In the meantime, feel free to send requests for plots/figures to the authors.
Reference: If you use this data/code for a publication, please cite both the paper, Mandel & Broekgaarden (2021) (https://ui.adsabs.harvard.edu/abs/2021arXiv210714239M/abstract), and the dataset on Zenodo through its DOI (see the tabs on the right of this Zenodo entry).
Details of the data files:
The PDF COC_rates_supplementary_material.pdf attached (and in the GitHub repository) describes how each of the rates in the data files of this Zenodo entry is retrieved. The other 26 files are .csv files, where each csv file contains the rates for one specific double compact object type (NS-NS, NS-BH or BH-BH) and one specific rate group (isolated binary evolution, gravitational-wave observations, etc.). The files in this entry are:
Data_Mandel_and_Broekgaarden_2021.zip # all the files below conveniently in one zip file, so that you only have to do one download
COC_rates_supplementary_material.pdf # PDF document describing how the rates are retrieved and quoted from each study
BH-BH_rates_CHE.csv # BH-BH rates for chemically homogeneous evolution
BH-BH_rates_flybys.csv # BH-BH rates for formation from wide isolated binaries with dynamical interactions from flybys
BH-BH_rates_globular-clusters.csv # BH-BH rates for dynamical formation in globular clusters
BH-BH_rates_isolated-binary-evolution.csv # BH-BH rates for isolated binary evolution
BH-BH_rates_nuclear-clusters.csv # BH-BH rates for (dynamical) formation in (active) nuclear star clusters
BH-BH_rates_observations-GWs.csv # BH-BH rates for observations from gravitational waves
BH-BH_rates_population-III.csv # BH-BH rates for population-III stars
BH-BH_rates_primordial.csv # BH-BH rates for primordial formation
BH-BH_rates_triples.csv # BH-BH rates for formation in (hierarchical) triples
BH-BH_rates_young-stellar-clusters.csv # BH-BH rates for dynamical formation in young/open star clusters
NS-BH_rates_CHE.csv # NS-BH rates for chemically homogeneous evolution
NS-BH_rates_flybys.csv # NS-BH rates for formation from wide isolated binaries with dynamical interactions from flybys
NS-BH_rates_globular-clusters.csv # NS-BH rates for dynamical formation in globular clusters
NS-BH_rates_isolated-binary-evolution.csv # NS-BH rates for isolated binary evolution
NS-BH_rates_nuclear-clusters.csv # NS-BH rates for (dynamical) formation in (active) nuclear star clusters
NS-BH_rates_observations-GWs.csv # NS-BH rates for observations from gravitational waves
NS-BH_rates_population-III.csv # NS-BH rates for population-III stars
NS-BH_rates_triples.csv # NS-BH rates for formation in (hierarchical) triples
NS-BH_rates_young-stellar-clusters.csv # NS-BH rates for dynamical formation in young/open star clusters
NS-NS_rates_globular-clusters.csv # NS-NS rates for dynamical formation in globular clusters
NS-NS_rates_isolated-binary-evolution.csv # NS-NS rates for isolated binary evolution
NS-NS_rates_nuclear-clusters.csv # NS-NS rates for (dynamical) formation in (active) nuclear star clusters
NS-NS_rates_observations-GWs.csv # NS-NS rates for observations from gravitational waves
NS-NS_rates_observations-kilonovae.csv # NS-NS rates for observations from kilonovae
NS-NS_rates_observations-pulsars.csv # NS-NS rates for observations from Galactic pulsars
NS-NS_rates_observations-sGRBs.csv # NS-NS rates for observations from short gamma-ray bursts
NS-NS_rates_triples.csv # NS-NS rates for formation in (hierarchical) triples
NS-NS_rates_young-stellar-clusters.csv # NS-NS rates for dynamical formation in young/open star clusters
Each csv file contains the following header columns:
ADS year # year of the paper in the ADS entry
ADS month # month of the paper in the ADS entry
ADS abstract link # link to the ADS abstract
ArXiv link # link to the ArXiv version of the paper
First Author # name of the first author
label string # label of the study, corresponding to the label in the figure
code (optional) # name of the code used in this study
type of limit (for plotting, see jupyter notebook for a dictionary) # integer used to map to a certain limit visualization in the plot (e.g. scatter points vs upper limit)
Each entry takes two columns in the csv files: one for the rates (quoted under the header 'rate [Gpc^-3 yr^-1]') and one for "notes", where we sometimes added notes about the rates (such as whether it is an upper or lower limit).
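A minimal sketch of loading one of the rate files with pandas, assuming the column headers quoted above (the exact header strings may differ slightly in the files, so inspect them first):

import pandas as pd

rates = pd.read_csv("BH-BH_rates_isolated-binary-evolution.csv")
print(rates.columns.tolist())                                   # inspect the exact header names
print(rates[["First Author", "rate [Gpc^-3 yr^-1]"]].head())    # assumed column names, per the description above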
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides electricity consumption data collected from the building management system of GreEn-ER. This building, located in Grenoble, hosts the Grenoble-INP Ense³ Engineering School and the G2ELab (Grenoble Electrical Engineering Laboratory). It brings together in one place the teaching and research actors around new energy technologies. The electricity consumption of the building is closely monitored, with more than 300 meters. The data from each meter is available in one csv file containing two columns: one with the timestamp and the other with the electricity consumption in kWh. The sampling rate for all data is 10 min. Data are available for 2017 and 2018. The dataset also contains data on the external temperature for 2017 and 2018. The files are structured as follows:
- The main folder, called "Data", contains 2 sub-folders, each corresponding to one year (2017 and 2018).
- Each sub-folder contains 3 other sub-folders, each corresponding to a sector of the building.
- The main folder "Data" also contains the csv files with the electricity consumption data of the whole building and a file called "Temp.csv" with the temperature data.
- The separator used in the csv files is ";".
- The sampling rate is 10 min and the unit of consumption is kWh: each sample corresponds to the energy consumed in those 10 minutes. So if the user wants to retrieve the mean power over the period corresponding to a sample, the value must be multiplied by 6.
- Four Jupyter Notebook files, a format that allows combining text, graphics and Python code, are also available. These files allow exploring all the data within the dataset.
- These Jupyter notebook files contain all the metadata necessary for understanding the system, such as drawings of the system design, of the building, etc.
- Each file is named by the number of its meter. These numbers can be retrieved in tables and drawings available in the Jupyter Notebooks.
- A few csv files with the system design are also available. They are called "TGBT1_n.csv", "TGBT2_n.csv" and "PREDIS-MHI_n.csv".
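A minimal sketch of loading one meter file and converting the 10-minute energy samples to mean power, following the description above (the file name and column labels are placeholders; check the actual headers and, if needed, the decimal separator):

import pandas as pd

# ';'-separated file with a timestamp column and an energy column in kWh (names are placeholders).
meter = pd.read_csv("Data/2017/some_meter.csv", sep=";", parse_dates=[0])
meter.columns = ["timestamp", "energy_kWh"]

# Each sample is the energy consumed over 10 minutes, so mean power in kW = energy * 6.
meter["power_kW"] = meter["energy_kWh"] * 6
print(meter.head())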
T1DiabetesGranada
A longitudinal multi-modal dataset of type 1 diabetes mellitus
Documented by:
Rodriguez-Leon, C., Aviles-Perez, M. D., Banos, O., Quesada-Charneco, M., Lopez-Ibarra, P. J., Villalonga, C., & Munoz-Torres, M. (2023). T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus. Scientific Data, 10(1), 916. https://doi.org/10.1038/s41597-023-02737-4
Background
Type 1 diabetes mellitus (T1D) patients face daily difficulties in keeping their blood glucose levels within appropriate ranges. Several techniques and devices, such as flash glucose meters, have been developed to help T1D patients improve their quality of life. Most recently, the data collected via these devices is being used to train advanced artificial intelligence models to characterize the evolution of the disease and support its management. The main problem for the generation of these models is the scarcity of data, as most published works use private or artificially generated datasets. For this reason, this work presents T1DiabetesGranada, a longitudinal dataset, open under specific permission, that not only provides continuous glucose levels, but also patient demographic and clinical information. The dataset includes 257780 days of measurements over four years from 736 T1D patients from the province of Granada, Spain. This dataset progresses significantly beyond the state of the art as one of the longest and largest open datasets of continuous glucose measurements, thus boosting the development of new artificial intelligence models for glucose level characterization and prediction.
Data Records
The data are stored in four comma-separated values (CSV) files which are available in T1DiabetesGranada.zip. These files are described in detail below.
Patient_info.csv
Patient_info.csv is the file containing information about the patients, such as demographic data, start and end dates of blood glucose level measurements and biochemical parameters, number of biochemical parameters or number of diagnostics. This file is composed of 736 records, one for each patient in the dataset, and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Sex – Sex of the patient. Values: F (for female), M (for male).
Birth_year – Year of birth of the patient. Format: YYYY.
Initial_measurement_date – Date of the first blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.
Final_measurement_date – Date of the last blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.
Number_of_days_with_measures – Number of days with blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 8 to 1463.
Number_of_measurements – Number of blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 400 to 137292.
Initial_biochemical_parameters_date – Date of the first biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.
Final_biochemical_parameters_date – Date of the last biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.
Number_of_biochemical_parameters – Number of biochemical parameters measured on the patient, extracted from the Biochemical_parameters.csv file. Values: ranging from 4 to 846.
Number_of_diagnostics – Number of diagnoses realized to the patient, extracted from the Diagnostics.csv file. Values: ranging from 1 to 24.
Glucose_measurements.csv
Glucose_measurements.csv is the file containing the continuous blood glucose level measurements of the patients. The file is composed of more than 22.6 million records that constitute the time series of continuous blood glucose level measurements. It includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Measurement_date – Date of the blood glucose level measurement. Format: YYYY-MM-DD.
Measurement_time – Time of the blood glucose level measurement. Format: HH:MM:SS.
Measurement – Value of the blood glucose level measurement in mg/dL. Values: ranging from 40 to 500.
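A minimal sketch of loading the glucose time series and rebuilding a single timestamp from the two date/time columns, using the variable names listed above (the file path assumes the zip has been extracted; the patient ID shown is a placeholder):

import pandas as pd

glucose = pd.read_csv("T1DiabetesGranada/Glucose_measurements.csv")
glucose["Timestamp"] = pd.to_datetime(glucose["Measurement_date"] + " " + glucose["Measurement_time"])

# Example: daily mean glucose (mg/dL) for one patient (placeholder Patient_ID).
one_patient = glucose[glucose["Patient_ID"] == "LIB190001"]
print(one_patient.set_index("Timestamp")["Measurement"].resample("D").mean().head())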
Biochemical_parameters.csv
Biochemical_parameters.csv is the file containing data of the biochemical tests performed on patients to measure their biochemical parameters. This file is composed of 87482 records and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Reception_date – Date of receipt in the laboratory of the sample to measure the biochemical parameter. Format: YYYY-MM-DD.
Name – Name of the measured biochemical parameter. Values: 'Potassium', 'HDL cholesterol', 'Gammaglutamyl Transferase (GGT)', 'Creatinine', 'Glucose', 'Uric acid', 'Triglycerides', 'Alanine transaminase (GPT)', 'Chlorine', 'Thyrotropin (TSH)', 'Sodium', 'Glycated hemoglobin (Ac)', 'Total cholesterol', 'Albumin (urine)', 'Creatinine (urine)', 'Insulin', 'IA ANTIBODIES'.
Value – Value of the biochemical parameter. Values: ranging from -4.0 to 6446.74.
Diagnostics.csv
Diagnostics.csv is the file containing diagnoses of diabetes mellitus complications or other diseases that patients have in addition to type 1 diabetes mellitus. This file is composed of 1757 records and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Code – ICD-9-CM diagnosis code. Values: subset of 594 of the ICD-9-CM codes (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).
Description – ICD-9-CM long description. Values: subset of 594 of the ICD-9-CM long description (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).
Technical Validation
Blood glucose level measurements are collected using FreeStyle Libre devices, which are widely used for healthcare in patients with T1D. Abbott Diabetes Care, Inc., Alameda, CA, USA, the manufacturer, has conducted validation studies of these devices, concluding that the measurements made by their sensors compare well to those of YSI analyzer devices (Xylem Inc.), the gold standard, with results within zones A and B of the consensus error grid 99.9% of the time. In addition, other studies external to the company concluded that the accuracy of the measurements is adequate.
Moreover, it was also checked that, in most cases, the blood glucose level measurements per patient were continuous (i.e. a sample at least every 15 minutes) in the Glucose_measurements.csv file, as they should be.
Usage Notes
For data downloading, it is necessary to be authenticated on the Zenodo platform, accept the Data Usage Agreement and send a request specifying full name, email, and the justification of the data use. This request will be processed by the Secretary of the Department of Computer Engineering, Automatics, and Robotics of the University of Granada and access to the dataset will be granted.
The files that compose the dataset are CSV type files delimited by commas and are available in T1DiabetesGranada.zip. A Jupyter Notebook (Python v. 3.8) with code that may help to a better understanding of the dataset, with graphics and statistics, is available in UsageNotes.zip.
Graphs_and_stats.ipynb
The Jupyter Notebook generates tables, graphs and statistics for a better understanding of the dataset. It has four main sections, one dedicated to each file in the dataset. In addition, it has useful functions such as calculating the patient age, deleting a patient list from a dataset file and leaving only a patient list in a dataset file.
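For illustration only, a minimal sketch of the kind of helper the notebook provides, removing a list of patients from a dataset file (the function name and file paths are illustrative, not the notebook's actual code):

import pandas as pd

def drop_patients(csv_path, patient_ids, out_path):
    """Remove all rows belonging to the given Patient_IDs and save the result."""
    df = pd.read_csv(csv_path)
    df[~df["Patient_ID"].isin(patient_ids)].to_csv(out_path, index=False)

drop_patients("T1DiabetesGranada/Diagnostics.csv", ["LIB190001"], "Diagnostics_filtered.csv")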
Code Availability
The dataset was generated using some custom code located in CodeAvailability.zip. The code is provided as Jupyter Notebooks created with Python v. 3.8. The code was used to conduct tasks such as data curation and transformation, and variables extraction.
Original_patient_info_curation.ipynb
This Jupyter Notebook preprocesses the original file with patient data. Mainly, irrelevant rows and columns are removed, and the sex variable is recoded.
Glucose_measurements_curation.ipynb
This Jupyter Notebook preprocesses the original file with the continuous glucose level measurements of the patients. Principally, rows without information or duplicated rows are removed, and the variable with the timestamp is split into two new variables, measurement date and measurement time.
Biochemical_parameters_curation.ipynb
This Jupyter Notebook preprocesses the original file with data from the biochemical tests performed on patients to measure their biochemical parameters. Mainly, irrelevant rows and columns are removed, and the variable with the name of the measured biochemical parameter is translated.
Diagnostic_curation.ipynb
This Jupyter Notebook preprocesses the original file with patient data on the diagnoses of diabetes mellitus complications or other diseases that patients have in addition to T1D.
Get_patient_info_variables.ipynb
This Jupyter Notebook implements the feature extraction process from the files Glucose_measurements.csv, Biochemical_parameters.csv and Diagnostics.csv to complete the file Patient_info.csv. It is divided into six sections: the first three extract the features from each of the mentioned files, and the next three add the extracted features to the resulting new file.
Data Usage Agreement
The conditions for use are as follows:
You confirm that you will not attempt to re-identify research participants for any reason, including for re-identification theory research.
You commit to keeping the T1DiabetesGranada dataset confidential and secure and will not redistribute data or Zenodo account credentials.
You will require
The task_data.csv contains an example data set that has been artificially generated.
The set consists of 400 samples where for each sample there are 10 different sensor readings available.
The samples have been divided into two classes where the class label is either 1 or -1.
The class labels define to what particular class a particular sample belongs.
There are 10 sensor columns, Sensor0 through Sensor9, a target column, class_label, and a Sample index column.
Your task, should you choose to accept it, is to rank the sensors according to their importance/predictive power with respect to the class labels of the samples. Your solution should be a Python script or a Jupyter notebook file that generates a ranking of the sensors from the provided CSV file. The ranking should be in decreasing order, where the first sensor is the most important one.
This notebook has been developed to download specific variables at specific sites from National Water Model V2.0 (NWM) retrospective run results in Google Cloud. It has been set up to retrieve data at SNOTEL sites. An input file, SNOTEL_indices_at_NWM.csv, maps SNOTEL site identifiers to NWM X and Y indices (Xindex and Yindex). A shell script (gget.sh) uses Google utilities (gsutil) to retrieve NWM grid file results for a fixed (limited) block of time. A Python function then reads a set of designated variables at a set of designated sites from the NWM grid files into CSV files for further analysis.
The input file SNOTEL_indices_at_NWM.csv is generated using Garousi-Nejad and Tarboton (2021).
Reference: Garousi-Nejad, I., D. Tarboton (2021). Notebook to get the indices of National Water Model V2.0 grid cells containing SNOTL sites, HydroShare, http://www.hydroshare.org/resource/7839e3f3b4f54940bd3591b24803cacf
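A minimal sketch of the extraction idea, reading one retrieved NWM land-output file with xarray and pulling values at the stored grid indices (the file name, the variable SNEQV, and the dimension names x/y are assumptions for illustration; the notebook defines the actual variables, and each file is assumed to hold a single time step):

import pandas as pd
import xarray as xr

sites = pd.read_csv("SNOTEL_indices_at_NWM.csv")        # contains Xindex and Yindex per SNOTEL site

ds = xr.open_dataset("201901010000.LDASOUT_DOMAIN1")    # one NWM grid file fetched by gget.sh (name assumed)
values = ds["SNEQV"].isel(                              # SNEQV (snow water equivalent) used as an example variable
    x=xr.DataArray(sites["Xindex"].values, dims="site"),
    y=xr.DataArray(sites["Yindex"].values, dims="site"),
)

out = sites.copy()
out["SNEQV"] = values.values.squeeze()
out.to_csv("nwm_sneqv.csv", index=False)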
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains source code of Modelica models of Glucose-Insulin regulation using different techniques.
The accompanying Jupyter notebook is a demo of system analysis (parameter estimation) on artificial data, matching the model simulation to it, and can be used in a teaching class.
Thanks to the MyBinder service, the Jupyter notebook can be viewed and executed online.
conda install -c conda-forge pyfmi matplotlib
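For orientation, a minimal PyFMI sketch of loading and simulating an FMU exported from such a Modelica model (the FMU file name and the variable name 'glucose' are placeholders; the notebook's parameter-estimation workflow goes beyond this):

from pyfmi import load_fmu
import matplotlib.pyplot as plt

model = load_fmu("GlucoseInsulin.fmu")        # FMU exported from the Modelica model (placeholder name)
res = model.simulate(final_time=100.0)        # simulate and collect results

plt.plot(res["time"], res["glucose"])         # 'glucose' is an assumed variable name in the model
plt.xlabel("time [s]")
plt.ylabel("glucose")
plt.show()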
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The present repository contains data and code related to our manuscript "Structural comparison of protein-RNA homologous interfaces reveals widespread overall conservation contrasted with versatility in polar contacts". In the manuscript, we analyze the evolution of protein-RNA interfaces by building a dataset of protein-RNA interologs (homologous interfaces) and exploring how interface contacts are conserved between homologous interfaces, as well as possible explanations for non-conserved contacts.
This repository contains the following files:
DataAnalysisNotebook.ipynb is a Jupyter notebook to reproduce contact conservation analysis and all figures from our manuscript, and to explore data
env.yaml is an environment file in order to build a Conda/Mamba environment to run the Jupyter notebook
2022-02-21-PDB.csv contains data from the PDB about 3D structures of complexes containing interacting protein and RNA chains (PDB structure identifier, chain identifiers, experimental technique and resolution)
2022-02-21-PDB_proteinchainscontactingRNAchains.groupbp.tsv contains more detailed information about interacting protein and RNA chains from these complexes (PDB and chain identifiers, protein and RNA size, interface size and number of contacts)
2022-02-21-PDB_proteinchainscontactingRNAchains.groupbp.txt.selectXE_2.50_p30_r10_pi5_ri5_rep_bc-100.out_RNAcl_0.99.tsv contains the same detailed information, restricted to the filtered dataset used as a starting point in our interolog search pipeline
PDBinterfaceAlign.csv contains information about the structural alignment of pairs of protein-RNA interactions (structural alignment TM-scores, sequence identity and coverage)
DataInterologsParam.tsv contains information about a pre-filtered set of 2587 potential interologs (including interface RMSD, sequence identity and coverage and interface size)
DataInterologsContactsFixedSASA.tsv contains detailed information about conserved and non-conserved contacts in the final set of 2022 interologs (atomic contacts, apolar contacts, hydrogen bonds, salt bridges and stacking information for aminoacid-nucleotide pairs, as well as information about whether each belongs to the interface, secondary structures, and the aminoacid surface accessibility and evolutionary conservation metrics) - compared to version 1, the calculation of solvent accessibility was fixed for a number of interolog pairs
DataCons.csv contains precomputed contact conservation metrics for each of the 2022 interolog pairs, for fast reproduction of manuscript figures
DataInterologsContactsResampledMaintainStructSeqId.tsv, DataInterologsContactsShuffled.tsv and DataInterologsShuffled.tsv relate to baselines computed for contact conservation assessment
clan.txt, clan_membership.txt, ecod.latest.domains.uniq.txt, rfam_interfaces_977.txt, DataGroupsECOD.tsv, DataGroupesRFAM.tsv, DataGroupsRFAMClan.tsv, DataInterfaceGroupsECOD.tsv and DataInterfaceGroupsRFAM.tsv relate to the ECOD (respectively Rfam) classification of protein domains (respectively RNA) in protein-RNA interfaces from our dataset
ListeIntraHbonds.pkl and ListeIntraSaltBridges.pkl are pickle-format data files containing intra-molecular hydrogen bonds and salt bridges (respectively) that are used to analyse scenarios of compensation for non-conserved polar contacts.
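A minimal sketch of loading the tabular and pickled files for exploration (the environment from env.yaml is assumed to be active; the column contents are described above and in the notebook):

import pickle
import pandas as pd

contacts = pd.read_csv("DataInterologsContactsFixedSASA.tsv", sep="\t")
cons = pd.read_csv("DataCons.csv")

with open("ListeIntraHbonds.pkl", "rb") as f:
    intra_hbonds = pickle.load(f)

print(contacts.shape, cons.shape, type(intra_hbonds))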