100+ datasets found
  1. csv file for jupyter notebook

    • figshare.com
    txt
    Updated Nov 21, 2022
    Cite
    Johanna Schultz (2022). csv file for jupyter notebook [Dataset]. http://doi.org/10.6084/m9.figshare.21590175.v1
    Explore at:
    txt
    Dataset updated
    Nov 21, 2022
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Johanna Schultz
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    df_force_kin_filtered.csv is the data sheet used for the DATA3 Python notebook to analyse kinematics and dynamics combined. It contains the footfalls that have data for both kinematics and dynamics. To see how this file is generated, read the first half of the Jupyter notebook.

  2. Cancer_dataset

    • kaggle.com
    zip
    Updated Oct 4, 2024
    Cite
    Balirwa Alvin Daniel (2024). Cancer_dataset [Dataset]. https://www.kaggle.com/datasets/balirwaalvindaniel/cancer-dataset
    Explore at:
    zip (192617 bytes)
    Dataset updated
    Oct 4, 2024
    Authors
    Balirwa Alvin Daniel
    Description

    Dataset

    This dataset was created by Balirwa Alvin Daniel


  3. Data Cleaning, Translation & Split of the Dataset for the Automatic...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv +1
    Updated Apr 24, 2025
    Cite
    Juliane Köhler; Juliane Köhler (2025). Data Cleaning, Translation & Split of the Dataset for the Automatic Classification of Documents for the Classification System for the Berliner Handreichungen zur Bibliotheks- und Informationswissenschaft [Dataset]. http://doi.org/10.5281/zenodo.6957842
    Explore at:
    text/x-python, csv, bin
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Juliane Köhler; Juliane Köhler
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    • Cleaned_Dataset.csv – The combined CSV files of all scraped documents from DABI, e-LiS, o-bib and Springer.
    • Data_Cleaning.ipynb – The Jupyter Notebook with python code for the analysis and cleaning of the original dataset.
    • ger_train.csv – The German training set as CSV file.
    • ger_validation.csv – The German validation set as CSV file.
    • en_test.csv – The English test set as CSV file.
    • en_train.csv – The English training set as CSV file.
    • en_validation.csv – The English validation set as CSV file.
    • splitting.py – The python code for splitting a dataset into train, test and validation set (a minimal sketch of such a split follows after this list).
    • DataSetTrans_de.csv – The final German dataset as a CSV file.
    • DataSetTrans_en.csv – The final English dataset as a CSV file.
    • translation.py – The python code for translating the cleaned dataset.
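
    As an illustration of the kind of train/validation/test split performed by splitting.py, here is a minimal sketch using pandas and scikit-learn; the input file name, split ratios and output file names are placeholders, not taken from this dataset.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Hypothetical input: the cleaned dataset with one document per row.
    df = pd.read_csv("Cleaned_Dataset.csv")

    # Hold out a test set first, then split the remainder into train/validation.
    train_val, test = train_test_split(df, test_size=0.1, random_state=42)
    train, validation = train_test_split(train_val, test_size=0.1, random_state=42)

    # Placeholder output names; the actual dataset ships language-specific splits.
    train.to_csv("train.csv", index=False)
    validation.to_csv("validation.csv", index=False)
    test.to_csv("test.csv", index=False)
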
  4. Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus...

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Richard Ferrers; Speedtest Global Index (2023). Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus ALC - 2020, 2022 [Dataset]. http://doi.org/10.6084/m9.figshare.13621169.v24
    Explore at:
    txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Richard Ferrers; Speedtest Global Index
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset compares FIXED-line broadband internet speeds for four cities: Melbourne, AU; Bangkok, TH; Shanghai, CN; Los Angeles, US; plus Alice Springs, AU.

    ERRATA:
    1. Data is for Q3 2020, but some files are labelled incorrectly as 02-20 or June 20. They should all read Sept 20, or 09-20 (Q3 20), rather than Q2. Will rename and reload. Amended in v7.
    2. LAX file named 0320, when it should be Q320. Amended in v8.

    Lines of data for each geojson file (a line equates to a 600m^2 location, including total tests, devices used, and average upload and download speed):
    - MEL: 16181 locations/lines => 0.85M speedtests (16.7 tests per 100 people)
    - SHG: 31745 lines => 0.65M speedtests (2.5/100pp)
    - BKK: 29296 lines => 1.5M speedtests (14.3/100pp)
    - LAX: 15899 lines => 1.3M speedtests (10.4/100pp)
    - ALC: 76 lines => 500 speedtests (2/100pp)

    Geojsons of these 2° by 2° extracts for MEL, BKK, SHG now added, and LAX added v6. Alice Springs added v15.

    This dataset unpacks, geospatially, data summaries provided in Speedtest Global Index (linked below). See Jupyter Notebook (*.ipynb) to interrogate geo data. See link to install Jupyter.

    ** To Do: Will add Google Map versions so everyone can see without installing Jupyter.
    - Link to Google Map (BKK) added below. Key: Green > 100Mbps (Superfast), Black > 500Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook.
    - Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melb has 20% at or above 100Mbps. Suggest plot Top 20% on map for community. Google Map link now added (and tweet).

    ** Python (geopandas .cx coordinate slicing of the tile GeoDataFrames):
    melb = au_tiles.cx[144:146, -39:-37]  # Lat/Lon extract
    shg = tiles.cx[120:122, 30:32]        # Lat/Lon extract
    bkk = tiles.cx[100:102, 13:15]        # Lat/Lon extract
    lax = tiles.cx[-118:-120, 33:35]      # Lat/Lon extract
    ALC = tiles.cx[132:134, -22:-24]      # Lat/Lon extract

    Histograms (v9) and data visualisations (v3, 5, 9, 11) will be provided. Data source: this is an extract of Speedtest Open Data available at Amazon AWS (link below - opendata.aws).

    **VERSIONS v.24 Add tweet and google map of Top 20% (over 100Mbps locations) in Mel Q322. Add v.1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below). v23. Add graph of 2022 Broadband distribution, and compare 2020 - 2022. Updated v1.4 Jupyter notebook. v22. Add Import ipynb; workflow-import-4cities. v21. Add Q3 2022 data; five cities inc ALC. Geojson files. (2020; 4.3M tests 2022; 2.9M tests)

    Melb 14784 lines Avg download speed 69.4M Tests 0.39M

    SHG 31207 lines Avg 233.7M Tests 0.56M

    ALC 113 lines Avg 51.5M Test 1092

    BKK 29684 lines Avg 215.9M Tests 1.2M

    LAX 15505 lines Avg 218.5M Tests 0.74M

    v20. Speedtest - Five Cities inc ALC. v19. Add ALC2.ipynb. v18. Add ALC line graph. v17. Added ipynb for ALC. Added ALC to title.v16. Load Alice Springs Data Q221 - csv. Added Google Map link of ALC. v15. Load Melb Q1 2021 data - csv. V14. Added Melb Q1 2021 data - geojson. v13. Added Twitter link to pics. v12 Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb). v11 Add Line-Compare pic, plotting Four Cities on a graph. v10 Add Four Histograms in one pic. v9 Add Histogram for Four Cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook). v8 Renamed LAX file to Q3, rather than 03. v7 Amended file names of BKK files to correctly label as Q3, not Q2 or 06. v6 Added LAX file. v5 Add screenshot of BKK Google Map. v4 Add BKK Google map(link below), and BKK csv mapping files. v3 replaced MEL map with big key version. Prev key was very tiny in top right corner. v2 Uploaded MEL, SHG, BKK data and Jupyter Notebook v1 Metadata record

    ** LICENCE: The AWS data licence on the Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be non-commercial (NC) and reuse must be share-alike (SA) (i.e. add the same licence). This restricts the standard CC-BY Figshare licence.

    ** Other uses of Speedtest Open Data: see link at Speedtest below.

  5. JavaScript code for retrieval of MODIS Collection 6 NDSI snow cover at...

    • beta.hydroshare.org
    • hydroshare.org
    • +1more
    zip
    Updated Feb 11, 2022
    Cite
    Irene Garousi-Nejad; David Tarboton (2022). JavaScript code for retrieval of MODIS Collection 6 NDSI snow cover at SNOTEL sites and a Jupyter Notebook to merge/reprocess data [Dataset]. http://doi.org/10.4211/hs.d287f010b2dd48edb0573415a56d47f8
    Explore at:
    zip (52.2 KB)
    Dataset updated
    Feb 11, 2022
    Dataset provided by
    HydroShare
    Authors
    Irene Garousi-Nejad; David Tarboton
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    This JavaScript code has been developed to retrieve NDSI_Snow_Cover from MODIS version 6 for SNOTEL sites using the Google Earth Engine platform. To successfully run the code, you should have a Google Earth Engine account. An input file, called NWM_grid_Western_US_polygons_SNOTEL_ID.zip, is required to run the code. This input file includes 1 km grid cells of the NWM containing SNOTEL sites. You need to upload this input file to the Assets tab in the Google Earth Engine code editor. You also need to import the MOD10A1.006 Terra Snow Cover Daily Global 500m collection to the Google Earth Engine code editor. You may do this by searching for the product name in the search bar of the code editor.

    The JavaScript works for a specified time range. We found that the best period is a month, which is the maximum allowable time range for doing the computation for all SNOTEL sites on Google Earth Engine. The script consists of two main loops. The first loop retrieves data from the first day of a month up to day 28 through five periods. The second loop retrieves data from day 28 to the beginning of the next month. The results are shown as graphs on the right-hand side of the Google Earth Engine code editor, under the Console tab. To save results as CSV files, open each time series by clicking on the button located at each graph's top right corner. From the new web page, you can click on the Download CSV button at the top.

    Here is the link to the script path: https://code.earthengine.google.com/?scriptPath=users%2Figarousi%2Fppr2-modis%3AMODIS-monthly

    Then, run the Jupyter Notebook (merge_downloaded_csv_files.ipynb) to merge the downloaded CSV files (stored, for example, in a folder called output/from_GEE) into a single CSV file, merged.csv. The Jupyter Notebook then applies some preprocessing steps, and the final output is NDSI_FSCA_MODIS_C6.csv.
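
    A minimal sketch of this merging step with pandas, assuming the downloaded files sit in output/from_GEE and share the same column layout:

    import glob
    import pandas as pd

    # Collect all CSV files downloaded from the Google Earth Engine console.
    csv_files = sorted(glob.glob("output/from_GEE/*.csv"))

    # Read and concatenate them into one table, then save it as merged.csv.
    merged = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)
    merged.to_csv("merged.csv", index=False)
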

  6. SummaModel PreProcessing using csv file and PostProcessing using Plotting...

    • search.dataone.org
    • hydroshare.org
    Updated Dec 5, 2021
    Cite
    YOUNGDON CHOI; Jonathan Goodall; Jeff Sadler; Andrew Bennett (2021). SummaModel PreProcessing using csv file and PostProcessing using Plotting Modules using PySUMMA [Dataset]. https://search.dataone.org/view/sha256%3Ab4b188e39a57501ae8384edc66544d2f3ca58901777708eb0407c7cbc46a178a
    Explore at:
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Hydroshare
    Authors
    YOUNGDON CHOI; Jonathan Goodall; Jeff Sadler; Andrew Bennett
    Time period covered
    Jul 1, 2001 - Sep 30, 2008
    Area covered
    Description

    Following the procedure in the Jupyter notebook, users can create SUMMA input from *.csv files. If users want to create new SUMMA input, they can prepare the input in CSV format. After that, users can simulate SUMMA with PySUMMA and plot the SUMMA output in various ways.

    The notebook follows these steps:
    1. Creating SUMMA input from *.csv files
    2. Running the SUMMA model using PySUMMA
    3. Plotting the SUMMA output: time series plotting; 2D plotting (heatmap, Hovmöller); calculating and plotting water balance variables; spatial plotting with a shapefile
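
    A hedged sketch of this workflow with PySUMMA; the executable path, file manager path and plotted variable name are placeholders, and preparing the file manager from the *.csv inputs (step 1) is specific to this resource and omitted here:

    import pysumma as ps

    # Placeholder paths: the SUMMA executable and the file manager produced in step 1.
    executable = "/usr/local/bin/summa.exe"
    file_manager = "./settings/fileManager.txt"

    # Step 2: run SUMMA through PySUMMA.
    sim = ps.Simulation(executable, file_manager)
    sim.run("local")

    # Step 3: sim.output is an xarray Dataset, so a time-series plot of an output
    # variable (name assumed here) can be made directly.
    sim.output["scalarTotalET"].plot()
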

  7. CUAHSI JupyterHub, Interfacing R from a Python3 Jupyter Notebook

    • search.dataone.org
    • hydroshare.org
    • +1more
    Updated Dec 5, 2021
    Cite
    Irene Garousi-Nejad; David Tarboton (2021). CUAHSI JupyterHub, Interfacing R from a Python3 Jupyter Notebook [Dataset]. https://search.dataone.org/view/sha256%3Ac76dbd34a70ec343ec7c771e32dd8568f9a55fbe6e7ee015dede9ece1760d812
    Explore at:
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Hydroshare
    Authors
    Irene Garousi-Nejad; David Tarboton
    Description

    Nowadays, there is a growing tendency to use Python and R in the analytics world for physical/statistical modeling and data visualization. As scientists, analysts, or statisticians, we oftentimes choose the tool that allows us to perform the task in the quickest and most accurate way possible. For some, that means Python. For others, that means R. For many, that means a combination of the two. However, it may take considerable time to switch between these two languages, passing data and models through .csv files or database systems. There's a solution that allows researchers to quickly and easily interface R and Python together in one single Jupyter Notebook. Here we provide a Jupyter Notebook that serves as a tutorial showing how to interface R and Python together in a Jupyter Notebook on CUAHSI JupyterHub. This tutorial walks you through the installation of the rpy2 library and shows simple examples illustrating this interface.
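
    A minimal sketch of this interface inside a Jupyter notebook, assuming rpy2 and R are installed in the environment (the data frame and model are arbitrary examples):

    # Cell 1: load the rpy2 IPython extension.
    %load_ext rpy2.ipython

    # Cell 2: build a pandas DataFrame on the Python side.
    import pandas as pd
    df = pd.DataFrame({"x": [1, 2, 3], "y": [2.0, 4.1, 6.2]})

    %%R -i df
    # Cell 3 (R): the -i flag imports df from Python; fit a linear model in R.
    fit <- lm(y ~ x, data = df)
    summary(fit)
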

  8. Update CSV item in ArcGIS

    • anrgeodata.vermont.gov
    Updated Mar 18, 2022
    Cite
    ArcGIS Survey123 (2022). Update CSV item in ArcGIS [Dataset]. https://anrgeodata.vermont.gov/documents/dc69467c3e7243719c9125679bbcee9b
    Explore at:
    Dataset updated
    Mar 18, 2022
    Dataset authored and provided by
    ArcGIS Survey123
    Description

    ArcGIS Survey123 utilizes CSV data in several workflows, including external choice lists, the search() appearance, and pulldata() calculations. When you need to periodically update the CSV content used in a survey, a useful method is to upload the CSV files to your ArcGIS organization and link the CSV items to your survey. Once linked, any updates to the CSV items will automatically pull through to your survey without the need to republish the survey. To learn more about linking items to a survey, see Linked content. This notebook demonstrates how to automate updating a CSV item in your ArcGIS organization. Note: It is recommended to run this notebook on your computer in Jupyter Notebook or ArcGIS Pro, as that will provide the best experience when reading locally stored CSV files. If you intend to schedule this notebook in ArcGIS Online or ArcGIS Notebook Server, additional configuration may be required to read CSV files from online file storage, such as Microsoft OneDrive or Google Drive.
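
    A hedged sketch of this kind of automation with the ArcGIS API for Python; the organization URL, credentials, item ID and file path are placeholders:

    from arcgis.gis import GIS

    # Connect to the ArcGIS organization (placeholder credentials).
    gis = GIS("https://www.arcgis.com", "username", "password")

    # Placeholder ID of the CSV item that the survey links to.
    csv_item = gis.content.get("0123456789abcdef0123456789abcdef")

    # Overwrite the item's data with the locally stored, updated CSV file.
    csv_item.update(data="C:/data/choices.csv")
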

  9. Reporting behavior from WHO COVID-19 public data

    • search.dataone.org
    • data.niaid.nih.gov
    • +1more
    Updated Jul 14, 2025
    Cite
    Auss Abbood (2025). Reporting behavior from WHO COVID-19 public data [Dataset]. http://doi.org/10.5061/dryad.9s4mw6mmb
    Explore at:
    Dataset updated
    Jul 14, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Auss Abbood
    Time period covered
    Dec 16, 2022
    Description

    Objective: Daily COVID-19 data reported by the World Health Organization (WHO) may provide the basis for political ad hoc decisions including travel restrictions. Data reported by countries, however, is heterogeneous and metrics to evaluate its quality are scarce. In this work, we analyzed COVID-19 case counts provided by WHO and developed tools to evaluate country-specific reporting behaviors.

    Methods: In this retrospective cross-sectional study, COVID-19 data reported daily to WHO from 3rd January 2020 until 14th June 2021 were analyzed. We proposed the concepts of binary reporting rate and relative reporting behavior and performed descriptive analyses for all countries with these metrics. We developed a score to evaluate the consistency of incidence and binary reporting rates. Further, we performed spectral clustering of the binary reporting rate and relative reporting behavior to identify salient patterns in these metrics.

    Results: Our final analysis included 222 countries and regions...

    Data collection: COVID-19 data was downloaded from WHO. Using a public repository, we have added the countries' full names to the WHO data set, using the two-letter abbreviations for each country to merge both data sets. The provided COVID-19 data covers January 2020 until June 2021. We uploaded the final data set used for the analyses of this paper.

    Data processing: We processed data using a Jupyter Notebook with a Python kernel and publicly available external libraries. This upload contains the required Jupyter Notebook (reporting_behavior.ipynb) with all analyses and some additional work, a README, and the conda environment yml (env.yml). Any text editor, including Microsoft Excel and their free alternatives, can open the uploaded CSV file. Any web browser and some code editors (like the freely available Visual Studio Code) can show the uploaded Jupyter Notebook if the required Python environment is set up correctly.
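
    As a simplified illustration of the binary reporting rate, here is a sketch computed with pandas from the public WHO daily file; the column names (Country, Date_reported, New_cases) follow the WHO export and are assumptions with respect to the uploaded CSV, and the metric shown is only one plausible reading of the concept described above:

    import pandas as pd

    df = pd.read_csv("WHO-COVID-19-global-data.csv", parse_dates=["Date_reported"])

    # Binary reporting: 1 on days where a country reported a non-zero case count.
    df["reported"] = (df["New_cases"].fillna(0) != 0).astype(int)

    # Binary reporting rate per country = share of days with a report.
    reporting_rate = df.groupby("Country")["reported"].mean().sort_values()
    print(reporting_rate.tail())
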

  10. task_data_feature

    • kaggle.com
    zip
    Updated May 12, 2020
    Cite
    Vishal (2020). task_data_feature [Dataset]. https://www.kaggle.com/vishalkesti/task-data-feature
    Explore at:
    zip (34908 bytes)
    Dataset updated
    May 12, 2020
    Authors
    Vishal
    Description

    Task description

    The file task_data.csv contains an example data set that has been artificially generated. The set consists of 400 samples where for each sample there are 10 different sensor readings available. The samples have been divided into two classes where the class label is either 1 or -1. The class labels define to what particular class a particular sample belongs.

    Your task is to rank the sensors according to their importance/predictive power with respect to the class labels of the samples. Your solution should be a Python script or a Jupyter notebook file that generates a ranking of the sensors from the provided CSV file. The ranking should be in decreasing order where the first sensor is the most important one.

    Additionally, please include an analysis of your method and results, with possible topics including:

    • your process of thought, i.e., how did you come to your solution?
    • properties of the artificially generated data set
    • strengths of your method: why does it produce a reasonable result?
    • weaknesses of your method: when would the method produce inaccurate results?
    • scalability of your method with respect to the number of features and/or samples
    • alternative methods and their respective strengths, weaknesses, scalability

    Hint: There are many reasonable solutions to our task. We are looking for good, insightful ones that are the least arbitrary.
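
    One deliberately simple solution sketch, ranking the sensors by random forest feature importance; the column names (Sensor0 through Sensor9, class_label) are assumptions about the file layout:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    df = pd.read_csv("task_data.csv")

    # Assumed layout: ten sensor columns plus a class_label column with values 1/-1.
    X = df[[f"Sensor{i}" for i in range(10)]]
    y = df["class_label"]

    model = RandomForestClassifier(n_estimators=500, random_state=0)
    model.fit(X, y)

    # Rank sensors by feature importance, most important first.
    ranking = pd.Series(model.feature_importances_, index=X.columns)
    print(ranking.sort_values(ascending=False))
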

  11. Code4ML 2.0

    • zenodo.org
    csv, txt
    Updated May 19, 2025
    Cite
    Anonimous authors; Anonimous authors (2025). Code4ML 2.0 [Dataset]. http://doi.org/10.5281/zenodo.15465737
    Explore at:
    csv, txt
    Dataset updated
    May 19, 2025
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Anonimous authors; Anonimous authors
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is an enriched version of the Code4ML dataset, a large-scale corpus of annotated Python code snippets, competition summaries, and data descriptions sourced from Kaggle. The initial release includes approximately 2.5 million snippets of machine learning code extracted from around 100,000 Jupyter notebooks. A portion of these snippets has been manually annotated by human assessors through a custom-built, user-friendly interface designed for this task.

    The original dataset is organized into multiple CSV files, each containing structured data on different entities:

    • code_blocks.csv: Contains raw code snippets extracted from Kaggle.
    • kernels_meta.csv: Metadata for the notebooks (kernels) from which the code snippets were derived.
    • competitions_meta.csv: Metadata describing Kaggle competitions, including information about tasks and data.
    • markup_data.csv: Annotated code blocks with semantic types, allowing deeper analysis of code structure.
    • vertices.csv: A mapping from numeric IDs to semantic types and subclasses, used to interpret annotated code blocks.

    Table 1. code_blocks.csv structure

    Column – Description
    code_blocks_index – Global index linking code blocks to markup_data.csv.
    kernel_id – Identifier for the Kaggle Jupyter notebook from which the code block was extracted.
    code_block_id – Position of the code block within the notebook.
    code_block – The actual machine learning code snippet.

    Table 2. kernels_meta.csv structure

    Column – Description
    kernel_id – Identifier for the Kaggle Jupyter notebook.
    kaggle_score – Performance metric of the notebook.
    kaggle_comments – Number of comments on the notebook.
    kaggle_upvotes – Number of upvotes the notebook received.
    kernel_link – URL to the notebook.
    comp_name – Name of the associated Kaggle competition.

    Table 3. competitions_meta.csv structure

    Column – Description
    comp_name – Name of the Kaggle competition.
    description – Overview of the competition task.
    data_type – Type of data used in the competition.
    comp_type – Classification of the competition.
    subtitle – Short description of the task.
    EvaluationAlgorithmAbbreviation – Metric used for assessing competition submissions.
    data_sources – Links to datasets used.
    metric type – Class label for the assessment metric.

    Table 4. markup_data.csv structure

    Column – Description
    code_block – Machine learning code block.
    too_long – Flag indicating whether the block spans multiple semantic types.
    marks – Confidence level of the annotation.
    graph_vertex_id – ID of the semantic type.

    The dataset allows mapping between these tables. For example:

    • code_blocks.csv can be linked to kernels_meta.csv via the kernel_id column.
    • kernels_meta.csv is connected to competitions_meta.csv through comp_name. To maintain quality, kernels_meta.csv includes only notebooks with available Kaggle scores.

    In addition, data_with_preds.csv contains automatically classified code blocks, with a mapping back to code_blocks.csv via the code_blocks_index column.
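
    A minimal sketch of these joins with pandas, assuming the CSV files are in the working directory:

    import pandas as pd

    code_blocks = pd.read_csv("code_blocks.csv")
    kernels_meta = pd.read_csv("kernels_meta.csv")
    competitions_meta = pd.read_csv("competitions_meta.csv")

    # code_blocks -> kernels_meta via kernel_id, then -> competitions_meta via comp_name.
    blocks_with_meta = (
        code_blocks
        .merge(kernels_meta, on="kernel_id", how="left")
        .merge(competitions_meta, on="comp_name", how="left")
    )

    # data_with_preds maps back to code_blocks via the code_blocks_index column.
    preds = pd.read_csv("data_with_preds.csv")
    preds_with_code = preds.merge(code_blocks, on="code_blocks_index", how="left")
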

    Code4ML 2.0 Enhancements

    The updated Code4ML 2.0 corpus introduces kernels extracted from Meta Kaggle Code. These kernels correspond to the Kaggle competitions launched since 2020. The natural-language descriptions of the competitions are retrieved with the help of an LLM.

    Notebooks in kernels_meta2.csv may not have a Kaggle score but include a leaderboard ranking (rank), providing additional context for evaluation.

    competitions_meta_2.csv is enriched with data_cards describing the data used in the competitions.

    Applications

    The Code4ML 2.0 corpus is a versatile resource, enabling training and evaluation of models in areas such as:

    • Code generation
    • Code understanding
    • Natural language processing of code-related tasks
  12. Datasets for the paper "ReSplit: Improving the Structure of Jupyter...

    • data.niaid.nih.gov
    Updated Dec 25, 2021
    Cite
    Sergey Titov; Yaroslav Golubev; Timofey Bryksin (2021). Datasets for the paper "ReSplit: Improving the Structure of Jupyter Notebooks by Re-Splitting Their Cells" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5803344
    Explore at:
    Dataset updated
    Dec 25, 2021
    Dataset provided by
    JetBrains http://jetbrains.com/
    Authors
    Sergey Titov; Yaroslav Golubev; Timofey Bryksin
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this archive, you can find all the data used in the paper "ReSplit: Improving the Structure of Jupyter Notebooks by Re-Splitting Their Cells".

    • sklearn_full_cells.csv is the dataset from the paper of Pimentel et al., filtered to only Data Science notebooks.
    • complete.csv is the dataset obtained after the full run of ReSplit on the dataset: both merging and splitting.
    • split.csv is the dataset obtained after running only the splitting part of ReSplit.
    • merged.csv is the dataset obtained after running only the merging part of ReSplit.
    • duplicates_id.csv contains the IDs of the duplicate notebooks for deduplication.
    • changes.csv contains the IDs of the datasets, as well as their length before and after running ReSplit.
    • survey.csv is the table with the results of the survey.

    In the dataset CSVs, each line is a cell that has a unique identifier and an identifier of the corresponding notebook.

  13. Galaxy Training Material for the 'Use Jupyter notebooks in Galaxy' tutorial

    • zenodo.org
    csv
    Updated Apr 22, 2025
    Cite
    Delphine Lariviere; Delphine Lariviere; Teresa Müller; Teresa Müller (2025). Galaxy Training Material for the 'Use Jupyter notebooks in Galaxy' tutorial [Dataset]. http://doi.org/10.5281/zenodo.15263830
    Explore at:
    csv
    Dataset updated
    Apr 22, 2025
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Delphine Lariviere; Delphine Lariviere; Teresa Müller; Teresa Müller
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was originally curated by Software Carpentry, a branch of The Carpentries non-profit organization, and is based on data from the Gapminder Foundation. It consists of six tabular CSV files containing GDP data for various countries across different years. The dataset was initially prepared for the Software Carpentry tutorial "Plotting and Programming in Python" and is also reused in the Galaxy Training Network (GTN) tutorial "Use Jupyter Notebooks in Galaxy."

    This GTN tutorial provides an introduction to launching a Jupyter Notebook in Galaxy, installing dependencies, and importing and exporting data. It serves as a setup guide for a Jupyter Notebook environment that can be used to follow the Software Carpentry tutorial "Plotting and Programming in Python."

  14. Data set from: Rates of Compact Object Coalescences

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1more
    Updated Jul 16, 2024
    Cite
    Floor Broekgaarden; Ilya Mandel (2024). Data set from: Rates of Compact Object Coalescences [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5072400
    Explore at:
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Monash Centre for Astrophysics, School of Physics and Astronomy
    Center for Astrophysics | Harvard & Smithsonian
    Authors
    Floor Broekgaarden; Ilya Mandel
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data from: Rates of Compact Object Coalescence

    Brief overview: This Zenodo entry contains the data that has been used to make the figures for the living review "Rates of Compact Object Coalescence" by Ilya Mandel & Floor Broekgaarden (2021). To reproduce the figures, download all the *.csv files and run the jupyter notebook created to reproduce the results in the publicly available Github directory https://github.com/FloorBroekgaarden/Rates_of_Compact_Object_Coalescence (the exact jupyter notebook can be found here)

    For any suggestions, questions or inquiry, please email one, or both, of the authors:

    Ilya Mandel: ilya.mandel@monash.edu

    Floor Broekgaarden: floor.broekgaarden@cfa.harvard.edu

    We very much welcome suggestions for additional/missing literature with rate predictions or measurements.

    Extra figures: Additional figures that can be used are available here:

    Vertical figures: https://docs.google.com/presentation/d/1GqJ0k2zpnxBGwIYNeQ0BfsLSU7H2942gspL-PN_iaJY/edit?usp=sharing

    The authors are currently working on making an interactive tool for plotting the rates that will be available soon. In the meantime, feel free to send requests for plots/figures to the authors.

    Reference: If you use this data/code for publication, please cite both the paper, Mandel & Broekgaarden (2021) (https://ui.adsabs.harvard.edu/abs/2021arXiv210714239M/abstract), and the dataset on Zenodo through its DOI (see tabs on the right of this Zenodo entry).

    Details datafiles:

    The PDF COC_rates_supplementary_material.pdf attached (and in the Github repository) describes how each of the rates in the data files of this Zenodo entry is retrieved. The other 26 files are .csv files, where each csv file contains the rates for one specific double compact object type (NS-NS, NS-BH or BH-BH) and one specific rate group (isolated binary evolution, gravitational wave observations, etc.). The files in this entry are:

    Data_Mandel_and_Broekgaarden_2021.zip all the files below conveniently in one zip file so that you only have to do 1 download.

    COC_rates_supplementary_material.pdf # PDF document describing how the rates are retrieved and quoted from each study

    BH-BH_rates_CHE.csv # BH-BH rates for chemically homogeneous evolution

    BH-BH_rates_flybys.csv # BH-BH rates for formation from wide isolated binaries with dynamical interactions from flybys

    BH-BH_rates_globular-clusters.csv # BH-BH rates for dynamical formation in globular clusters

    BH-BH_rates_isolated-binary-evolution.csv # BH-BH rates for isolated binary evolution

    BH-BH_rates_nuclear-clusters.csv # BH-BH rates for (dynamical) formation in (active) nuclear star clusters

    BH-BH_rates_observations-GWs.csv # BH-BH rates for observations from gravitational waves

    BH-BH_rates_population-III.csv # BH-BH rates for population-III stars

    BH-BH_rates_primordial.csv # BH-BH rates for primordial formation

    BH-BH_rates_triples.csv # BH-BH rates for formation in (hierarchical) triples

    BH-BH_rates_young-stellar-clusters.csv # BH-BH rates for dynamical formation in young/open star clusters

    NS-BH_rates_CHE.csv # NS-BH rates for chemically homogeneous evolution

    NS-BH_rates_flybys.csv # NS-BH rates for formation from wide isolated binaries with dynamical interactions from flybys

    NS-BH_rates_globular-clusters.csv # NS-BH rates for dynamical formation in globular clusters

    NS-BH_rates_isolated-binary-evolution.csv # NS-BH rates for isolated binary evolution

    NS-BH_rates_nuclear-clusters.csv # NS-BH rates for (dynamical) formation in (active) nuclear star clusters

    NS-BH_rates_observations-GWs.csv # NS-BH rates for observations from gravitational waves

    NS-BH_rates_population-III.csv # NS-BH rates for population-III stars

    NS-BH_rates_triples.csv # NS-BH rates for formation in (hierarchical) triples

    NS-BH_rates_young-stellar-clusters.csv # NS-BH rates for dynamical formation in young/open star clusters

    NS-NS_rates_globular-clusters.csv # NS-NS rates for dynamical formation in globular clusters

    NS-NS_rates_isolated-binary-evolution.csv # NS-NS rates for isolated binary evolution

    NS-NS_rates_nuclear-clusters.csv # NS-NS rates for (dynamical) formation in (active) nuclear star clusters

    NS-NS_rates_observations-GWs.csv # NS-NS rates for observations from gravitational waves

    NS-NS_rates_observations-kilonovae.csv # NS-NS rates for observations from kilonovae

    NS-NS_rates_observations-pulsars.csv # NS-NS rates for observations from Galactic pulsars

    NS-NS_rates_observations-sGRBs.csv # NS-NS rates for observations from short gamma-ray bursts

    NS-NS_rates_triples.csv # NS-NS rates for formation in (hierarchical) triples

    NS-NS_rates_young-stellar-clusters.csv # NS-NS rates for dynamical formation in young/open star clusters

    Each csv file contains the following header columns:
    ADS year # year of the paper in the ADS entry
    ADS month # month of the paper in the ADS entry
    ADS abstract link # link to the ADS abstract
    ArXiv link # link to the ArXiv version of the paper
    First Author # name of the first author
    label string # label of the study, that corresponds to the label in the figure
    code (optional) # name of the code used in this study
    type of limit (for plotting, see jupyter notebook for a dictionary) # integer that is used to map to a certain limit visualization in the plot (e.g. scatter points vs upper limit)

    Each entry takes two columns in the csv files: one for the rates (quoted under the header 'rate [Gpc^-3 yr^-1]') and one for "notes", where we sometimes added notes about the rates (such as whether it is an upper or lower limit).
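
    A minimal sketch of inspecting one of these files with pandas; the file name is taken from the list above, and because each entry occupies a pair of columns, the header should be inspected before plotting:

    import pandas as pd

    df = pd.read_csv("BH-BH_rates_isolated-binary-evolution.csv")

    # Each entry contributes a rate column (header 'rate [Gpc^-3 yr^-1]') and a notes
    # column; duplicate headers are suffixed by pandas (.1, .2, ...), so select by prefix.
    print(df.columns.tolist())
    rate_cols = [c for c in df.columns if c.startswith("rate [Gpc^-3 yr^-1]")]
    print(df[rate_cols].head())
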

  15. Data from: GreEn-ER - Electricity Consumption Data of a Tertiary Building

    • search.datacite.org
    • data.mendeley.com
    Updated Sep 20, 2020
    Cite
    Gustavo Martin Nascimento (2020). GreEn-ER - Electricity Consumption Data of a Tertiary Building [Dataset]. http://doi.org/10.17632/h8mmnthn5w
    Explore at:
    Dataset updated
    Sep 20, 2020
    Dataset provided by
    DataCite https://www.datacite.org/
    Mendeley
    Authors
    Gustavo Martin Nascimento
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides electricity consumption data collected from the building management system of GreEn-ER. This building, located in Grenoble, hosts the Grenoble-INP Ense³ Engineering School and the G2ELab (Grenoble Electrical Engineering Laboratory). It brings together in one place the teaching and research actors around new energy technologies. The electricity consumption of the building is closely monitored with more than 300 meters. The data from each meter is available in one CSV file containing two columns: one contains the timestamp and the other the electricity consumption in kWh. The sampling rate for all data is 10 min. Data are available for 2017 and 2018. The dataset also contains data of the external temperature for 2017 and 2018. The files are structured as follows:
    - The main folder called "Data" contains 2 sub-folders, each one corresponding to one year (2017 and 2018).
    - Each sub-folder contains 3 other sub-folders, each one corresponding to a sector of the building.
    - The main folder "Data" also contains the csv files with the electricity consumption data of the whole building and a file called "Temp.csv" with the temperature data.
    - The separator used in the csv files is ";".
    - The sampling rate is 10 min and the unit of consumption is kWh. Each sample therefore corresponds to the energy consumed over those 10 minutes, so to retrieve the mean power over the period covered by a sample, the value must be multiplied by 6.
    - Four Jupyter Notebook files, a format that allows combining text, graphics and Python code, are also available. These files allow exploring all the data within the dataset.
    - These Jupyter Notebook files contain all the metadata necessary for understanding the system, like drawings of the system design, of the building, etc.
    - Each file is named by the number of its meter. These numbers can be retrieved in tables and drawings available in the Jupyter Notebooks.
    - A couple of csv files with the system design are also available. They are called "TGBT1_n.csv", "TGBT2_n.csv" and "PREDIS-MHI_n.csv".
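
    A minimal sketch of reading one meter file and applying the factor-of-6 conversion from 10-minute energy (kWh) to mean power (kW); the file path and column names are placeholders:

    import pandas as pd

    # Meter files use ";" as separator; first column is the timestamp, second the
    # energy consumed over each 10-minute interval in kWh.
    df = pd.read_csv("Data/2017/meter_placeholder.csv", sep=";", header=0,
                     names=["timestamp", "energy_kwh"], parse_dates=["timestamp"])

    # Mean power over each 10-minute sample: kWh per 10 min * 6 = kW.
    df["mean_power_kw"] = df["energy_kwh"] * 6
    print(df.head())
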

  16. Data from: T1DiabetesGranada: a longitudinal multi-modal dataset of type 1...

    • produccioncientifica.ugr.es
    • data.niaid.nih.gov
    Updated 2023
    Cite
    Rodriguez-Leon, Ciro; Aviles Perez, Maria Dolores; Banos, Oresti; Quesada-Charneco, Miguel; Lopez-Ibarra, Pablo J; Villalonga, Claudia; Munoz-Torres, Manuel; Rodriguez-Leon, Ciro; Aviles Perez, Maria Dolores; Banos, Oresti; Quesada-Charneco, Miguel; Lopez-Ibarra, Pablo J; Villalonga, Claudia; Munoz-Torres, Manuel (2023). T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus [Dataset]. https://produccioncientifica.ugr.es/documentos/668fc429b9e7c03b01bd53b7
    Explore at:
    Dataset updated
    2023
    Authors
    Rodriguez-Leon, Ciro; Aviles Perez, Maria Dolores; Banos, Oresti; Quesada-Charneco, Miguel; Lopez-Ibarra, Pablo J; Villalonga, Claudia; Munoz-Torres, Manuel; Rodriguez-Leon, Ciro; Aviles Perez, Maria Dolores; Banos, Oresti; Quesada-Charneco, Miguel; Lopez-Ibarra, Pablo J; Villalonga, Claudia; Munoz-Torres, Manuel
    Description

    T1DiabetesGranada

    A longitudinal multi-modal dataset of type 1 diabetes mellitus

    Documented by:

    Rodriguez-Leon, C., Aviles-Perez, M. D., Banos, O., Quesada-Charneco, M., Lopez-Ibarra, P. J., Villalonga, C., & Munoz-Torres, M. (2023). T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus. Scientific Data, 10(1), 916. https://doi.org/10.1038/s41597-023-02737-4

    Background

    Type 1 diabetes mellitus (T1D) patients face daily difficulties in keeping their blood glucose levels within appropriate ranges. Several techniques and devices, such as flash glucose meters, have been developed to help T1D patients improve their quality of life. Most recently, the data collected via these devices is being used to train advanced artificial intelligence models to characterize the evolution of the disease and support its management. The main problem for the generation of these models is the scarcity of data, as most published works use private or artificially generated datasets. For this reason, this work presents T1DiabetesGranada, an open (under specific permission) longitudinal dataset that not only provides continuous glucose levels, but also patient demographic and clinical information. The dataset includes 257780 days of measurements over four years from 736 T1D patients from the province of Granada, Spain. This dataset progresses significantly beyond the state of the art as one of the longest and largest open datasets of continuous glucose measurements, thus boosting the development of new artificial intelligence models for glucose level characterization and prediction.

    Data Records

    The data are stored in four comma-separated values (CSV) files which are available in T1DiabetesGranada.zip. These files are described in detail below.

    Patient_info.csv

    Patient_info.csv is the file containing information about the patients, such as demographic data, start and end dates of blood glucose level measurements and biochemical parameters, number of biochemical parameters or number of diagnostics. This file is composed of 736 records, one for each patient in the dataset, and includes the following variables:

    Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.

    Sex – Sex of the patient. Values: F (for female), masculine (for male)

    Birth_year – Year of birth of the patient. Format: YYYY.

    Initial_measurement_date – Date of the first blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.

    Final_measurement_date – Date of the last blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.

    Number_of_days_with_measures – Number of days with blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 8 to 1463.

    Number_of_measurements – Number of blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 400 to 137292.

    Initial_biochemical_parameters_date – Date of the first biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.

    Final_biochemical_parameters_date – Date of the last biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.

    Number_of_biochemical_parameters – Number of biochemical parameters measured on the patient, extracted from the Biochemical_parameters.csv file. Values: ranging from 4 to 846.

    Number_of_diagnostics – Number of diagnoses made for the patient, extracted from the Diagnostics.csv file. Values: ranging from 1 to 24.

    Glucose_measurements.csv

    Glucose_measurements.csv is the file containing the continuous blood glucose level measurements of the patients. The file is composed of more than 22.6 million records that constitute the time series of continuous blood glucose level measurements. It includes the following variables:

    Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.

    Measurement_date – Date of the blood glucose level measurement. Format: YYYY-MM-DD.

    Measurement_time – Time of the blood glucose level measurement. Format: HH:MM:SS.

    Measurement – Value of the blood glucose level measurement in mg/dL. Values: ranging from 40 to 500.
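
    A minimal sketch of loading Glucose_measurements.csv (extracted from T1DiabetesGranada.zip) and combining the documented date and time columns into one timestamp; the patient ID used for filtering is a placeholder that only follows the LIB19XXXX format:

    import pandas as pd

    glucose = pd.read_csv("Glucose_measurements.csv")

    # Combine Measurement_date and Measurement_time into a single datetime column.
    glucose["Measurement_datetime"] = pd.to_datetime(
        glucose["Measurement_date"] + " " + glucose["Measurement_time"]
    )

    # Example: daily mean glucose (mg/dL) for one (placeholder) patient.
    patient = glucose[glucose["Patient_ID"] == "LIB190001"]
    daily_mean = (patient.set_index("Measurement_datetime")["Measurement"]
                  .resample("D").mean())
    print(daily_mean.head())
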

    Biochemical_parameters.csv

    Biochemical_parameters.csv is the file containing data of the biochemical tests performed on patients to measure their biochemical parameters. This file is composed of 87482 records and includes the following variables:

    Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.

    Reception_date – Date of receipt in the laboratory of the sample to measure the biochemical parameter. Format: YYYY-MM-DD.

    Name – Name of the measured biochemical parameter. Values: 'Potassium', 'HDL cholesterol', 'Gammaglutamyl Transferase (GGT)', 'Creatinine', 'Glucose', 'Uric acid', 'Triglycerides', 'Alanine transaminase (GPT)', 'Chlorine', 'Thyrotropin (TSH)', 'Sodium', 'Glycated hemoglobin (Ac)', 'Total cholesterol', 'Albumin (urine)', 'Creatinine (urine)', 'Insulin', 'IA ANTIBODIES'.

    Value – Value of the biochemical parameter. Values: ranging from -4.0 to 6446.74.

    Diagnostics.csv

    Diagnostics.csv is the file containing diagnoses of diabetes mellitus complications or other diseases that patients have in addition to type 1 diabetes mellitus. This file is composed of 1757 records and includes the following variables:

    Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.

    Code – ICD-9-CM diagnosis code. Values: subset of 594 of the ICD-9-CM codes (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).

    Description – ICD-9-CM long description. Values: subset of 594 of the ICD-9-CM long description (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).

    Technical Validation

    Blood glucose level measurements are collected using FreeStyle Libre devices, which are widely used for healthcare in patients with T1D. Abbott Diabetes Care, Inc., Alameda, CA, USA, the manufacturer, has conducted validation studies of these devices, concluding that the measurements made by their sensors, compared to YSI analyzer devices (Xylem Inc.), the gold standard, fall within zones A and B of the consensus error grid 99.9% of the time. In addition, other studies external to the company concluded that the accuracy of the measurements is adequate.

    Moreover, it was also checked that, in most cases, the blood glucose level measurements per patient were continuous (i.e. a sample at least every 15 minutes) in the Glucose_measurements.csv file, as they should be.

    Usage Notes

    For data downloading, it is necessary to be authenticated on the Zenodo platform, accept the Data Usage Agreement and send a request specifying full name, email, and the justification of the data use. This request will be processed by the Secretary of the Department of Computer Engineering, Automatics, and Robotics of the University of Granada and access to the dataset will be granted.

    The files that compose the dataset are CSV files delimited by commas and are available in T1DiabetesGranada.zip. A Jupyter Notebook (Python v. 3.8) with code that may help with a better understanding of the dataset, with graphics and statistics, is available in UsageNotes.zip.

    Graphs_and_stats.ipynb

    The Jupyter Notebook generates tables, graphs and statistics for a better understanding of the dataset. It has four main sections, one dedicated to each file in the dataset. In addition, it has useful functions such as calculating the patient age, deleting a patient list from a dataset file and leaving only a patient list in a dataset file.

    Code Availability

    The dataset was generated using some custom code located in CodeAvailability.zip. The code is provided as Jupyter Notebooks created with Python v. 3.8. The code was used to conduct tasks such as data curation and transformation, and variables extraction.

    Original_patient_info_curation.ipynb

    This Jupyter Notebook preprocesses the original file with patient data. Mainly, irrelevant rows and columns are removed, and the sex variable is recoded.

    Glucose_measurements_curation.ipynb

    This Jupyter Notebook preprocesses the original file with the continuous glucose level measurements of the patients. Principally, rows without information or duplicated rows are removed, and the variable with the timestamp is transformed into two new variables, measurement date and measurement time.

    Biochemical_parameters_curation.ipynb

    This Jupyter Notebook preprocesses the original file with data from the biochemical tests performed on patients to measure their biochemical parameters. Mainly, irrelevant rows and columns are removed, and the variable with the name of the measured biochemical parameter is translated.

    Diagnostic_curation.ipynb

    This Jupyter Notebook preprocesses the original file with patient data on the diagnoses of diabetes mellitus complications or other diseases that patients have in addition to T1D.

    Get_patient_info_variables.ipynb

    This Jupyter Notebook codes the feature extraction process from the files Glucose_measurements.csv, Biochemical_parameters.csv and Diagnostics.csv to complete the file Patient_info.csv. It is divided into six sections: the first three extract the features from each of the mentioned files, and the next three add the extracted features to the resulting new file.

    Data Usage Agreement

    The conditions for use are as follows:

    You confirm that you will not attempt to re-identify research participants for any reason, including for re-identification theory research.

    You commit to keeping the T1DiabetesGranada dataset confidential and secure and will not redistribute data or Zenodo account credentials.

    You will require

  17. artificially generated-sensorsData

    • kaggle.com
    zip
    Updated Sep 3, 2021
    Cite
    Ayoub Berdeddouch (2021). artificially generated-sensorsData [Dataset]. https://www.kaggle.com/datasets/ayoubberdeddouch/artificially-generatedsensorsdata
    Explore at:
    zip (34961 bytes)
    Dataset updated
    Sep 3, 2021
    Authors
    Ayoub Berdeddouch
    Description

    Context

    The task_data.csv contains an example data set that has been artificially generated. The set consists of 400 samples where for each sample there are 10 different sensor readings available.

    The samples have been divided into two classes where the class label is either 1 or -1.

    The class labels define to what particular class a particular sample belongs.


    There are 10 sensors, from Sensor0 to Sensor9, a target column (class_label), and a sample index.

    Inspiration

    Your task, if you choose to accept it, is to rank the sensors according to their importance/predictive power with respect to the class labels of the samples. Your solution should be a Python script or a Jupyter notebook file that generates a ranking of the sensors from the provided CSV file. The ranking should be in decreasing order where the first sensor is the most important one.

  18. Notebook for retrieval of National Water Model V2.0 Retrospective run...

    • search.dataone.org
    Updated Dec 5, 2021
    Cite
    David Tarboton; Irene Garousi-Nejad (2021). Notebook for retrieval of National Water Model V2.0 Retrospective run results at SNOTEL sites [Dataset]. https://search.dataone.org/view/sha256%3A4737505e67074c23cd3a4fc55613cde1dfb39995c6f24320a0529f51f85d2a3c
    Explore at:
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Hydroshare
    Authors
    David Tarboton; Irene Garousi-Nejad
    Description

    This notebook has been developed to download specific variables at specific sites from National Water Model V2.0 (NWM) Retrospective run results in Google Cloud. It has been set up to retrieve data at SNOTEL sites. An input file SNOTEL_indices_at_NWM.csv maps from SNOTEL site identifiers to NWM X and Y indices (Xindex and Yindex). A shell script (gget.sh) uses Google utilities (gsutil) to retrieve NWM grid file results for a fixed (limited) block of time. A python function then reads a set of designated variables from a set of designated sites from NWM grid files into CSV files for further analysis.

    The input file SNOTEL_indices_at_NWM.csv is generated using Garousi-Nejad and Tarboton (2021).
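
    A hedged sketch of that extraction step with xarray; the NWM output file name, the variable name (SNEQV, snow water equivalent) and the dimension names (x, y) are assumptions, while the Xindex/Yindex columns come from the input file described above:

    import pandas as pd
    import xarray as xr

    # Mapping from SNOTEL sites to NWM grid indices.
    sites = pd.read_csv("SNOTEL_indices_at_NWM.csv")

    # One NWM retrospective land-output file retrieved with gget.sh / gsutil (placeholder name).
    ds = xr.open_dataset("201001010000.LDASOUT_DOMAIN1.nc")

    # Pull the designated variable at each site's grid cell using vectorized indexing.
    values = ds["SNEQV"].isel(
        x=xr.DataArray(sites["Xindex"].values, dims="site"),
        y=xr.DataArray(sites["Yindex"].values, dims="site"),
    )
    values.to_dataframe().to_csv("nwm_sneqv_at_snotel.csv")
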

    Reference: Garousi-Nejad, I., D. Tarboton (2021). Notebook to get the indices of National Water Model V2.0 grid cells containing SNOTEL sites, HydroShare, http://www.hydroshare.org/resource/7839e3f3b4f54940bd3591b24803cacf

  19. Modelica Models and Jupyter Notebooks for System Analysis of Glucose Insulin...

    • zenodo.org
    bin, csv
    Updated Mar 16, 2020
    Cite
    Tomáš Kulhánek; Tomáš Kulhánek; Jiří Kofránek; Jiří Kofránek (2020). Modelica Models and Jupyter Notebooks for System Analysis of Glucose Insulin Regulation [Dataset]. http://doi.org/10.5281/zenodo.3633324
    Explore at:
    bin, csv
    Dataset updated
    Mar 16, 2020
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Tomáš Kulhánek; Tomáš Kulhánek; Jiří Kofránek; Jiří Kofránek
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains source code of Modelica models of Glucose-Insulin regulation using different techniques.

    The accompanying Jupyter notebook is a demo of system analysis (parameter estimation): artificial data are matched by the model simulation, and the notebook can be used in a teaching class.

    • ModelicaIdentification.ipynb - default notebook - code contains ellipsis which needs to be replaced as per instruction in text
    • ModelicaIdentificationResolution.ipynb - notebook - code with exemplar solution to default notebook
    • glucoseinsulin.mo - Modelica source code
    • PatientInsulinConcentration.csv - sample data to be fitted against model
    • seminar11hw.GIExperiment.fmu - FMU exported from Modelica in order to run simulation in Python and PyFMI library

    Thanks to the MYBINDER service, the Jupyter notebook can be viewed and executed online. The required dependencies can be installed with:

    conda install -c conda-forge pyfmi matplotlib
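
    A minimal sketch of running the exported FMU from Python with PyFMI, as mentioned above; the final time and the plotted variable name are assumptions:

    from pyfmi import load_fmu
    import matplotlib.pyplot as plt

    # Load the FMU exported from Modelica (file listed above).
    model = load_fmu("seminar11hw.GIExperiment.fmu")

    # Run the simulation; the final time (in model time units) is a placeholder.
    res = model.simulate(final_time=86400)

    # 'time' is always present in the result; the other variable name is assumed.
    plt.plot(res["time"], res["glucose"])
    plt.xlabel("time")
    plt.ylabel("glucose")
    plt.show()
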
  20. Datasets and Jupyter notebook for the structural analysis of protein-RNA...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 8, 2024
    Cite
    Andreani, Jessica (2024). Datasets and Jupyter notebook for the structural analysis of protein-RNA interface evolution [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11126925
    Explore at:
    Dataset updated
    Sep 8, 2024
    Dataset provided by
    Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
    Authors
    Andreani, Jessica
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The present repository contains data and code related to our manuscript "Structural comparison of protein-RNA homologous interfaces reveals widespread overall conservation contrasted with versatility in polar contacts". In the manuscript, we analyze the evolution of protein-RNA interfaces by building a dataset of protein-RNA interologs (homologous interfaces) and exploring how interface contacts are conserved between homologous interfaces, as well as possible explanations for non-conserved contacts.

    This repository contains the following files:

    DataAnalysisNotebook.ipynb is a Jupyter notebook to reproduce contact conservation analysis and all figures from our manuscript, and to explore data

    env.yaml is an environment file in order to build a Conda/Mamba environment to run the Jupyter notebook

    2022-02-21-PDB.csv contains data from the PDB about 3D structures of complexes containing interacting protein and RNA chains (PDB structure identifier, chain identifiers, experimental technique and resolution)

    2022-02-21-PDB_proteinchainscontactingRNAchains.groupbp.tsv contains more detailed information about interacting protein and RNA chains from these complexes (PDB and chain identifiers, protein and RNA size, interface size and number of contacts)

    2022-02-21-PDB_proteinchainscontactingRNAchains.groupbp.txt.selectXE_2.50_p30_r10_pi5_ri5_rep_bc-100.out_RNAcl_0.99.tsv contains the same detailed information, restricted to the filtered dataset used as a starting point in our interolog search pipeline

    PDBinterfaceAlign.csv contains information about the structural alignment of pairs of protein-RNA interactions (structural alignment TM-scores, sequence identity and coverage)

    DataInterologsParam.tsv contains information about a pre-filtered set of 2587 potential interologs (including interface RMSD, sequence identity and coverage and interface size)

    DataInterologsContactsFixedSASA.tsv contains detailed information about conserved and non-conserved contacts in the final set of 2022 interologs (atomic contacts, apolar contacts, hydrogen bonds, salt bridges and stacking information for aminoacid-nucleotide pairs, as well as information about whether each belongs to the interface, secondary structures, and the aminoacid surface accessibility and evolutionary conservation metrics) - compared to version 1, the calculation of solvent accessibility was fixed for a number of interolog pairs

    DataCons.csv contains precomputed contact conservation metrics for each of the 2022 interolog pairs, for fast reproduction of manuscript figures

    DataInterologsContactsResampledMaintainStructSeqId.tsv, DataInterologsContactsShuffled.tsv and DataInterologsShuffled.tsv relate to baselines computed for contact conservation assessment

    clan.txt, clan_membership.txt, ecod.latest.domains.uniq.txt, rfam_interfaces_977.txt, DataGroupsECOD.tsv, DataGroupesRFAM.tsv, DataGroupsRFAMClan.tsv, DataInterfaceGroupsECOD.tsv and DataInterfaceGroupsRFAM.tsv relate to the ECOD (respectively Rfam) classification of protein domains (respectively RNA) in protein-RNA interfaces from our dataset

    ListeIntraHbonds.pkl and ListeIntraSaltBridges.pkl are pickle-format data files containing intra-molecular hydrogen bonds and salt bridges (respectively) that are used to analyse scenarios of compensation for non-conserved polar contacts.
