32 datasets found
  1. PandasPlotBench

    • huggingface.co
    Updated Nov 25, 2024
    Cite
    PandasPlotBench [Dataset]. https://huggingface.co/datasets/JetBrains-Research/PandasPlotBench
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    JetBrains (http://jetbrains.com/)
    Authors
    JetBrains Research
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    PandasPlotBench

    PandasPlotBench is a benchmark for assessing how well models can write visualization code given a description of a Pandas DataFrame. 🛠️ Task: given a plotting task and the description of a Pandas DataFrame, write the code to build the plot. The dataset is based on the Matplotlib gallery. The paper is available on arXiv: https://arxiv.org/abs/2412.02764v1. To score your model on this dataset, you can use our GitHub repository. 📩 If you have… See the full description on the dataset page: https://huggingface.co/datasets/JetBrains-Research/PandasPlotBench.

  2. CO2 emissions DF - analysis

    • market.oceanprotocol.com
    Updated Dec 12, 2023
    Cite
    cesar (2023). CO2 emissions DF - analysis [Dataset]. https://market.oceanprotocol.com/asset/did:op:da39016a45f3b473a386f9380dd760797dce7bb8e2950f78f41cd9c777d499c2
    Explore at:
    Dataset updated
    Dec 12, 2023
    Dataset authored and provided by
    cesar
    License

    https://market.oceanprotocol.com/terms

    Description

    This DataFrame contains an analysis of predictions of fossil CO2 and greenhouse gas (GHG) emissions.

  3. Python Time Normalized Superposed Epoch Analysis (SEAnorm) Example Data Set

    • data.niaid.nih.gov
    Updated Jul 15, 2022
    Cite
    Walton, Sam D. (2022). Python Time Normalized Superposed Epoch Analysis (SEAnorm) Example Data Set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6835136
    Explore at:
    Dataset updated
    Jul 15, 2022
    Dataset provided by
    Walton, Sam D.
    Murphy, Kyle R.
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Solar Wind Omni and SAMPEX (Solar Anomalous and Magnetospheric Particle Explorer) datasets used in the examples for SEAnorm, a time-normalized superposed epoch analysis package in Python.

    Both data sets are stored as either an HDF5 file or a compressed CSV file (csv.bz2), each containing a Pandas DataFrame of either the Solar Wind Omni or the SAMPEX data. The data sets were written with pandas.DataFrame.to_hdf() and pandas.DataFrame.to_csv() using a compression level of 9. The DataFrames can be read with pandas.read_hdf() or pandas.read_csv(), depending on the file format.
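
    As a minimal sketch (the file names below are placeholders rather than the exact names in this record), either format can be loaded like this:

    ```python
    import pandas as pd

    # HDF5 variant; no key is needed when the store holds a single DataFrame.
    omni = pd.read_hdf("omni.h5")

    # Compressed CSV variant; bz2 compression is inferred from the extension.
    sampex = pd.read_csv("sampex.csv.bz2", index_col=0, parse_dates=True)

    print(omni.head())
    print(sampex.head())
    ```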

    The Solar Wind Omni data set contains solar wind velocity (V) and dynamic pressure (P), the southward interplanetary magnetic field in Geocentric Solar Ecliptic System (GSE) coordinates (B_Z_GSE), the auroral electrojet index (AE), and the Sym-H index, all at a 1-minute cadence.

    The SAMPEX data set contains electron flux from the Proton/Electron Telescope (PET) in two energy channels, 1.5-6.0 MeV (ELO) and 2.5-14 MeV (EHI), at an approximately 6-second cadence.

  4. Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 24, 2022
    Cite
    Lignos, Dimitrios G. (2022). Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6965146
    Explore at:
    Dataset updated
    Dec 24, 2022
    Dataset provided by
    Ozden, Selimcan
    Hartloper, Alexander R.
    de Castro e Sousa, Albano
    Lignos, Dimitrios G.
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials

    Background

    This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels; one iron-based shape memory alloy is also included. Summary files provide an overview of the database, and data from the individual experiments are included as well.

    The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).

    Usage

    The data is licensed under the Creative Commons Attribution 4.0 International license.

    If you have used our data and are publishing your work, we ask that you please reference both:

    this database through its DOI, and

    any publication that is associated with the experiments. See the Overall_Summary and Database_References files for the associated publication references.

    Included Files

    Overall_Summary_2022-08-25_v1-0-0.csv: summarises the specimen information for all experiments in the database.

    Summarized_Mechanical_Props_Campaign_2022-08-25_v1-0-0.csv: summarises the average initial yield stress and average initial elastic modulus per campaign.

    Unreduced_Data-#_v1-0-0.zip: contain the original (not downsampled) data

    Where # is one of: 1, 2, 3, 4, 5, 6. The unreduced data is broken into separate archives because of upload limitations to Zenodo. Together they provide all the experimental data.

    We recommend that you un-zip all the folders and place them in one "Unreduced_Data" directory, similar to the "Clean_Data" directory.

    The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.

    There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the unreduced data.

    The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.

    Clean_Data_v1-0-0.zip: contains all the downsampled data

    The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.

    There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the clean data.

    The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.

    Database_References_v1-0-0.bib

    Contains a bibtex reference for many of the experiments in the database. Corresponds to the "citekey" entry in the summary files.

    File Format: Downsampled Data

    These are the "LP_

    The header of the first column is empty: the first column corresponds to the index of the sample point in the original (unreduced) data

    Time[s]: time in seconds since the start of the test

    e_true: true strain

    Sigma_true: true stress in MPa

    (optional) Temperature[C]: the surface temperature in degC

    These data files can be easily loaded using the pandas library in Python through:

    import pandas
    data = pandas.read_csv(data_file, index_col=0)

    The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.

    File Format: Unreduced Data

    These are the "LP_

    The first column is the index of each data point

    S/No: sample number recorded by the DAQ

    System Date: Date and time of sample

    Time[s]: time in seconds since the start of the test

    C_1_Force[kN]: load cell force

    C_1_Déform1[mm]: extensometer displacement

    C_1_Déplacement[mm]: cross-head displacement

    Eng_Stress[MPa]: engineering stress

    Eng_Strain[]: engineering strain

    e_true: true strain

    Sigma_true: true stress in MPa

    (optional) Temperature[C]: specimen surface temperature in degC

    The data can be loaded and used similarly to the downsampled data.

    File Format: Overall_Summary

    The overall summary file provides data on all the test specimens in the database. The columns include:

    hidden_index: internal reference ID

    grade: material grade

    spec: specifications for the material

    source: base material for the test specimen

    id: internal name for the specimen

    lp: load protocol

    size: type of specimen (M8, M12, M20)

    gage_length_mm_: unreduced section length in mm

    avg_reduced_dia_mm_: average measured diameter for the reduced section in mm

    avg_fractured_dia_top_mm_: average measured diameter of the top fracture surface in mm

    avg_fractured_dia_bot_mm_: average measured diameter of the bottom fracture surface in mm

    fy_n_mpa_: nominal yield stress

    fu_n_mpa_: nominal ultimate stress

    t_a_deg_c_: ambient temperature in degC

    date: date of test

    investigator: person(s) who conducted the test

    location: laboratory where test was conducted

    machine: setup used to conduct test

    pid_force_k_p, pid_force_t_i, pid_force_t_d: PID parameters for force control

    pid_disp_k_p, pid_disp_t_i, pid_disp_t_d: PID parameters for displacement control

    pid_extenso_k_p, pid_extenso_t_i, pid_extenso_t_d: PID parameters for extensometer control

    citekey: reference corresponding to the Database_References.bib file

    yield_stress_mpa_: computed yield stress in MPa

    elastic_modulus_mpa_: computed elastic modulus in MPa

    fracture_strain: computed average true strain across the fracture surface

    c,si,mn,p,s,n,cu,mo,ni,cr,v,nb,ti,al,b,zr,sn,ca,h,fe: chemical compositions in units of %mass

    file: file name of corresponding clean (downsampled) stress-strain data

    File Format: Summarized_Mechanical_Props_Campaign

    Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,

    import pandas as pd
    # date and version are strings, e.g. '2022-08-25_' and 'v1-0-0' (see the file names above)
    tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv',
                       index_col=[0, 1, 2, 3], skipinitialspace=True,
                       header=[0, 1], keep_default_na=False, na_values='')

    citekey: reference in "Campaign_References.bib".

    Grade: material grade.

    Spec.: specifications (e.g., J2+N).

    Yield Stress [MPa]: initial yield stress in MPa

    size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign

    Elastic Modulus [MPa]: initial elastic modulus in MPa

    size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign

    Caveats

    The tests in the following directories were performed before the protocol was established. Therefore, only the true stress-strain data are available for each:

    A500

    A992_Gr50

    BCP325

    BCR295

    HYP400

    S460NL

    S690QL/25mm

    S355J2_Plates/S355J2_N_25mm and S355J2_N_50mm

  5. Analysis of references in the IPCC AR6 WG2 Report of 2022

    • data.niaid.nih.gov
    Updated Mar 11, 2022
    Cite
    Bianca Kramer (2022). Analysis of references in the IPCC AR6 WG2 Report of 2022 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6327206
    Explore at:
    Dataset updated
    Mar 11, 2022
    Dataset provided by
    Cameron Neylon
    Bianca Kramer
    License

    https://creativecommons.org/licenses/publicdomain/

    Description

    This repository contains data on 17,419 DOIs cited in the IPCC Working Group 2 contribution to the Sixth Assessment Report, and the code to link them to the dataset built at the Curtin Open Knowledge Initiative (COKI).

    References were extracted from the report's PDFs (downloaded 2022-03-01) via Scholarcy and exported as RIS and BibTeX files. DOI strings were identified in the RIS files by pattern matching and saved as a CSV file. The list of DOIs for each chapter and cross-chapter paper was processed with a custom Python script to generate a pandas DataFrame, which was saved as a CSV file and uploaded to Google BigQuery.
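
    As an illustrative Python sketch only (the repository's actual pipeline uses preprocessing.R and process.py, and the regular expression here is an assumption), the pattern-matching step could look roughly like this:

    ```python
    import re
    from pathlib import Path

    import pandas as pd

    # Rough DOI pattern for illustration; not exhaustive.
    DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)

    records = []
    for ris_path in Path("data/scholarcy/RIS").glob("*.ris"):
        text = ris_path.read_text(encoding="utf-8", errors="ignore")
        for doi in DOI_PATTERN.findall(text):
            records.append({"chapter": ris_path.stem, "doi": doi.rstrip(".,;")})

    dois = pd.DataFrame(records).drop_duplicates()
    dois.to_csv("IPCC_AR6_WGII_dois.csv", index=False)
    print(len(dois), "unique chapter/DOI pairs")
    ```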

    We used the main object table of the Academic Observatory, which combines information from Crossref, Unpaywall, Microsoft Academic, Open Citations, the Research Organization Registry and Geonames to enrich the DOIs with bibliographic information, affiliations, and open access status. A custom query was used to join and format the data and the resulting table was visualised in a Google DataStudio dashboard.

    This version of the repository also includes the set of DOIs from references in the IPCC Working Group 1 contribution to the Sixth Assessment Report as extracted by Alexis-Michel Mugabushaka and shared on Zenodo: https://doi.org/10.5281/zenodo.5475442 (CC-BY)

    A brief descriptive analysis was provided as a blogpost on the COKI website.

    The repository contains the following content:

    Data:

    data/scholarcy/RIS/ - extracted references as RIS files

    data/scholarcy/BibTeX/ - extracted references as BibTeX files

    IPCC_AR6_WGII_dois.csv - list of DOIs

    data/10.5281_zenodo.5475442/ - references from IPCC AR6 WG1 report

    Processing:

    preprocessing.R - preprocessing steps for identifying and cleaning DOIs

    process.py - Python script for transforming data and linking to COKI data through Google Big Query

    Outcomes:

    Dataset on BigQuery - requires a google account for access and bigquery account for querying

    Data Studio Dashboard - interactive analysis of the generated data

    Zotero library of references extracted via Scholarcy

    PDF version of blogpost

    Note on licenses: data are made available under CC0, with the exception of the WG1 reference data, which have been shared under CC-BY 4.0. Code is made available under the Apache License 2.0.

  6. University Archives web archive collection derivatives

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    Cite
    Ruest, Nick; Wilk, Jocelyn; Thurman, Alex (2023). University Archives web archive collection derivatives [Dataset]. https://search.dataone.org/view/sha256%3A6e1852b2330a5035d1d8ca3a5bb7f3d5ebb74441ca7744f883cbc30259aae1e8
    Explore at:
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Ruest, Nick; Wilk, Jocelyn; Thurman, Alex
    Description

    Web archive derivatives of the University Archives collection from Columbia University Libraries. The derivatives were created with the Archives Unleashed Toolkit and Archives Unleashed Cloud.

    The cul-1914-parquet.tar.gz derivatives are in the Apache Parquet format, which is a columnar storage format. These derivatives are generally small enough to work with on your local machine, and can be easily converted to Pandas DataFrames. See this notebook for examples.

    Domains

    .webpages().groupBy(ExtractDomainDF($"url").alias("url")).count().sort($"count".desc)

    Produces a DataFrame with the following columns:

    • domain
    • count

    Web Pages

    .webpages().select($"crawl_date", $"url", $"mime_type_web_server", $"mime_type_tika", RemoveHTMLDF(RemoveHTTPHeaderDF(($"content"))).alias("content"))

    Produces a DataFrame with the following columns:

    • crawl_date
    • url
    • mime_type_web_server
    • mime_type_tika
    • content

    Web Graph

    .webgraph()

    Produces a DataFrame with the following columns:

    • crawl_date
    • src
    • dest
    • anchor

    Image Links

    .imageLinks()

    Produces a DataFrame with the following columns:

    • src
    • image_url

    Binary Analysis

    • Images
    • PDFs
    • Presentation program files
    • Spreadsheets
    • Text files
    • Word processor files

    The cul-1914-auk.tar.gz derivatives are the standard set of web archive derivatives produced by the Archives Unleashed Cloud.

    • Gephi file, which can be loaded into Gephi. It will have basic characteristics already computed and a basic layout.
    • Raw Network file, which can also be loaded into Gephi. You will have to use that network program to lay it out yourself.
    • Full text file. In it, each website within the web archive collection will have its full text presented on one line, along with information around when it was crawled, the name of the domain, and the full URL of the content.
    • Domains count file. A text file containing the frequency count of domains captured within your web archive.

    Due to file size restrictions in Scholars Portal Dataverse, each of the derivative files needed to be split into 1G parts. These parts can be joined back together with cat. For example:

    cat cul-1914-parquet.tar.gz.part* > cul-1914-parquet.tar.gz

  7. Burke Library New York City Religions web archive collection derivatives

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Mar 9, 2020
    Cite
    Nick Ruest; Matthew C. Baker; Alex Thurman (2020). Burke Library New York City Religions web archive collection derivatives [Dataset]. http://doi.org/10.5281/zenodo.3701455
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    Mar 9, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nick Ruest; Matthew C. Baker; Alex Thurman
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New York
    Description

    Web archive derivatives of the Burke Library New York City Religions collection from Columbia University Libraries. The derivatives were created with the Archives Unleashed Toolkit and Archives Unleashed Cloud.

    The cul-1945-parquet.tar.gz derivatives are in the Apache Parquet format, which is a columnar storage format. These derivatives are generally small enough to work with on your local machine, and can be easily converted to Pandas DataFrames. See this notebook for examples.
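
    For instance, once extracted, a Parquet derivative can be read into Pandas. This is a minimal sketch, and the directory layout below is an assumption about how cul-1945-parquet.tar.gz unpacks on your machine:

    ```python
    import pandas as pd

    # pandas (via pyarrow) reads a directory of Parquet part-files as one DataFrame.
    webpages = pd.read_parquet("cul-1945-parquet/webpages/", engine="pyarrow")
    print(webpages[["crawl_date", "url", "mime_type_web_server"]].head())
    ```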

    Domains

    .webpages().groupBy(ExtractDomainDF($"url").alias("url")).count().sort($"count".desc)

    Produces a DataFrame with the following columns:

    • domain
    • count

    Web Pages

    .webpages().select($"crawl_date", $"url", $"mime_type_web_server", $"mime_type_tika", RemoveHTMLDF(RemoveHTTPHeaderDF(($"content"))).alias("content"))

    Produces a DataFrame with the following columns:

    • crawl_date
    • url
    • mime_type_web_server
    • mime_type_tika
    • content

    Web Graph

    .webgraph()

    Produces a DataFrame with the following columns:

    • crawl_date
    • src
    • dest
    • anchor

    Image Links

    .imageLinks()

    Produces a DataFrame with the following columns:

    • src
    • image_url

    Binary Analysis

    • Images
    • PDFs
    • Presentation program files
    • Spreadsheets
    • Text files
    • Word processor files

    The cul-1945-auk.tar.gz derivatives are the standard set of web archive derivatives produced by the Archives Unleashed Cloud.

    • Gephi file, which can be loaded into Gephi. It will have basic characteristics already computed and a basic layout.
    • Raw Network file, which can also be loaded into Gephi. You will have to use that network program to lay it out yourself.
    • Full text file. In it, each website within the web archive collection will have its full text presented on one line, along with information around when it was crawled, the name of the domain, and the full URL of the content.
    • Domains count file. A text file containing the frequency count of domains captured within your web archive.
  8. Data from: Bayesian Analysis for Remote Biosignature Identification on exoEarths (BARBIE) I: Using Grid-Based Nested Sampling in Coronagraphy Observation Simulations for H2O

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 29, 2023
    Cite
    Vincent Kofman (2023). Bayesian Analysis for Remote Biosignature Identification on exoEarths (BARBIE) I: Using Grid-Based Nested Sampling in Coronagraphy Observation Simulations for H2O [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7897197
    Explore at:
    Dataset updated
    Jul 29, 2023
    Dataset provided by
    Nicholas Susemiehl
    Avi Mandell
    Michael Dane Moore
    Geronimo Villanueva
    Vincent Kofman
    Michael D. Himes
    Natasha Latouf
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present all of the data across our SNR and abundance study for the molecule H2O for an exoEarth twin. The wavelength range is from 0.515-1 micron, with 25 evenly spaced 20% bandpasses in this range. The SNR ranges from 3-16, and the abundance values range from log10(VMR) = -3.5 to -1.5 in steps of 0.5 and 0.25 (all presented in VMR in the associated table). We present the lower and upper wavelength per bandpass, the input H2O value (abundance case), the retrieved H2O value (presented as the log10(VMR)), the lower and upper limits of the 68% credible region (presented as the log10(VMR)), and the log-Bayes factor for H2O. For more information about how these were calculated, please see Bayesian Analysis for Remote Biosignature Identification on exoEarths (BARBIE) I: Using Grid-Based Nested Sampling in Coronagraphy Observation Simulations for H2O, accepted and currently available on arXiv.

    To open this csv as a Pandas dataframe, use the following command:

    import pandas as pd
    your_dataframe_name = pd.read_csv('zenodo_table.csv', dtype={'Input H2O': str})

  9. polyOne Data Set - 100 million hypothetical polymers including 29 properties

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 24, 2023
    Cite
    Rampi Ramprasad (2023). polyOne Data Set - 100 million hypothetical polymers including 29 properties [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7124187
    Explore at:
    Dataset updated
    Mar 24, 2023
    Dataset provided by
    Christopher Kuenneth
    Rampi Ramprasad
    Description

    polyOne Data Set

    The data set contains 100 million hypothetical polymers, each with 29 properties predicted using machine learning models. We use PSMILES strings to represent polymer structures, see here and here. The polymers are generated by decomposing previously synthesized polymers into unique chemical fragments. Random and enumerative compositions of these fragments yield 100 million hypothetical PSMILES strings. All PSMILES strings are chemically valid polymers but, mostly, have never been synthesized before. More information can be found in the paper. Please note the license agreement in the LICENSE file.

    Full data set including the properties

    The data files are in Apache Parquet format. The files start with polyOne_*.parquet.

    I recommend using dask (pip install dask) to load and process the data set. Pandas also works but is slower.

    Load the sharded data set with dask:

    ```python
    import dask.dataframe as dd
    ddf = dd.read_parquet("*.parquet", engine="pyarrow")
    ```

    For example, compute the description of the data set:

    ```python
    df_describe = ddf.describe().compute()
    df_describe
    ```

    PSMILES strings only

    generated_polymer_smiles_train.txt - 80 million PSMILES strings for training polyBERT. One string per line.

    generated_polymer_smiles_dev.txt - 20 million PSMILES strings for testing polyBERT. One string per line.
    
  10. Freely Accessible eJournals web archive collection derivatives

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Feb 2, 2020
    Cite
    Nick Ruest (2020). Freely Accessible eJournals web archive collection derivatives [Dataset]. http://doi.org/10.5281/zenodo.3633671
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    Feb 2, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nick Ruest
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Web archive derivatives of the Freely Accessible eJournals collection from Columbia University Libraries. The derivatives were created with the Archives Unleashed Toolkit and Archives Unleashed Cloud.

    The cul-5921-parquet.tar.gz derivatives are in the Apache Parquet format, which is a columnar storage format. These derivatives are generally small enough to work with on your local machine, and can be easily converted to Pandas DataFrames. See this notebook for examples.

    Domains

    .webpages().groupBy(ExtractDomainDF($"url").alias("url")).count().sort($"count".desc)

    Produces a DataFrame with the following columns:

    • domain
    • count

    Web Pages

    .webpages().select($"crawl_date", $"url", $"mime_type_web_server", $"mime_type_tika", RemoveHTMLDF(RemoveHTTPHeaderDF(($"content"))).alias("content"))

    Produces a DataFrame with the following columns:

    • crawl_date
    • url
    • mime_type_web_server
    • mime_type_tika
    • content

    Web Graph

    .webgraph()

    Produces a DataFrame with the following columns:

    • crawl_date
    • src
    • dest
    • anchor

    Image Links

    .imageLinks()

    Produces a DataFrame with the following columns:

    • src
    • image_url

    Binary Analysis

    • Audio
    • Images
    • PDFs
    • Presentation program files
    • Spreadsheets
    • Text files
    • Word processor files

    The cul-12143-auk.tar.gz derivatives are the standard set of web archive derivatives produced by the Archives Unleashed Cloud.

    • Gephi file, which can be loaded into Gephi. It will have basic characteristics already computed and a basic layout.
    • Raw Network file, which can also be loaded into Gephi. You will have to use that network program to lay it out yourself.
    • Full text file. In it, each website within the web archive collection will have its full text presented on one line, along with information around when it was crawled, the name of the domain, and the full URL of the content.
    • Domains count file. A text file containing the frequency count of domains captured within your web archive.
  11. Contemporary Composers Web Archive (CCWA) web archive collection derivatives

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Mar 1, 2020
    Cite
    Nick Ruest; Samantha Abrams (2020). Contemporary Composers Web Archive (CCWA) web archive collection derivatives [Dataset]. http://doi.org/10.5281/zenodo.3692559
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    Mar 1, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nick Ruest; Samantha Abrams
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Web archive derivatives of the Contemporary Composers Web Archive (CCWA) collection from the Ivy Plus Libraries Confederation. The derivatives were created with the Archives Unleashed Toolkit and Archives Unleashed Cloud.

    The ivy-4019-parquet.tar.gz derivatives are in the Apache Parquet format, which is a columnar storage format. These derivatives are generally small enough to work with on your local machine, and can be easily converted to Pandas DataFrames. See this notebook for examples.

    Domains

    .webpages().groupBy(ExtractDomainDF($"url").alias("url")).count().sort($"count".desc)

    Produces a DataFrame with the following columns:

    • domain
    • count

    Web Pages

    .webpages().select($"crawl_date", $"url", $"mime_type_web_server", $"mime_type_tika", RemoveHTMLDF(RemoveHTTPHeaderDF(($"content"))).alias("content"))

    Produces a DataFrame with the following columns:

    • crawl_date
    • url
    • mime_type_web_server
    • mime_type_tika
    • content

    Web Graph

    .webgraph()

    Produces a DataFrame with the following columns:

    • crawl_date
    • src
    • dest
    • anchor

    Image Links

    .imageLinks()

    Produces a DataFrame with the following columns:

    • src
    • image_url

    Binary Analysis

    • Audio
    • Images
    • PDFs
    • Presentation program files
    • Spreadsheets
    • Text files
    • Word processor files

    The ivy-4019-auk.tar.gz derivatives are the standard set of web archive derivatives produced by the Archives Unleashed Cloud.

    • Gephi file, which can be loaded into Gephi. It will have basic characteristics already computed and a basic layout.
    • Raw Network file, which can also be loaded into Gephi. You will have to use that network program to lay it out yourself.
    • Full text file. In it, each website within the web archive collection will have its full text presented on one line, along with information around when it was crawled, the name of the domain, and the full URL of the content.
    • Domains count file. A text file containing the frequency count of domains captured within your web archive.
  12. Web Archive of Independent News Sites on Turkish Affairs derivatives

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jan 31, 2020
    Cite
    Nick Ruest (2020). Web Archive of Independent News Sites on Turkish Affairs derivatives [Dataset]. http://doi.org/10.5281/zenodo.3633234
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    Jan 31, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nick Ruest
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Derivatives of the Web Archive of Independent News Sites on Turkish Affairs collection from the Ivy Plus Libraries Confederation. The derivatives were created with the Archives Unleashed Toolkit and Archives Unleashed Cloud.

    The ivy-12911-parquet.tar.gz derivatives are in the Apache Parquet format, which is a columnar storage format. These derivatives are generally small enough to work with on your local machine, and can be easily converted to Pandas DataFrames. See this notebook for examples.

    Domains

    .webpages().groupBy(ExtractDomainDF($"url").alias("url")).count().sort($"count".desc)

    Produces a DataFrame with the following columns:

    • domain
    • count

    Web Pages

    .webpages().select($"crawl_date", $"url", $"mime_type_web_server", $"mime_type_tika", RemoveHTMLDF(RemoveHTTPHeaderDF(($"content"))).alias("content"))

    Produces a DataFrame with the following columns:

    • crawl_date
    • url
    • mime_type_web_server
    • mime_type_tika
    • content

    Web Graph

    .webgraph()

    Produces a DataFrame with the following columns:

    • crawl_date
    • src
    • dest
    • anchor

    Image Links

    .imageLinks()

    Produces a DataFrame with the following columns:

    • src
    • image_url

    Binary Analysis

    • Audio
    • Images
    • PDFs
    • Presentation program files
    • Spreadsheets
    • Text files
    • Word processor files

    The ivy-12911-auk.tar.gz derivatives are the standard set of web archive derivatives produced by the Archives Unleashed Cloud.

    • Gephi file, which can be loaded into Gephi. It will have basic characteristics already computed and a basic layout.
    • Raw Network file, which can also be loaded into Gephi. You will have to use that network program to lay it out yourself.
    • Full text file. In it, each website within the web archive collection will have its full text presented on one line, along with information around when it was crawled, the name of the domain, and the full URL of the content.
    • Domains count file. A text file containing the frequency count of domains captured within your web archive.
  13. Popline and K4Health Web Archive collection derivatives

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jan 31, 2020
    Cite
    Nick Ruest; Lauris Olson; Samantha Abrams (2020). Popline and K4Health Web Archive collection derivatives [Dataset]. http://doi.org/10.5281/zenodo.3633022
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    Jan 31, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nick Ruest; Lauris Olson; Samantha Abrams
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Web archive derivatives of the Popline and K4Health Web Archive collection from the Ivy Plus Libraries Confederation. The derivatives were created with the Archives Unleashed Toolkit and Archives Unleashed Cloud.

    The ivy-12006-parquet.tar.gz derivatives are in the Apache Parquet format, which is a columnar storage format. These derivatives are generally small enough to work with on your local machine, and can be easily converted to Pandas DataFrames. See this notebook for examples.

    Domains

    .webpages().groupBy(ExtractDomainDF($"url").alias("url")).count().sort($"count".desc)

    Produces a DataFrame with the following columns:

    • domain
    • count

    Web Pages

    .webpages().select($"crawl_date", $"url", $"mime_type_web_server", $"mime_type_tika", RemoveHTMLDF(RemoveHTTPHeaderDF(($"content"))).alias("content"))

    Produces a DataFrame with the following columns:

    • crawl_date
    • url
    • mime_type_web_server
    • mime_type_tika
    • content

    Web Graph

    .webgraph()

    Produces a DataFrame with the following columns:

    • crawl_date
    • src
    • dest
    • anchor

    Image Links

    .imageLinks()

    Produces a DataFrame with the following columns:

    • src
    • image_url

    Binary Analysis

    • Audio
    • Images
    • PDFs
    • Presentation program files
    • Spreadsheets
    • Text files
    • Word processor files

    The ivy-12006-auk.tar.gz derivatives are the standard set of web archive derivatives produced by the Archives Unleashed Cloud.

    • Gephi file, which can be loaded into Gephi. It will have basic characteristics already computed and a basic layout.
    • Raw Network file, which can also be loaded into Gephi. You will have to use that network program to lay it out yourself.
    • Full text file. In it, each website within the web archive collection will have its full text presented on one line, along with information around when it was crawled, the name of the domain, and the full URL of the content.
    • Domains count file. A text file containing the frequency count of domains captured within your web archive.
  14. Optogenetics_Dfd_population_imaging

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Braun, Jonas (2023). Optogenetics_Dfd_population_imaging [Dataset]. http://doi.org/10.7910/DVN/INYAYV
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Braun, Jonas
    Description

    This dataset contains Dfd population imaging data and behavioral data, including responses to optogenetic stimulation. Part of Braun et al. 2023. There are two different types of tar.gz files.

    The one ending with "processed.tar.gz" contains data related to one fly:

    1. "background_image.tif" is a standard deviation projection of raw fluorescence data used as background in figure 2c.
    2. "roi_center_annotation.pdf" is a plot indicating the results of semi-automated ROI detection.
    3. "ROI_centers.txt" indicates the location of said ROI centers.
    4. "ROI_mask.tif" is the mask used for ROI extraction for this fly.

    The other tar.gz files each correspond to one 10-minute-long recording of neuronal activity and behavior of one fly. If multiple recordings were made in one fly, their sequence is indicated with a 3-digit number in the name. Each of those folders contains 3 subfolders: "2p" holds the synchronisation data and metadata and the two-photon recording metadata; "behData" contains the behavioral camera metadata; "processed" contains pickled pandas dataframes with all processed behavioral variables (beh_df.pkl) and all neuronal time series (twop_df.pkl) required to reproduce the figures. Raw behavioral videos and raw fluorescence data are available upon request from the authors and are omitted here because of their size.
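
    A minimal sketch of loading those pickles with pandas (the "recording_001" path below is hypothetical and stands in for one extracted recording archive):

    ```python
    import pandas as pd

    # beh_df.pkl and twop_df.pkl live in the "processed" subfolder of each recording.
    beh_df = pd.read_pickle("recording_001/processed/beh_df.pkl")
    twop_df = pd.read_pickle("recording_001/processed/twop_df.pkl")

    print(beh_df.columns.tolist())
    print(twop_df.shape)
    ```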

  15. Quebec International Relation and Economy web archive collection derivatives

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Feb 26, 2020
    Cite
    Nick Ruest; Carole Gagné; Dave Mitchell (2020). Quebec International Relation and Economy web archive collection derivatives [Dataset]. http://doi.org/10.5281/zenodo.3688334
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    Feb 26, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nick Ruest; Carole Gagné; Dave Mitchell
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Québec City
    Description

    Web archive derivatives of the Sites of the Quebec International Relation and Economy collection from the Bibliothèque et Archives nationales du Québec. The derivatives were created with the Archives Unleashed Toolkit. Many thanks, BAnQ!

    These derivatives are in the Apache Parquet format, which is a columnar storage format. These derivatives are generally small enough to work with on your local machine, and can be easily converted to Pandas DataFrames. See this notebook for examples.

    Domains

    .webpages().groupBy(ExtractDomainDF($"url").alias("url")).count().sort($"count".desc)

    Produces a DataFrame with the following columns:

    • domain
    • count

    Web Pages

    .webpages().select($"crawl_date", $"url", $"mime_type_web_server", $"mime_type_tika", RemoveHTMLDF(RemoveHTTPHeaderDF(($"content"))).alias("content"))

    Produces a DataFrame with the following columns:

    • crawl_date
    • url
    • mime_type_web_server
    • mime_type_tika
    • content

    Web Graph

    .webgraph()

    Produces a DataFrame with the following columns:

    • crawl_date
    • src
    • dest
    • anchor

    Image Links

    .imageLinks()

    Produces a DataFrame with the following columns:

    • src
    • image_url

    Binary Analysis

    • Audio
    • Images
    • PDFs
    • Presentation program files
    • Spreadsheets
    • Text files
    • Videos
    • Word processor files
  16. Chapter 10 - Advanced Feature Engineering Techniques for Fraud Analytics

    • data.mendeley.com
    Updated Oct 18, 2023
    Cite
    ABDELRAHIM AQQAD (2023). Chapter 10 - Advanced Feature Engineering Techniques for Fraud Analytics [Dataset]. http://doi.org/10.17632/v7r3dsgtmz.2
    Explore at:
    Dataset updated
    Oct 18, 2023
    Authors
    ABDELRAHIM AQQAD
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset_1: The dataset consists of transaction timestamps (in hours) for a sample of online banking transactions. The timestamps represent the time of day when the transactions occurred.

    Dataset_2: The dataset, encapsulated as a pandas DataFrame "trans_David", chronicles the transactional activities of an individual named David. A salient column, "channel_cd", signifies the payment channel employed by David for each transaction. The dataset encompasses 40 entries across 14 columns, with 'channel_cd' being the focal point for the derivation of the 'freq_channel' feature.

  17. State Elections Web Archive collection derivatives

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Feb 5, 2020
    Cite
    Nick Ruest; Kristina Williams; JKeely Wilczek; Jeremy Darrington; Ryan Denniston; Samantha Abrams (2020). State Elections Web Archive collection derivatives [Dataset]. http://doi.org/10.5281/zenodo.3635634
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    Feb 5, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nick Ruest; Kristina Williams; JKeely Wilczek; Jeremy Darrington; Ryan Denniston; Samantha Abrams
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Web archive derivatives of the State Elections Web Archive collection from the Ivy Plus Libraries Confederation. The derivatives were created with the Archives Unleashed Toolkit and Archives Unleashed Cloud.

    The ivy-10793-parquet.tar.gz derivatives are in the Apache Parquet format, which is a columnar storage format. These derivatives are generally small enough to work with on your local machine, and can be easily converted to Pandas DataFrames. See this notebook for examples.

    Domains

    .webpages().groupBy(ExtractDomainDF($"url").alias("url")).count().sort($"count".desc)

    Produces a DataFrame with the following columns:

    • domain
    • count

    Web Pages

    .webpages().select($"crawl_date", $"url", $"mime_type_web_server", $"mime_type_tika", RemoveHTMLDF(RemoveHTTPHeaderDF(($"content"))).alias("content"))

    Produces a DataFrame with the following columns:

    • crawl_date
    • url
    • mime_type_web_server
    • mime_type_tika
    • content

    Web Graph

    .webgraph()

    Produces a DataFrame with the following columns:

    • crawl_date
    • src
    • dest
    • anchor

    Image Links

    .imageLinks()

    Produces a DataFrame with the following columns:

    • src
    • image_url

    Binary Analysis

    • Audio
    • Images
    • PDFs
    • Presentation program files
    • Spreadsheets
    • Text files
    • Word processor files

    The ivy-10793-auk.tar.gz derivatives are the standard set of web archive derivatives produced by the Archives Unleashed Cloud.

    • Gephi file, which can be loaded into Gephi. It will have basic characteristics already computed and a basic layout.
    • Raw Network file, which can also be loaded into Gephi. You will have to use that network program to lay it out yourself.
    • Full text file. In it, each website within the web archive collection will have its full text presented on one line, along with information around when it was crawled, the name of the domain, and the full URL of the content.
    • Domains count file. A text file containing the frequency count of domains captured within your web archive.
  18. Rare Book and Manuscript Library web archive collection derivatives

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Mar 9, 2020
    Cite
    Nick Ruest (2020). Rare Book and Manuscript Library web archive collection derivatives [Dataset]. http://doi.org/10.5281/zenodo.3701593
    Explore at:
    application/gzip (available download formats)
    Dataset updated
    Mar 9, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nick Ruest
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Web archive derivatives of the Rare Book and Manuscript Library collection from Columbia University Libraries. The derivatives were created with the Archives Unleashed Toolkit and Archives Unleashed Cloud.

    The cul-2766-parquet.tar.gz derivatives are in the Apache Parquet format, which is a columnar storage format. These derivatives are generally small enough to work with on your local machine, and can be easily converted to Pandas DataFrames. See this notebook for examples.

    Domains

    .webpages().groupBy(ExtractDomainDF($"url").alias("url")).count().sort($"count".desc)

    Produces a DataFrame with the following columns:

    • domain
    • count

    Web Pages

    .webpages().select($"crawl_date", $"url", $"mime_type_web_server", $"mime_type_tika", RemoveHTMLDF(RemoveHTTPHeaderDF(($"content"))).alias("content"))

    Produces a DataFrame with the following columns:

    • crawl_date
    • url
    • mime_type_web_server
    • mime_type_tika
    • content

    Web Graph

    .webgraph()

    Produces a DataFrame with the following columns:

    • crawl_date
    • src
    • dest
    • anchor

    Image Links

    .imageLinks()

    Produces a DataFrame with the following columns:

    • src
    • image_url

    Binary Analysis

    • Images
    • PDFs
    • Presentation program files
    • Spreadsheets
    • Word processor files

    The cul-2766-auk.tar.gz derivatives are the standard set of web archive derivatives produced by the Archives Unleashed Cloud.

    • Gephi file, which can be loaded into Gephi. It will have basic characteristics already computed and a basic layout.
    • Raw Network file, which can also be loaded into Gephi. You will have to use that network program to lay it out yourself.
    • Full text file. In it, each website within the web archive collection will have its full text presented on one line, along with information around when it was crawled, the name of the domain, and the full URL of the content.
    • Domains count file. A text file containing the frequency count of domains captured within your web archive.
  19. Resistance web archive collection derivatives

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 9, 2020
    Cite
    Ruest, Nick (2020). Resistance web archive collection derivatives [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3660456
    Explore at:
    Dataset updated
    Feb 9, 2020
    Dataset provided by
    Thurman, Alex
    Ruest, Nick
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Web archive derivatives of the Resistance collection from Columbia University Libraries. The derivatives were created with the Archives Unleashed Toolkit and Archives Unleashed Cloud.

    The cul-8752-parquet.tar.gz derivatives are in the Apache Parquet format, which is a columnar storage format. These derivatives are generally small enough to work with on your local machine, and can be easily converted to Pandas DataFrames. See this notebook for examples.

    Domains

    .webpages().groupBy(ExtractDomainDF($"url").alias("url")).count().sort($"count".desc)

    Produces a DataFrame with the following columns:

    domain

    count

    Web Pages

    .webpages().select($"crawl_date", $"url", $"mime_type_web_server", $"mime_type_tika", RemoveHTMLDF(RemoveHTTPHeaderDF(($"content"))).alias("content"))

    Produces a DataFrame with the following columns:

    crawl_date

    url

    mime_type_web_server

    mime_type_tika

    content

    Web Graph

    .webgraph()

    Produces a DataFrame with the following columns:

    crawl_date

    src

    dest

    anchor

    Image Links

    .imageLinks()

    Produces a DataFrame with the following columns:

    src

    image_url

    Binary Analysis

    PDFs

    Spreadsheets

    Text files

    Word processor files

    The cul-8752-auk.tar.gz derivatives are the standard set of web archive derivatives produced by the Archives Unleashed Cloud.

    Gephi file, which can be loaded into Gephi. It will have basic characteristics already computed and a basic layout.

    Raw Network file, which can also be loaded into Gephi. You will have to use that network program to lay it out yourself.

    Full text file. In it, each website within the web archive collection will have its full text presented on one line, along with information around when it was crawled, the name of the domain, and the full URL of the content.

    Domains count file. A text file containing the frequency count of domains captured within your web archive.

  20. Ministry of Environment of Québec (2011-2014) web archive collection derivatives

    • zenodo.org
    application/gzip, bin
    Updated Feb 25, 2020
    Cite
    Nick Ruest; Carole Gagné; Dave Mitchell (2020). Ministry of Environment of Québec (2011-2014) web archive collection derivatives [Dataset]. http://doi.org/10.5281/zenodo.3605525
    Explore at:
    application/gzip, bin (available download formats)
    Dataset updated
    Feb 25, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nick Ruest; Carole Gagné; Dave Mitchell
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Quebec
    Description

    Web archive derivatives of the Ministry of Environment of Québec (2011-2014) collection from the Bibliothèque et Archives nationales du Québec. The derivatives were created with the Archives Unleashed Toolkit. Many thanks, BAnQ!

    These derivatives are in the Apache Parquet format, which is a columnar storage format. These derivatives are generally small enough to work with on your local machine, and can be easily converted to Pandas DataFrames. See this notebook for examples.

    Domains

    .webpages().groupBy(ExtractDomainDF($"url").alias("url")).count().sort($"count".desc)

    Produces a DataFrame with the following columns:

    • domain
    • count

    Web Pages

    .webpages().select($"crawl_date", $"url", $"mime_type_web_server", $"mime_type_tika", RemoveHTMLDF(RemoveHTTPHeaderDF(($"content"))).alias("content"))

    Produces a DataFrame with the following columns:

    • crawl_date
    • url
    • mime_type_web_server
    • mime_type_tika
    • content

    Web Graph

    .webgraph()

    Produces a DataFrame with the following columns:

    • crawl_date
    • src
    • dest
    • anchor

    Image Links

    .imageLinks()

    Produces a DataFrame with the following columns:

    • src
    • image_url

    Binary Analysis

    • Audio
    • Images
    • PDFs
    • Presentation program files
    • Spreadsheets
    • Text files
    • Videos
    • Word processor files