100+ datasets found
  1. pandas-create-context

    • huggingface.co
    Updated Jan 8, 2024
    Cite
    Or Hiltch (2024). pandas-create-context [Dataset]. https://huggingface.co/datasets/hiltch/pandas-create-context
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 8, 2024
    Authors
    Or Hiltch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset is built from sql-create-context, which itself builds on WikiSQL and Spider. I have used GPT-4 to translate the SQL schema into pandas DataFrame schema initialization statements and to translate the SQL queries into pandas queries. There are 862 examples, each consisting of a natural language query, pandas DataFrame creation statements, and a pandas query that answers the question using the DataFrame creation statements as context. This dataset was built with text-to-pandas LLMs… See the full description on the dataset page: https://huggingface.co/datasets/hiltch/pandas-create-context.
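
    To make the three parts of an example concrete, here is a minimal sketch of what one record might look like and how its query could be checked. The field names (question, context, answer), the table, and the rows are all hypothetical and chosen for illustration; they are not copied from the dataset.

```python
import pandas as pd

# Hypothetical record: field names and values are illustrative assumptions,
# not rows taken from the dataset.
record = {
    "question": "How many department heads are older than 56?",
    "context": "head = pd.DataFrame(columns=['name', 'age'])",
    "answer": "(head['age'] > 56).sum()",
}

# The context statement only declares the schema, so made-up rows with the
# same columns are added here to have something to query.
head = pd.DataFrame({"name": ["Ann", "Bo", "Cy"], "age": [60, 45, 58]})

# The pandas query from the "answer" field, written out directly.
older_than_56 = int((head["age"] > 56).sum())
print(older_than_56)
```

    Filling the declared schema with a few rows like this is one simple way to sanity-check that a generated query runs and returns a plausible result.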

  2. Learn Pandas

    • kaggle.com
    zip
    Updated Oct 5, 2023
    Cite
    Vaidik Patel (2023). Learn Pandas [Dataset]. https://www.kaggle.com/datasets/js1js2js3js4js5/learn-pandas
    Explore at:
    zip (1209861 bytes)
    Dataset updated
    Oct 5, 2023
    Authors
    Vaidik Patel
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset takes a notebook-style approach to learning. Download the whole package and you will find everything you need to learn pandas from the basics to advanced topics, which is exactly what you will need in machine learning and data science. 😄

    It gives you an overview of the data analysis tools in pandas that are most often required for manipulating data and extracting important information.

    Use this notebook as your pandas notes. Whenever you forget the code or syntax, open it and scroll through it to find the solution. 🥳

  3. Pandas Practice Dataset

    • kaggle.com
    zip
    Updated Jan 27, 2023
    Cite
    Mrityunjay Pathak (2023). Pandas Practice Dataset [Dataset]. https://www.kaggle.com/datasets/themrityunjaypathak/pandas-practice-dataset/discussion
    Explore at:
    zip (493 bytes)
    Dataset updated
    Jan 27, 2023
    Authors
    Mrityunjay Pathak
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    What is Pandas?

    Pandas is a Python library used for working with data sets.

    It has functions for analyzing, cleaning, exploring, and manipulating data.

    The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008.

    Why Use Pandas?

    Pandas allows us to analyze big data and draw conclusions based on statistical theories.

    Pandas can clean messy data sets, and make them readable and relevant.

    Relevant data is very important in data science.

    What Can Pandas Do?

    Pandas gives you answers about the data. Like:

    Is there a correlation between two or more columns?

    What is the average value?

    Max value?

    Min value?
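
    The questions above map onto a handful of standard pandas calls. The sketch below uses a small made-up table (the column names and numbers are assumptions for illustration) to show one way each question could be answered.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame(
    {
        "hours_studied": [1, 2, 3, 4, 5],
        "exam_score": [52, 58, 65, 70, 80],
    }
)

# Is there a correlation between two columns? (Pearson by default)
correlation = df["hours_studied"].corr(df["exam_score"])

# What is the average value? Max value? Min value?
average_score = df["exam_score"].mean()
highest_score = df["exam_score"].max()
lowest_score = df["exam_score"].min()

print(round(correlation, 3), average_score, highest_score, lowest_score)
```

    For more than two columns, df.corr() returns the full pairwise correlation matrix instead of a single number.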

  4. PandasPlotBench

    • huggingface.co
    Updated Nov 25, 2024
    Cite
    JetBrains Research (2024). PandasPlotBench [Dataset]. https://huggingface.co/datasets/JetBrains-Research/PandasPlotBench
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    JetBrains (http://jetbrains.com/)
    Authors
    JetBrains Research
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    PandasPlotBench

    PandasPlotBench is a benchmark that assesses how well models write visualization code given a description of a Pandas DataFrame. 🛠️ Task. Given the plotting task and the description of a Pandas DataFrame, write the code to build a plot. The dataset is based on the Matplotlib gallery. The paper can be found on arXiv: https://arxiv.org/abs/2412.02764v1. To score your model on this dataset, you can use our GitHub repository. 📩 If you have… See the full description on the dataset page: https://huggingface.co/datasets/JetBrains-Research/PandasPlotBench.

  5. Pandas Detection Dataset

    • universe.roboflow.com
    zip
    Updated Oct 22, 2024
    Cite
    Intelligence (2024). Pandas Detection Dataset [Dataset]. https://universe.roboflow.com/intelligence-7y6fd/pandas-detection-o6kyg/dataset/1
    Explore at:
    zip
    Dataset updated
    Oct 22, 2024
    Dataset authored and provided by
    Intelligence
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Panda Bounding Boxes
    Description

    Pandas Detection

    ## Overview
    
    Pandas Detection is a dataset for object detection tasks - it contains Panda annotations for 400 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  6. Red Pandas Dataset

    • universe.roboflow.com
    zip
    Updated Jan 22, 2024
    + more versions
    Cite
    training (2024). Red Pandas Dataset [Dataset]. https://universe.roboflow.com/training-rduft/red-pandas
    Explore at:
    zip
    Dataset updated
    Jan 22, 2024
    Dataset authored and provided by
    training
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    Red Pandas Bounding Boxes
    Description

    Red Pandas

    ## Overview
    
    Red Pandas is a dataset for object detection tasks - it contains Red Pandas annotations for 1,756 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [MIT license](https://opensource.org/licenses/MIT).
    
  7. Red Pandas 100 Dataset

    • universe.roboflow.com
    zip
    Updated Sep 3, 2024
    Cite
    YOLOdata (2024). Red Pandas 100 Dataset [Dataset]. https://universe.roboflow.com/yolodata-cftcs/red-pandas-100
    Explore at:
    zip
    Dataset updated
    Sep 3, 2024
    Dataset authored and provided by
    YOLOdata
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Red Pandas 5VrQ Bounding Boxes
    Description

    Red Pandas 100

    ## Overview
    
    Red Pandas 100 is a dataset for object detection tasks - it contains Red Pandas 5VrQ annotations for 328 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  8. Pandas Bears Dataset

    • universe.roboflow.com
    zip
    Updated Oct 1, 2025
    Cite
    Dua Belas Naga (2025). Pandas Bears Dataset [Dataset]. https://universe.roboflow.com/dua-belas-naga-swxp4/pandas-bears-9phhn/dataset/2
    Explore at:
    zip
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dua Belas Naga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Panda Bounding Boxes
    Description

    Pandas Bears

    ## Overview
    
    Pandas Bears is a dataset for object detection tasks - it contains Panda annotations for 598 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  9. pandas-issues

    • huggingface.co
    Updated Aug 24, 2025
    Cite
    Clyde Cossey (2025). pandas-issues [Dataset]. https://huggingface.co/datasets/cicboy/pandas-issues
    Dataset updated
    Aug 24, 2025
    Authors
    Clyde Cossey
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Pandas GitHub Issues

    This dataset contains 5,000 GitHub issues collected from the pandas-dev/pandas repository. It includes issue metadata, content, labels, user information, timestamps, and comments.
    The dataset is suitable for text classification, multi-label classification, and document retrieval tasks.

    Dataset Structure

    Columns:

    id — Internal ID of the issue (int64)
    number — GitHub issue number (int64)
    title — Title of the issue (string)
    state — Issue… See the full description on the dataset page: https://huggingface.co/datasets/cicboy/pandas-issues.
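
    As a rough illustration of the column layout listed above, the sketch below builds a tiny stand-in table with the four columns named in the excerpt and filters it by state. The rows are invented, and the state values "open" and "closed" are an assumption based on how GitHub reports issue state; the real dataset has further columns that are not shown in this excerpt.

```python
import pandas as pd

# Invented rows that follow the column names and dtypes listed above.
issues = pd.DataFrame(
    {
        "id": pd.Series([101, 102, 103], dtype="int64"),
        "number": pd.Series([1, 2, 3], dtype="int64"),
        "title": ["BUG: wrong dtype after merge", "DOC: fix typo", "ENH: add option"],
        # Assumed values; the excerpt truncates the description of this column.
        "state": ["open", "closed", "open"],
    }
)

# Select the issues that are still open and list their titles.
open_issues = issues[issues["state"] == "open"]
print(issues.dtypes)
print(open_issues["title"].tolist())
```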

  10. pandas-questions

    • huggingface.co
    Updated Aug 25, 2021
    Cite
    Paco Valdez (2021). pandas-questions [Dataset]. https://huggingface.co/datasets/pacovaldez/pandas-questions
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 25, 2021
    Authors
    Paco Valdez
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    pacovaldez/pandas-questions dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. Panda images dataset

    • researchdata.ntu.edu.sg
    7z, bin, zip
    Updated Sep 21, 2020
    Cite
    Peng Chen; Pranjal Swarup; Wojciech Michal Matkowski; Adams Wai Kin Kong; Su Han; Zhihe Zhang; Hou Rong (2020). Panda images dataset [Dataset]. http://doi.org/10.21979/N9/8CYVGF
    Explore at:
    bin (1048576000), 7z (2460230), bin (126231456), zip (465428296), zip (12876752534)
    Dataset updated
    Sep 21, 2020
    Dataset provided by
    DR-NTU (Data)
    Authors
    Peng Chen; Pranjal Swarup; Wojciech Michal Matkowski; Adams Wai Kin Kong; Su Han; Zhihe Zhang; Hou Rong
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Dataset funded by
    Sichuan Science and Technology Program
    Panda International Foundation of the National Forestry Administration, China
    Chengdu Giant Panda Breeding Research Foundation
    Chengdu Research Base of Giant Panda Breeding
    National Natural Science Foundation of China
    Description

    The data used in the study titled "A Study on Giant Panda Recognition Based on Images of a Large Proportion of Captive Pandas".

  12. Giant panda distribution ranges in the Liangshan Mountains

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated May 28, 2023
    + more versions
    Cite
    Jianghong Ran; Yuhang Li; Gai Luo; Megan Price; Yuxin Liu (2023). Giant panda distribution ranges in the Liangshan Mountains [Dataset]. http://doi.org/10.5061/dryad.ns1rn8pzm
    Explore at:
    zip
    Dataset updated
    May 28, 2023
    Dataset provided by
    Dryad
    Authors
    Jianghong Ran; Yuhang Li; Gai Luo; Megan Price; Yuxin Liu
    Time period covered
    May 23, 2023
    Description

    Comprehending the population trend and understanding the distribution range dynamics of species is necessary for global species protection. Recognizing what causes dynamic distribution change is crucial for identifying species’ environmental preferences and formulating protection policies. Here, we studied the rear-edge population of the flagship species, giant pandas (Ailuropoda melanoleuca), to 1) assess their population trend using their distribution patterns, 2) evaluate their distribution dynamics change from the 2nd (1988) to the 3rd (2001) surveys (2–3 Interval) and 3rd to the 4th (2013) survey (3–4 Interval) using a machine learning algorithm (The Extremely Gradient Boosting), and 3) decode model results to identify driver factors in the first known use of SHapley Additive exPlanations. Our results showed that the population trends in Liangshan Mountains were worst in the 2nd survey (k = 1.050), improved by the 3rd survey (k = 0.97), but got worse by the 4th survey (k = 0.996), ...

  13. Multimodal Vision-Audio-Language Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 11, 2024
    Cite
    Schaumlöffel, Timothy; Roig, Gemma; Choksi, Bhavin (2024). Multimodal Vision-Audio-Language Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10060784
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Goethe University Frankfurt
    Authors
    Schaumlöffel, Timothy; Roig, Gemma; Choksi, Bhavin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Multimodal Vision-Audio-Language Dataset is a large-scale dataset for multimodal learning. It contains 2M video clips with corresponding audio and a textual description of the visual and auditory content. The dataset is an ensemble of existing datasets and fills the gap of missing modalities. Details can be found in the attached report.

    Annotation

    The annotation files are provided as Parquet files. They can be read using Python and the pandas and pyarrow library. The split into train, validation and test set follows the split of the original datasets.

    Installation

        pip install pandas pyarrow

    Example

        import pandas as pd
        df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')
        print(df.iloc[0])

        dataset              AudioSet
        filename             train/---2_BBVHAA.mp3
        captions_visual      [a man in a black hat and glasses.]
        captions_auditory    [a man speaks and dishes clank.]
        tags                 [Speech]

    Description

    The annotation file consists of the following fields:

    filename: Name of the corresponding file (video or audio file)
    dataset: Source dataset associated with the data point
    captions_visual: A list of captions related to the visual content of the video. Can be NaN in case of no visual content
    captions_auditory: A list of captions related to the auditory content of the video
    tags: A list of tags, classifying the sound of a file. It can be NaN if no tags are provided

    Data files

    The raw data files for most datasets are not released due to licensing issues. They must be downloaded from the source. However, due to missing files, we provide them on request. Please contact us at schaumloeffel@em.uni-frankfurt.de
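
    Because captions_visual and tags can be NaN, code that consumes the annotations usually has to handle missing values before treating those fields as lists. The sketch below shows one possible approach on an in-memory stand-in for the annotation table; the rows and file names are invented, and in practice the frame would come from pd.read_parquet as in the example above.

```python
import pandas as pd

# In-memory stand-in for an annotation file; rows and file names are invented.
annotations = pd.DataFrame(
    {
        "filename": ["clip_a.mp3", "clip_b.mp4"],
        "dataset": ["AudioSet", "AudioSet"],
        "captions_visual": [float("nan"), ["a man in a black hat and glasses."]],
        "captions_auditory": [["a dog barks."], ["a man speaks and dishes clank."]],
        "tags": [float("nan"), ["Speech"]],
    }
)

# Keep only rows that actually have visual captions.
has_visual = annotations[annotations["captions_visual"].notna()]

# Replace missing tags with an empty list so later code can always iterate.
tag_lists = annotations["tags"].apply(lambda t: t if isinstance(t, list) else [])

print(has_visual["filename"].tolist())
print(tag_lists.tolist())
```

    Using an isinstance check here avoids calling pd.isna on a list, which would return an array rather than a single boolean.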

  14. An Introduction to Pandas, GeoPandas and More with Python

    • planning-commission-washcodps.hub.arcgis.com
    Updated Mar 22, 2024
    Cite
    Washington County, PA GIS (2024). An Introduction to Pandas, GeoPandas and More with Python [Dataset]. https://planning-commission-washcodps.hub.arcgis.com/datasets/washingtoncopa::an-introduction-to-pandas-geopandas-and-more-with-python
    Dataset updated
    Mar 22, 2024
    Dataset authored and provided by
    Washington County, PA GIS
    Description

    Geospatial potential is available in tabular formats provided by clients and stakeholders for GIS-related projects. These tabular formats commonly include comma separated values and spreadsheets. While not immediately geospatial in nature, the tabular data can be upgraded to geospatial data with libraries such as Pandas and GeoPandas. Subsequently, this geospatial data can be converted back to a tabular format for non-GIS users. This lecture will conquer the learning curve of beginning Python with Pandas and GeoPandas for basic data conversions.

  15. pandas_workshop_data

    • kaggle.com
    zip
    Updated Sep 2, 2020
    Cite
    mnijhuis (2020). pandas_workshop_data [Dataset]. https://www.kaggle.com/datasets/mnijhuis/pandas-workshop-data
    Explore at:
    zip (145821500 bytes)
    Dataset updated
    Sep 2, 2020
    Authors
    mnijhuis
    Description

    Dataset

    This dataset was created by mnijhuis

    Released under Other (specified in description)

    Contents

  16. Keyphrase Metrics for Pandas

    • newsletterscan.com
    Updated Aug 20, 2025
    + more versions
    Cite
    (2025). Keyphrase Metrics for Pandas [Dataset]. http://newsletterscan.com/topic/pandas
    Dataset updated
    Aug 20, 2025
    Variables measured
    Mentions, Growth Rate, Growth Category
    Description

    A dataset of mentions, growth rate, and total volume of the keyphrase 'Pandas' over time.

  17. Panda Dataset

    • universe.roboflow.com
    zip
    Updated Oct 31, 2024
    + more versions
    Cite
    yolo (2024). Panda Dataset [Dataset]. https://universe.roboflow.com/yolo-ujgqj/panda-j219t/dataset/3
    Explore at:
    zip
    Dataset updated
    Oct 31, 2024
    Dataset authored and provided by
    yolo
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    Panda Bounding Boxes
    Description

    Panda

    ## Overview
    
    Panda is a dataset for object detection tasks - it contains Panda annotations for 299 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [MIT license](https://opensource.org/licenses/MIT).
    
  18. Data from: Predicting range shifts of the giant pandas under future climate...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Oct 25, 2022
    Cite
    Zhenjun Liu; Xuzhe Zhao; Wei Wei; Mingsheng Hong; Hong Zhou; Junfeng Tang; Zejun Zhang (2022). Predicting range shifts of the giant pandas under future climate and land use scenarios [Dataset]. http://doi.org/10.5061/dryad.xd2547dk7
    Explore at:
    zip
    Dataset updated
    Oct 25, 2022
    Dataset provided by
    China West Normal University
    Authors
    Zhenjun Liu; Xuzhe Zhao; Wei Wei; Mingsheng Hong; Hong Zhou; Junfeng Tang; Zejun Zhang
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Aim: Understanding and predicting how species will respond to global environmental change (i.e., climate and land use change) is essential to efficiently inform conservation and management strategies for authorities and managers. Here, we assessed the combined effect of future climate and land use change on the potential range shifts of the giant pandas (Ailuropoda melanoleuca). Location: Sichuan Province, China. Methods: We used ensemble species distribution models (SDMs) to forecast range shifts of the giant pandas by the 2050s and 2070s under four combined climate and land use change scenarios. We also compared the differences in distributional changes of giant pandas among the five mountains in the study area. Results: Our ensemble SDMs exhibited good model performance in terms of both AUC (0.931) and TSS (0.747), and suggested that precipitation seasonality, annual mean temperature, the proportion of forest cover and total annual precipitation are the most important factors in shaping the current distribution patterns for the giant pandas. Our projections of future species distribution also suggested a range expansion under an optimistic greenhouse gas emission, while suggesting a range contraction under a pessimistic greenhouse gas emission. Moreover, we found that there is considerable variation in the projected range change patterns among the five mountains in the study area. Especially, the suitable habitat of the giant panda is predicted to increase under all scenarios in Minshan mountains, while is predicted to decrease under all scenarios in Daxiangling and Liangshan mountains, indicating the vulnerability of the giant pandas at low latitudes. 
    Main conclusions: Our findings highlight the importance of an integrated approach that combines climate and land use change to predict the future species distribution and the need for a spatial explicit consideration of the projected range change patterns of target species for guiding conservation and management strategies.

  19. Grant Giving Statistics for Pandas Resource Network Inc

    • instrumentl.com
    Updated Feb 19, 2023
    Cite
    (2023). Grant Giving Statistics for Pandas Resource Network Inc [Dataset]. https://www.instrumentl.com/990-report/pandas-resource-network-inc
    Dataset updated
    Feb 19, 2023
    Variables measured
    Total Assets
    Description

    Financial overview and grant giving statistics of Pandas Resource Network Inc

  20. Data from: Ecological and anthropogenic drivers of local extinction and...

    • datadryad.org
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Nov 12, 2024
    Cite
    Junfeng Tang; Ronald R. Swaisgood; Megan A. Owen; Xuzhe Zhao; Wei Wei; Mingsheng Hong; Hong Zhou; Jindong Zhang; Zenjun Zhang (2024). Ecological and anthropogenic drivers of local extinction and colonization of giant pandas over the past 30 years [Dataset]. http://doi.org/10.5061/dryad.2280gb60d
    Explore at:
    zip
    Dataset updated
    Nov 12, 2024
    Dataset provided by
    Dryad
    Authors
    Junfeng Tang; Ronald R. Swaisgood; Megan A. Owen; Xuzhe Zhao; Wei Wei; Mingsheng Hong; Hong Zhou; Jindong Zhang; Zenjun Zhang
    Time period covered
    Dec 13, 2023
    Description

    Data from: Ecological and anthropogenic drivers of local extinction and colonization of giant pandas over the past 30 years

    https://doi.org/10.5061/dryad.2280gb60d

    Description of the data and file structure

    Data from: Ecological and anthropogenic drivers of local extinction and colonization of giant pandas over the past 30 years

    Datasets used to identify ecological and anthropogenic drivers of local extinction and colonization of giant pandas over the past 30 years

    Files and variables:

    File:

    R script—Script to run spatial generalized additive models in the programming language R

    TP12_5km_ext.csv — local extinction (loss [1] and persistence [0]), local rarity, local abundance, protected area status, 19 future bioclimatic variables and 10 land use variables during TP1-TP2 at 5 km X 5 km grid cell

    TP12_5km_col.csv — local co...
