78 datasets found
  1. PandasPlotBench

    • huggingface.co
    Updated Nov 25, 2024
    Cite
    JetBrains Research (2024). PandasPlotBench [Dataset]. https://huggingface.co/datasets/JetBrains-Research/PandasPlotBench
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 25, 2024
    Dataset provided by
    JetBrains (http://jetbrains.com/)
    Authors
    JetBrains Research
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    PandasPlotBench

    PandasPlotBench is a benchmark for assessing how well models write visualization code given a description of a Pandas DataFrame. 🛠️ Task: given the plotting task and the description of a Pandas DataFrame, write the code to build a plot. The dataset is based on the MatPlotLib gallery. The paper is available on arXiv: https://arxiv.org/abs/2412.02764v1. To score your model on this dataset, you can use our GitHub repository. 📩 If you have… See the full description on the dataset page: https://huggingface.co/datasets/JetBrains-Research/PandasPlotBench.
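
    As a quick-start sketch (not taken from the dataset card), the benchmark can be loaded with the Hugging Face datasets library; the split layout is left unspecified here because it is not stated above:

    # Minimal sketch: load the benchmark and inspect one item.
    # Assumes the `datasets` library is installed; split names are not assumed.
    from datasets import load_dataset

    bench = load_dataset("JetBrains-Research/PandasPlotBench")
    print(bench)                            # shows the available splits and their columns
    first_split = next(iter(bench.values()))
    print(first_split[0])                   # one item: plotting task plus DataFrame description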

  2. Pandas Test Data

    • kaggle.com
    Updated Sep 27, 2020
    + more versions
    Cite
    Gyan Kumar (2020). Pandas Test Data [Dataset]. https://www.kaggle.com/kgmgyan57/pandas-test-data/metadata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 27, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gyan Kumar
    Description

    Dataset

    This dataset was created by Gyan Kumar

    Contents

  3. ‘Pandas practices’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Pandas practices’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-pandas-practices-890b/latest
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Pandas practices’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/melihkanbay/police on 28 January 2022.

    --- Dataset description provided by original source is as follows ---


    Context

    Vehicles stopped and searched by the police.

    Content

    Age, reason, ...

    Acknowledgements

    Thanks to Stanford.

    Inspiration

    Use this dataset for practice.

    --- Original source retains full ownership of the source dataset ---

  4. panda

    • huggingface.co
    Updated Jan 17, 2017
    Cite
    AI at Meta (2017). panda [Dataset]. https://huggingface.co/datasets/facebook/panda
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 17, 2017
    Dataset authored and provided by
    AI at Meta
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for PANDA

      Dataset Summary
    

    PANDA (Perturbation Augmentation NLP DAtaset) consists of approximately 100K pairs of crowdsourced human-perturbed text snippets (original, perturbed). Annotators were given selected terms and target demographic attributes, and instructed to rewrite text snippets along three demographic axes: gender, race and age, while preserving semantic meaning. Text snippets were sourced from a range of text corpora (BookCorpus, Wikipedia, ANLI… See the full description on the dataset page: https://huggingface.co/datasets/facebook/panda.

  5. ‘Datasets for Pandas’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Datasets for Pandas’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-datasets-for-pandas-e46e/3d497e33/?iid=002-090&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Datasets for Pandas’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/rajacsp/datasets-for-pandas on 28 January 2022.

    --- No further description of dataset provided by original source ---

    --- Original source retains full ownership of the source dataset ---

  6. Python Time Normalized Superposed Epoch Analysis (SEAnorm) Example Data Set

    • data.niaid.nih.gov
    Updated Jul 15, 2022
    + more versions
    Cite
    Walton, Sam D. (2022). Python Time Normalized Superposed Epoch Analysis (SEAnorm) Example Data Set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6835136
    Explore at:
    Dataset updated
    Jul 15, 2022
    Dataset provided by
    Walton, Sam D.
    Murphy, Kyle R.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Solar Wind Omni and SAMPEX (Solar Anomalous and Magnetospheric Particle Explorer) data sets used in the examples for SEAnorm, a time-normalized superposed epoch analysis package in Python.

    Both data sets are stored as either an HDF5 file or a compressed CSV file (csv.bz2), each containing a Pandas DataFrame of either the Solar Wind Omni or the SAMPEX data. The data sets were written with pandas.DataFrame.to_hdf() and pandas.DataFrame.to_csv() using a compression level of 9. The DataFrames can be read back with pandas.read_hdf() or pandas.read_csv(), depending on the file format.

    The Solar Wind Omni data set contains solar wind velocity (V) and dynamic pressure (P), the southward interplanetary magnetic field in Geocentric Solar Ecliptic System (GSE) coordinates (B_Z_GSE), the auroral electrojet index (AE), and the Sym-H index, all at 1-minute cadence.

    The SAMPEX data set contains electron flux from the Proton/Electron Telescope (PET) at two energy channels, 1.5-6.0 MeV (ELO) and 2.5-14 MeV (EHI), at an approximately 6-second cadence.
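
    For readers new to these pandas readers, a minimal sketch is shown below; the file names are hypothetical placeholders, not the actual archive contents:

    # Minimal sketch of reading the two formats described above.
    # File names are placeholders; substitute the actual files from the archive.
    import pandas as pd

    omni = pd.read_hdf("omni_1min.h5")           # HDF5 written with DataFrame.to_hdf(); pass key=... if the store holds several objects
    sampex = pd.read_csv("sampex_pet.csv.bz2",   # compressed CSV; bz2 is inferred from the extension
                         index_col=0, parse_dates=True)
    print(omni.head())
    print(sampex.head())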

  7. Sales Analysis

    • kaggle.com
    zip
    Updated Jun 30, 2020
    + more versions
    Cite
    Vinay Shaw (2020). Sales Analysis [Dataset]. https://www.kaggle.com/vinayshaw/sales-analysis
    Explore at:
    Available download formats: zip (2492073 bytes)
    Dataset updated
    Jun 30, 2020
    Authors
    Vinay Shaw
    Description

    Dataset

    This dataset was created by Vinay Shaw

    Contents

    It contains the following files:

  8. Reproduction of PANDA: analysis for simulations and applications

    • zenodo.org
    zip
    Updated Aug 19, 2024
    Cite
    Meng-Guo Wang; Meng-Guo Wang (2024). Reproduction of PANDA: analysis for simulations and applications [Dataset]. http://doi.org/10.5281/zenodo.13324624
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Meng-Guo Wang; Meng-Guo Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These data are derived from analyses based on PANDA results and are consistent with those presented in the paper "Dual decoding of cell types and gene expression in spatial transcriptomics with PANDA".

    To ensure that the file paths match those used in the code, please place the files in the following directories within your working directory before extracting them (a short extraction sketch follows this list):

    "Analysis/simulations/paired_scenario.zip"

    "Analysis/simulations/unpaired_scenario.zip"

    "Analysis/simulations/merfish.zip"

    "Analysis/simulations/reference_choice.zip"

    "Analysis/simulations/parameter_sensitivity.zip"

    "Analysis/simulations/time_memory.zip"

    "Analysis/applications/melanoma.zip"

    "Analysis/applications/mouse_brain.zip"

    "Analysis/applications/human_heart.zip"

  9. Learn Data Science Series Part 1

    • kaggle.com
    Updated Dec 30, 2022
    Cite
    Rupesh Kumar (2022). Learn Data Science Series Part 1 [Dataset]. https://www.kaggle.com/datasets/hunter0007/learn-data-science-part-1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 30, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rupesh Kumar
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Please feel free to share it with others and consider supporting me if you find it helpful ⭐️.

    Overview:

    • Chapter 1: Getting started with pandas
    • Chapter 2: Analysis: Bringing it all together and making decisions
    • Chapter 3: Appending to DataFrame
    • Chapter 4: Boolean indexing of dataframes
    • Chapter 5: Categorical data
    • Chapter 6: Computational Tools
    • Chapter 7: Creating DataFrames
    • Chapter 8: Cross sections of different axes with MultiIndex
    • Chapter 9: Data Types
    • Chapter 10: Dealing with categorical variables
    • Chapter 11: Duplicated data
    • Chapter 12: Getting information about DataFrames
    • Chapter 13: Gotchas of pandas
    • Chapter 14: Graphs and Visualizations
    • Chapter 15: Grouping Data
    • Chapter 16: Grouping Time Series Data
    • Chapter 17: Holiday Calendars
    • Chapter 18: Indexing and selecting data
    • Chapter 19: IO for Google BigQuery
    • Chapter 20: JSON
    • Chapter 21: Making Pandas Play Nice With Native Python Datatypes
    • Chapter 22: Map Values
    • Chapter 23: Merge, join, and concatenate
    • Chapter 24: Meta: Documentation Guidelines
    • Chapter 25: Missing Data
    • Chapter 26: MultiIndex
    • Chapter 27: Pandas Datareader
    • Chapter 28: Pandas IO tools (reading and saving data sets)
    • Chapter 29: pd.DataFrame.apply
    • Chapter 30: Read MySQL to DataFrame
    • Chapter 31: Read SQL Server to Dataframe
    • Chapter 32: Reading files into pandas DataFrame
    • Chapter 33: Resampling
    • Chapter 34: Reshaping and pivoting
    • Chapter 35: Save pandas dataframe to a csv file
    • Chapter 36: Series
    • Chapter 37: Shifting and Lagging Data
    • Chapter 38: Simple manipulation of DataFrames
    • Chapter 39: String manipulation
    • Chapter 40: Using .ix, .iloc, .loc, .at and .iat to access a DataFrame
    • Chapter 41: Working with Time Series
  10. Dataset for class comment analysis

    • data.niaid.nih.gov
    Updated Feb 22, 2022
    Cite
    Pooja Rani (2022). Dataset for class comment analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4311838
    Explore at:
    Dataset updated
    Feb 22, 2022
    Dataset authored and provided by
    Pooja Rani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A list of different projects selected to analyze class comments (available in the source code) of various languages such as Java, Python, and Pharo. The projects vary in terms of size, contributors, and domain.

    Structure

    Projects/
      Java_projects/
        eclipse.zip
        guava.zip
        guice.zip
        hadoop.zip
        spark.zip
        vaadin.zip
    
      Pharo_projects/
        images/
          GToolkit.zip
          Moose.zip
          PetitParser.zip
          Pillar.zip
          PolyMath.zip
          Roassal2.zip
          Seaside.zip
    
        vm/
          70-x64/Pharo
    
        Scripts/
          ClassCommentExtraction.st
          SampleSelectionScript.st    
    
      Python_projects/
        django.zip
        ipython.zip
        Mailpile.zip
        pandas.zip
        pipenv.zip
        pytorch.zip   
        requests.zip 
      
    

    Contents of the Replication Package

    Projects/ contains the raw projects of each language that are used to analyze class comments.

    • Java_projects/

      • eclipse.zip - Eclipse project downloaded from GitHub. More detail about the project is available on GitHub Eclipse.
      • guava.zip - Guava project downloaded from GitHub. More detail about the project is available on GitHub Guava.
      • guice.zip - Guice project downloaded from GitHub. More detail about the project is available on GitHub Guice.
      • hadoop.zip - Apache Hadoop project downloaded from GitHub. More detail about the project is available on GitHub Apache Hadoop.
      • spark.zip - Apache Spark project downloaded from GitHub. More detail about the project is available on GitHub Apache Spark.
      • vaadin.zip - Vaadin project downloaded from GitHub. More detail about the project is available on GitHub Vaadin.

    • Pharo_projects/

      • images/ -

        • GToolkit.zip - Gtoolkit project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Moose.zip - Moose project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • PetitParser.zip - Petit Parser project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Pillar.zip - Pillar project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • PolyMath.zip - PolyMath project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Roassal2.zip - Roassal2 project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Seaside.zip - Seaside project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
      • vm/ -

      • 70-x64/Pharo - Pharo7 (version 7 of Pharo) virtual machine to instantiate the Pharo images given in the images/ folder. The user can run the vm on macOS and select any of the Pharo images.

      • Scripts/ - It contains the sample Smalltalk scripts to extract class comments from various projects.

      • ClassCommentExtraction.st - A Smalltalk script to show how class comments are extracted from various Pharo projects. This script is already provided in the respective project image.

      • SampleSelectionScript.st - A Smalltalk script showing how sample class comments of Pharo projects are selected. This script can be run in any of the Pharo images given in the images/ folder.

    • Python_projects/

      • django.zip - Django project downloaded from GitHub. More detail about the project is available on GitHub Django.
      • ipython.zip - IPython project downloaded from GitHub. More detail about the project is available on GitHub IPython.
      • Mailpile.zip - Mailpile project downloaded from GitHub. More detail about the project is available on GitHub Mailpile.
      • pandas.zip - pandas project downloaded from GitHub. More detail about the project is available on GitHub pandas.
      • pipenv.zip - Pipenv project downloaded from GitHub. More detail about the project is available on GitHub Pipenv.
      • pytorch.zip - PyTorch project downloaded from GitHub. More detail about the project is available on GitHub PyTorch.
      • requests.zip - Requests project downloaded from GitHub. More detail about the project is available on GitHub Requests.
  11. Summary of miRNAs sequencing.

    • figshare.com
    xls
    Updated Jun 4, 2023
    + more versions
    Cite
    Mingyu Yang; Lianming Du; Wujiao Li; Fujun Shen; Zhenxin Fan; Zuoyi Jian; Rong Hou; Yongmei Shen; Bisong Yue; Xiuyue Zhang (2023). Summary of miRNAs sequencing. [Dataset]. http://doi.org/10.1371/journal.pone.0143242.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mingyu Yang; Lianming Du; Wujiao Li; Fujun Shen; Zhenxin Fan; Zuoyi Jian; Rong Hou; Yongmei Shen; Bisong Yue; Xiuyue Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary of miRNAs sequencing.

  12. Communication Panda Polarization-Maintaining Fiber Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Feb 14, 2025
    Cite
    Data Insights Market (2025). Communication Panda Polarization-Maintaining Fiber Report [Dataset]. https://www.datainsightsmarket.com/reports/communication-panda-polarization-maintaining-fiber-459634
    Explore at:
    Available download formats: doc, ppt, pdf
    Dataset updated
    Feb 14, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The market for Communication Panda Polarization-Maintaining Fiber is projected to reach a value of USD XX million by 2033, growing at a CAGR of XX% during the forecast period. The demand for high-speed data transmission and the increasing adoption of advanced communication technologies are driving the growth of the market. Panda Polarization-Maintaining Fiber is critical in optical communication systems, enabling the transmission of data over long distances without signal degradation. Key industry players include Corning, Fujikura, Yangtze Optical Fibre and Cable Joint Stock, Humanetics (Fibercore), Coherent, Furukawa Electric (OFS), Wuhan Yangtze Optical Electronic, Fiberhome Telecommunication Technologies, and iXblue, among others. The market is concentrated in North America and Europe, while Asia-Pacific is expected to witness significant growth in the coming years due to the increasing demand for telecommunication infrastructure. The research study also provides insights into the challenges and opportunities faced by the market participants, along with detailed competitive analysis.

  13. Changes in the Milk Metabolome of the Giant Panda (Ailuropoda melanoleuca)...

    • figshare.com
    • data.niaid.nih.gov
    pdf
    Updated May 31, 2023
    Cite
    Tong Zhang; Rong Zhang; Liang Zhang; Zhihe Zhang; Rong Hou; Hairui Wang; I. Kati Loeffler; David G. Watson; Malcolm W. Kennedy (2023). Changes in the Milk Metabolome of the Giant Panda (Ailuropoda melanoleuca) with Time after Birth – Three Phases in Early Lactation and Progressive Individual Differences [Dataset]. http://doi.org/10.1371/journal.pone.0143417
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Tong Zhang; Rong Zhang; Liang Zhang; Zhihe Zhang; Rong Hou; Hairui Wang; I. Kati Loeffler; David G. Watson; Malcolm W. Kennedy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Ursids (bears) in general, and giant pandas in particular, are highly altricial at birth. The components of bear milks and their changes with time may be uniquely adapted to nourish relatively immature neonates, protect them from pathogens, and support the maturation of neonatal digestive physiology. Serial milk samples collected from three giant pandas in early lactation were subjected to untargeted metabolite profiling and multivariate analysis. Changes in milk metabolites with time after birth were analysed by Principal Component Analysis, Hierarchical Cluster Analysis and further supported by Orthogonal Partial Least Square-Discriminant Analysis, revealing three phases of milk maturation: days 1–6 (Phase 1), days 7–20 (Phase 2), and beyond day 20 (Phase 3). While the compositions of Phase 1 milks were essentially indistinguishable among individuals, divergences emerged during the second week of lactation. OPLS regression analysis positioned against the growth rate of one cub tentatively inferred a correlation with changes in the abundance of a trisaccharide, isoglobotriose, previously observed to be a major oligosaccharide in ursid milks. Three artificial milk formulae used to feed giant panda cubs were also analysed, and were found to differ markedly in component content from natural panda milk. These findings have implications for the dependence of the ontogeny of all species of bears, and potentially other members of the Carnivora and beyond, on the complexity and sequential changes in maternal provision of micrometabolites in the immediate period after birth.

  14. Summary the gender and age for all samples.

    • plos.figshare.com
    xls
    Updated Jun 7, 2023
    Cite
    Mingyu Yang; Lianming Du; Wujiao Li; Fujun Shen; Zhenxin Fan; Zuoyi Jian; Rong Hou; Yongmei Shen; Bisong Yue; Xiuyue Zhang (2023). Summary the gender and age for all samples. [Dataset]. http://doi.org/10.1371/journal.pone.0143242.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mingyu Yang; Lianming Du; Wujiao Li; Fujun Shen; Zhenxin Fan; Zuoyi Jian; Rong Hou; Yongmei Shen; Bisong Yue; Xiuyue Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary the gender and age for all samples.

  15. Large-scale structure of M31 halo. II. PAndAS - Dataset - B2FIND

    • b2find.eudat.eu
    Updated May 8, 2023
    Cite
    (2023). Large-scale structure of M31 halo. II. PAndAS - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/9ef40312-b311-501b-9d40-2f2ddf586069
    Explore at:
    Dataset updated
    May 8, 2023
    Description

    The Pan-Andromeda Archaeological Survey is a survey of >400deg^2^ centered on the Andromeda (M31) and Triangulum (M33) galaxies that has provided the most extensive panorama of an L* galaxy group to large projected galactocentric radii. Here, we collate and summarize the current status of our knowledge of the substructures in the stellar halo of M31, and discuss connections between these features. We estimate that the 13 most distinctive substructures were produced by at least 5 different accretion events, all in the last 3 or 4Gyr. We suggest that a few of the substructures farthest from M31 may be shells from a single accretion event. We calculate the luminosities of some prominent substructures for which previous estimates were not available, and we estimate the stellar mass budget of the outer halo of M31. We revisit the problem of quantifying the properties of a highly structured data set; specifically, we use the OPTICS clustering algorithm to quantify the hierarchical structure of M31's stellar halo and identify three new faint structures. M31's halo, in projection, appears to be dominated by two "mega-structures", which can be considered as the two most significant branches of a merger tree produced by breaking M31's stellar halo into increasingly smaller structures based on the stellar spatial clustering. We conclude that OPTICS is a powerful algorithm that could be used in any astronomical application involving the hierarchical clustering of points.
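
    As a generic illustration of the clustering technique named above (not the survey's actual pipeline or data), scikit-learn provides an OPTICS implementation:

    # Generic OPTICS sketch on synthetic 2-D points standing in for on-sky positions.
    import numpy as np
    from sklearn.cluster import OPTICS

    rng = np.random.default_rng(0)
    points = np.vstack([
        rng.normal(loc=(0.0, 0.0), scale=0.3, size=(200, 2)),   # one dense "substructure"
        rng.normal(loc=(3.0, 3.0), scale=0.3, size=(200, 2)),   # a second one
        rng.uniform(low=-2.0, high=5.0, size=(100, 2)),         # diffuse background
    ])

    clustering = OPTICS(min_samples=10).fit(points)
    print(np.unique(clustering.labels_))   # label -1 marks points left outside any cluster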

  16. Table_5_Metagenomic Analysis of Bacteria, Fungi, Bacteriophages, and...

    • frontiersin.figshare.com
    docx
    Updated Jun 2, 2023
    + more versions
    Cite
    Shengzhi Yang; Xin Gao; Jianghong Meng; Anyun Zhang; Yingmin Zhou; Mei Long; Bei Li; Wenwen Deng; Lei Jin; Siyue Zhao; Daifu Wu; Yongguo He; Caiwu Li; Shuliang Liu; Yan Huang; Hemin Zhang; Likou Zou (2023). Table_5_Metagenomic Analysis of Bacteria, Fungi, Bacteriophages, and Helminths in the Gut of Giant Pandas.DOCX [Dataset]. http://doi.org/10.3389/fmicb.2018.01717.s020
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Shengzhi Yang; Xin Gao; Jianghong Meng; Anyun Zhang; Yingmin Zhou; Mei Long; Bei Li; Wenwen Deng; Lei Jin; Siyue Zhao; Daifu Wu; Yongguo He; Caiwu Li; Shuliang Liu; Yan Huang; Hemin Zhang; Likou Zou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To obtain full details of the gut microbiota of giant pandas (GPs), including bacteria, fungi, bacteriophages, and helminths, we created a comprehensive microbial genome database and aligned metagenomic sequences against it. We delineated a detailed and distinct gut microbiota structure of GPs. A total of 680 species of bacteria, 198 fungi, 185 bacteriophages, and 45 helminths were found. Compared with 16S rRNA sequencing, the dominant bacterial phyla included not only Proteobacteria, Firmicutes, Bacteroidetes, and Actinobacteria but also Cyanobacteria and eight other phyla. Aside from Ascomycota, Basidiomycota, and Glomeromycota, the phyla Mucoromycota and Microsporidia were also dominant among the fungi. The bacteriophages were predominantly dsDNA Myoviridae, Siphoviridae, Podoviridae, ssDNA Inoviridae, and Microviridae. For helminths, the phylum Nematoda was dominant. In addition to previously described parasites, another 44 species of helminths were found in GPs. Differences in the abundance of microbiota were also found between captive, semi-wild, and wild GPs. A total of 1,739 genes encoding cellulase, β-glucosidase, and cellulose β-1,4-cellobiosidase were responsible for the metabolism of cellulose, and 128,707 putative glycoside hydrolase genes were found in bacteria/fungi. Taken together, the results indicate that not only bacteria but also fungi, bacteriophages, and helminths are diverse in the gut of giant pandas, which provides a basis for further identification of the role of the gut microbiota. Metagenomics also revealed that the bacteria/fungi in the gut of GPs harbor the ability to degrade cellulose and hemicellulose.

  17. Replication Data for Exploring an extinct society through the lens of...

    • dataone.org
    Updated Dec 16, 2023
    Cite
    Wieczorek, Oliver; Malzahn, Melanie (2023). Replication Data for Exploring an extinct society through the lens of Habitus-Field theory and the Tocharian text corpus [Dataset]. http://doi.org/10.7910/DVN/UF8DHK
    Explore at:
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Wieczorek, Oliver; Malzahn, Melanie
    Description

    The files and workflow allow you to replicate the study titled "Exploring an extinct society through the lens of Habitus-Field theory and the Tocharian text corpus". This study used the CEToM corpus (https://cetom.univie.ac.at/) (Tocharian) to analyze the life-world of the elites of an extinct society situated in modern eastern China. To acquire the raw data needed for steps 1 & 2, please contact Melanie Malzahn (melanie.malzahn@univie.ac.at). We conducted a mixed-methods study consisting of close reading, content analysis, and multiple correspondence analysis (MCA). The Excel file titled "fragments_architecture_combined.xlsx" allows for replication of the MCA and corresponds to the third step of the workflow outlined below.

    We used the following programming languages and packages to prepare the dataset and to analyze the data. Data preparation and merging were done in Python (version 3.9.10) with the packages pandas (1.5.3), os (3.12.0), re (3.12.0), numpy (1.24.3), gensim (4.3.1), BeautifulSoup4 (4.12.2), pyasn1 (0.4.8), and langdetect (1.0.9). Multiple correspondence analyses were conducted in R (version 4.3.2) with the packages FactoMineR (2.9), factoextra (1.0.7), readxl (1.4.3), tidyverse (2.0.0), ggplot2 (3.4.4), and psych (2.3.9).

    After requesting the necessary files, please open and execute the scripts in the order outlined below to replicate the analysis:

    Preparatory step: Create a folder for the Python and R scripts downloadable in this repository. Open the file 0_create folders.py and declare a root folder in line 19. This first script will generate the following folders:

    • "tarim-brahmi_database" = contains Tocharian dictionaries and Tocharian text fragments.
    • "dictionaries" = contains Tocharian A and Tocharian B vocabularies, including linguistic features such as translations, meanings, part-of-speech tags, etc. A full overview of the words is provided at https://cetom.univie.ac.at/?words.
    • "fragments" = contains Tocharian text fragments as XML files.
    • "word_corpus_data" = will contain Excel files of the corpus data after the first step.
    • "Architectural_terms" = contains the data on the architectural terms used in the dataset (e.g. dwelling, house).
    • "regional_data" = contains the data on the findspots (Tocharian and modern Chinese equivalents, e.g. Duldur-Akhur & Kucha).
    • "mca_ready_data" = the folder in which the Excel file with the merged data will be saved. Note that the prepared file named "fragments_architecture_combined.xlsx" can be saved into this directory. This allows you to skip steps 1 & 2 and reproduce the MCA of the content analysis based on the third step of our workflow (R script 3_conduct_MCA.R).

    First step - run 1_read_xml-files.py: loops over the XML files in the dictionaries folder and identifies word metadata, including language (Tocharian A or B), keywords, part of speech, lemmata, word etymology, and loan sources. It then loops over the XML text files and extracts a text ID number, language (Tocharian A or B), text title, text genre, text subgenre, prose type, verse type, material on which the text is written, medium, findspot, the source text in Tocharian, and the translation where available. After successful feature extraction, the resulting pandas DataFrame object is exported to the word_corpus_data folder.

    Second step - run 2_merge_excel_files.py: merges all Excel files (corpus, data on findspots, word data) and reproduces the content analysis, which was originally based on close reading.

    Third step - run 3_conduct_MCA.R: recodes, prepares, and selects the variables necessary to conduct the MCA; then produces the descriptive values before conducting the MCA, identifying typical texts per dimension, and exporting the PNG files uploaded to this repository.
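
    As a rough, hypothetical sketch of the kind of extraction performed in the first step (the XML tag names below are placeholders, not the actual CEToM markup):

    # Hypothetical sketch: parse XML fragments into a pandas DataFrame and export to Excel.
    # Requires beautifulsoup4, lxml, pandas, and openpyxl; tag names are invented for illustration.
    from pathlib import Path
    import pandas as pd
    from bs4 import BeautifulSoup

    rows = []
    for xml_path in Path("tarim-brahmi_database/fragments").glob("*.xml"):
        soup = BeautifulSoup(xml_path.read_text(encoding="utf-8"), "xml")
        title = soup.find("title")          # placeholder tag name
        language = soup.find("language")    # placeholder tag name
        rows.append({
            "file": xml_path.name,
            "title": title.get_text(strip=True) if title else None,
            "language": language.get_text(strip=True) if language else None,
        })

    pd.DataFrame(rows).to_excel("word_corpus_data/fragments.xlsx", index=False)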

  18. Daily Statistics for Discharge at USGS 09380000 Colorado River at Lee’s...

    • search.dataone.org
    • beta.hydroshare.org
    • +1more
    Updated Dec 5, 2021
    Cite
    Amber Jones (2021). Daily Statistics for Discharge at USGS 09380000 Colorado River at Lee’s Ferry, AZ: Jupyter Notebook [Dataset]. https://search.dataone.org/view/sha256%3A3a1d249267335d50674650905e654a34b5d3eed742b55b6ced08d0e53b5849e6
    Explore at:
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Hydroshare
    Authors
    Amber Jones
    Time period covered
    Aug 1, 2018 - Nov 26, 2018
    Area covered
    Colorado River
    Description

    This resource contains a Jupyter Notebook that uses Python to access and visualize data for the USGS flow gage on the Colorado River at Lee’s Ferry, AZ (09380000). This site monitors water quantity and quality for water released from Glen Canyon Dam that then flows through the Grand Canyon. To call these services in Python, the suds-py3 package was used. Using this package, a “GetValuesObject” request, as defined by WaterOneFlow, was passed to the server using inputs for the web service url, site code, variable code, and dates of interest. For this case, 15-minute discharge from August 1, 2018 to the current date was used. The web service returned an object from which the dates and the data values were obtained, as well as the site name. The Python libraries Pandas and Matplotlib were used to manipulate and view the results. The time series data were converted to lists and then to a Pandas series object. Using the “resample” function of Pandas, values for mean, minimum, and maximum were determined on a daily basis from the 15-minute data. Using Matplotlib, a figure object was created to which Pandas series objects were added using the Pandas plot method. The daily mean, minimum, maximum, and the 15-minute flow values were added to illustrate the differences in the daily ranges of data.
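
    The resampling step described above looks roughly like the following sketch, which uses a synthetic 15-minute series in place of the WaterOneFlow response:

    # Sketch of computing daily statistics from a 15-minute series and plotting them.
    import pandas as pd
    import matplotlib.pyplot as plt

    idx = pd.date_range("2018-08-01", "2018-11-26", freq="15min")
    flow = pd.Series(range(len(idx)), index=idx, dtype=float)   # placeholder for 15-minute discharge values

    daily = pd.DataFrame({
        "daily mean": flow.resample("D").mean(),
        "daily min": flow.resample("D").min(),
        "daily max": flow.resample("D").max(),
    })

    fig, ax = plt.subplots()
    flow.plot(ax=ax, alpha=0.4, label="15-minute values")
    daily.plot(ax=ax)    # adds the three daily series to the same axes
    ax.legend()
    plt.show()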

  19. Analysis of references in the IPCC AR6 WG2 Report of 2022

    • data.niaid.nih.gov
    Updated Mar 11, 2022
    + more versions
    Cite
    Bianca Kramer (2022). Analysis of references in the IPCC AR6 WG2 Report of 2022 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6327206
    Explore at:
    Dataset updated
    Mar 11, 2022
    Dataset provided by
    Cameron Neylon
    Bianca Kramer
    License

    https://creativecommons.org/licenses/publicdomain/

    Description

    This repository contains data on 17,419 DOIs cited in the IPCC Working Group 2 contribution to the Sixth Assessment Report, and the code to link them to the dataset built at the Curtin Open Knowledge Initiative (COKI).

    References were extracted from the report's PDFs (downloaded 2022-03-01) via Scholarcy and exported as RIS and BibTeX files. DOI strings were identified from RIS files by pattern matching and saved as CSV file. The list of DOIs for each chapter and cross chapter paper was processed using a custom Python script to generate a pandas DataFrame which was saved as CSV file and uploaded to Google Big Query.
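
    A minimal illustration of that kind of DOI pattern matching (not the repository's actual preprocessing code):

    # Extract DOI-like strings from RIS text with a regular expression and collect them in a DataFrame.
    import re
    import pandas as pd

    DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")

    def extract_dois(ris_text):
        """Return the unique DOI-like strings found in an RIS export."""
        return sorted({m.group(0).rstrip(".,;") for m in DOI_PATTERN.finditer(ris_text)})

    sample = "DO  - 10.1234/example.doi.one\nUR  - https://doi.org/10.5555/example-doi-two"
    print(pd.DataFrame({"doi": extract_dois(sample)}))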

    We used the main object table of the Academic Observatory, which combines information from Crossref, Unpaywall, Microsoft Academic, Open Citations, the Research Organization Registry and Geonames to enrich the DOIs with bibliographic information, affiliations, and open access status. A custom query was used to join and format the data and the resulting table was visualised in a Google DataStudio dashboard.

    This version of the repository also includes the set of DOIs from references in the IPCC Working Group 1 contribution to the Sixth Assessment Report as extracted by Alexis-Michel Mugabushaka and shared on Zenodo: https://doi.org/10.5281/zenodo.5475442 (CC-BY)

    A brief descriptive analysis was provided as a blogpost on the COKI website.

    The repository contains the following content:

    Data:

    data/scholarcy/RIS/ - extracted references as RIS files

    data/scholarcy/BibTeX/ - extracted references as BibTeX files

    IPCC_AR6_WGII_dois.csv - list of DOIs

    data/10.5281_zenodo.5475442/ - references from IPCC AR6 WG1 report

    Processing:

    preprocessing.R - preprocessing steps for identifying and cleaning DOIs

    process.py - Python script for transforming data and linking to COKI data through Google Big Query

    Outcomes:

    Dataset on BigQuery - requires a google account for access and bigquery account for querying

    Data Studio Dashboard - interactive analysis of the generated data

    Zotero library of references extracted via Scholarcy

    PDF version of blogpost

    Note on licenses: Data are made available under CC0 (with the exception of the WG1 reference data, which have been shared under CC-BY 4.0). Code is made available under the Apache License 2.0.

  20. Bayesian Analysis for Remote Biosignature Identification on exoEarths...

    • zenodo.org
    csv
    Updated Nov 22, 2024
    Cite
    Natasha Latouf; Natasha Latouf (2024). Bayesian Analysis for Remote Biosignature Identification on exoEarths (BARBIE) III: Introducing the KEN; Data for CH4 [Dataset]. http://doi.org/10.5281/zenodo.13760695
    Explore at:
    Available download formats: csv
    Dataset updated
    Nov 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Natasha Latouf; Natasha Latouf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present all of the data across our SNR and abundance study for the molecule H2O for an exoEarth twin. The wavelength range is from 0.8-1.5 micron, with 25 evenly spaced 20%, 30%, and 40% bandpasses in this range. The SNR ranges from 3-20. We present the lower and upper wavelength per bandpass, the input CH4 value (abundance case), the retrieved CH4 value (presented as the log10(VMR)), the lower and upper limits of the 68% credible region (presented as the log10(VMR)), and the log-Bayes factor for CH4. For more information about how these were calculated, please see Bayesian Analysis for Remote Biosignature Identification on exoEarths (BARBIE) III: Introducing the KEN, accepted and currently available on arXiv.

    To open this csv as a Pandas dataframe, use the following command:

    import pandas as pd

    your_dataframe_name = pd.read_csv('zenodo_table.csv', dtype={'Input CH4': str})
