100+ datasets found
  1. Dataset metadata of known Dataverse installations

    • search.dataone.org
    • dataverse.harvard.edu
    • +1 more
    Updated Nov 22, 2023
    + more versions
    Cite
    Gautier, Julian (2023). Dataset metadata of known Dataverse installations [Dataset]. http://doi.org/10.7910/DVN/DCDKZQ
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Gautier, Julian
    Description

    This dataset contains the metadata of the datasets published in 77 Dataverse installations, information about each installation's metadata blocks, and the list of standard licenses that dataset depositors can apply to the datasets they publish in the 36 installations running more recent versions of the Dataverse software. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data.

    How the metadata was downloaded

    The dataset metadata and metadata block JSON files were downloaded from each installation on October 2 and October 3, 2022 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another named "apikey" listing my accounts' API tokens. The Python script expects and uses the API tokens in this CSV file to get metadata and other information from installations that require API tokens.

    How the files are organized

    ├── csv_files_with_metadata_from_most_known_dataverse_installations
    │   ├── author(citation).csv
    │   ├── basic.csv
    │   ├── contributor(citation).csv
    │   ├── ...
    │   └── topic_classification(citation).csv
    ├── dataverse_json_metadata_from_each_known_dataverse_installation
    │   ├── Abacus_2022.10.02_17.11.19.zip
    │   ├── dataset_pids_Abacus_2022.10.02_17.11.19.csv
    │   ├── Dataverse_JSON_metadata_2022.10.02_17.11.19
    │   ├── hdl_11272.1_AB2_0AQZNT_v1.0.json
    │   ├── ...
    │   ├── metadatablocks_v5.6
    │   ├── astrophysics_v5.6.json
    │   ├── biomedical_v5.6.json
    │   ├── citation_v5.6.json
    │   ├── ...
    │   ├── socialscience_v5.6.json
    │   ├── ACSS_Dataverse_2022.10.02_17.26.19.zip
    │   ├── ADA_Dataverse_2022.10.02_17.26.57.zip
    │   ├── Arca_Dados_2022.10.02_17.44.35.zip
    │   ├── ...
    │   └── World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip
    ├── dataset_pids_from_most_known_dataverse_installations.csv
    ├── licenses_used_by_dataverse_installations.csv
    └── metadatablocks_from_most_known_dataverse_installations.csv

    This dataset contains two directories and three CSV files not in a directory. One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 18 CSV files that contain the values from common metadata fields of all 77 Dataverse installations. For example, author(citation)_2022.10.02-2022.10.03.csv contains the "Author" metadata for all published, non-deaccessioned versions of all datasets in the 77 installations, where there's a row for each author name, affiliation, identifier type and identifier.

    The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 77 zipped files, one for each of the 77 Dataverse installations whose dataset metadata I was able to download using Dataverse APIs. Each zip file contains a CSV file and two sub-directories. The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation, as well as a column to indicate whether or not the Python script was able to download the Dataverse JSON metadata for each dataset. For Dataverse installations using Dataverse software versions whose Search APIs include each dataset's owning Dataverse collection name and alias, the CSV files also include which Dataverse collection (within the installation) that dataset was published in. One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema. The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I saved them so that they can be used when extracting metadata from the Dataverse JSON files.

    The dataset_pids_from_most_known_dataverse_installations.csv file contains the dataset PIDs of all published datasets in the 77 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all of the "dataset_pids_..." files in each of the 77 zip files. The licenses_used_by_dataverse_installations.csv file contains information about the licenses that a number of the installations let depositors choose when creating datasets. When I collected ... Visit https://dataone.org/datasets/sha256%3Ad27d528dae8cf01e3ea915f450426c38fd6320e8c11d3e901c43580f997a3146 for complete metadata about this dataset.
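
    The harvesting workflow described above (a "hostname"/"apikey" CSV plus the Dataverse Search API and the "Dataverse JSON" metadata export) could be sketched roughly as below. This is an illustrative outline, not the author's script; the CSV file name is a placeholder, hostnames are assumed to include the scheme (e.g. https://...), and writing the output into the directory layout shown above is left out.

    import csv
    import requests

    INSTALLATIONS_CSV = "installations_api_keys.csv"  # placeholder; columns: hostname, apikey

    def dataset_pids(hostname, api_key=None, per_page=100):
        """Yield persistent IDs of published datasets in one installation via the Search API."""
        headers = {"X-Dataverse-key": api_key} if api_key else {}
        start = 0
        while True:
            r = requests.get(f"{hostname}/api/search",
                             params={"q": "*", "type": "dataset",
                                     "per_page": per_page, "start": start},
                             headers=headers, timeout=60)
            r.raise_for_status()
            data = r.json()["data"]
            for item in data["items"]:
                yield item["global_id"]
            start += per_page
            if start >= data["total_count"]:
                break

    def dataverse_json(hostname, pid, api_key=None):
        """Download one dataset's metadata in the 'Dataverse JSON' export format."""
        headers = {"X-Dataverse-key": api_key} if api_key else {}
        r = requests.get(f"{hostname}/api/datasets/export",
                         params={"exporter": "dataverse_json", "persistentId": pid},
                         headers=headers, timeout=60)
        r.raise_for_status()
        return r.json()

    with open(INSTALLATIONS_CSV, newline="") as f:
        for row in csv.DictReader(f):
            for pid in dataset_pids(row["hostname"], row.get("apikey")):
                metadata = dataverse_json(row["hostname"], pid, row.get("apikey"))
                # ... write `metadata` into a per-installation folder, mirroring the zip layout above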

  2. Directory of SIRIM Industry Standards - Dataset - MAMPU

    • archive.data.gov.my
    Updated Dec 6, 2018
    Cite
    (2018). Directory of SIRIM Industry Standards - Dataset - MAMPU [Dataset]. https://archive.data.gov.my/data/dataset/directory-of-sirim-industry-standards
    Explore at:
    Dataset updated
    Dec 6, 2018
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset shows SIRIM Industry Standards by SIRIM Training Services Sdn. Bhd (STS).

  3. Official Agency Directory

    • catalog.data.gov
    • datadiscoverystudio.org
    • +3 more
    Updated Apr 21, 2025
    Cite
    Federal Grain Inspection Service (2025). Official Agency Directory [Dataset]. https://catalog.data.gov/dataset/official-agency-directory
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Federal Grain Inspection Service
    Description

    Provides a listing of the States and privately owned entities designated and/or delegated by the Grain Inspection, Packers and Stockyards Administration (GIPSA), Federal Grain Inspection Service (FGIS) to provide official inspection and/or weighing services under the authority of the United States Grain Standards Act (USGSA). Only entities listed in this Directory are recognized as Official Agencies (OAs) by FGIS.

  4. List of government APIs

    • data.europa.eu
    excel xlsx, ods
    Updated Jan 21, 2020
    Cite
    Joint Research Centre (2020). List of government APIs [Dataset]. https://data.europa.eu/data/datasets/45ca8d82-ac31-4360-b3a1-ba43b0b07377
    Explore at:
    Available download formats: excel xlsx, ods
    Dataset updated
    Jan 21, 2020
    Dataset authored and provided by
    Joint Research Centre (https://joint-research-centre.ec.europa.eu/index_en)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This list contains the government API cases collected, cleaned and analysed in the APIs4DGov study "Web API landscape: relevant general purpose ICT standards, technical specifications and terms".

    The list does not represent a complete list of all government cases in Europe, as it is built to support the goals of the study and is limited to the analysis and data gathered from the following sources:

    • The EU open data portal

    • The European data portal

    • The INSPIRE catalogue

    • JoinUp: The API cases collected from the European Commission JoinUp platform

    • Literature-document review: the API cases gathered from the research activities of the study performed till the end of 2019

    • ProgrammableWeb: the ProgrammableWeb API directory

    • Smart 2015/0041: the database of 395 cases created by the study ‘The project Towards faster implementation and uptake of open government’ (SMART 2015/0041).

    • Workshops/meetings/interviews: a list of API cases collected in the workshops, surveys and interviews organised within the APIs4DGov

    Each API case is classified according to the following rationale:

    • Unique id: a unique key for each case, obtained by concatenating the following fields: (Country Code) + (Governmental level) + (Name Id) + (Type of API); a short illustrative example follows this list

    • API Country or type of provider: the country in which the API case has been published

    • API provider: the specific provider that published and maintains the API case

    • Name Id: an acronym of the name of the API case (it may not be unique)

    • Short description

    • Type of API: (i) API registry, a set, catalogue, registry or directory of APIs; (ii) API platform: a platform that supports the use of APIs; (iii) API tool: a tool used to manage APIs; (iv) API standard: a set of standards related to government APIs; (v) Data catalogue, an API published to access metadata of datasets, normally published by a data catalogue; (vi) Specific API, a unique (can have many endpoints) API built for a specific purpose

    • Number of APIs: normally one; in the case of an API registry, the number of APIs published by the registry as of 31/12/2019

    • Theme: list of domains related to the API case (controlled vocabulary)

    • Governmental level: the geographical scope of the API (city, regional, national or international)

    • Country code: the two-letter country code

    • Source: the source (among those listed above) from which the API case was gathered
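
    As a purely illustrative note on the Unique id construction above (the lack of a separator, the casing, and the example values are assumptions, not the study's specification):

    # Hypothetical illustration of the "Unique id" concatenation described above.
    def unique_id(country_code, governmental_level, name_id, api_type):
        return f"{country_code}{governmental_level}{name_id}{api_type}"

    print(unique_id("IT", "national", "ANPR", "Specific API"))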

  5. 2018 Pre-K School Directory

    • catalog.data.gov
    • data.cityofnewyork.us
    • +2 more
    Updated Nov 29, 2024
    Cite
    data.cityofnewyork.us (2024). 2018 Pre-K School Directory [Dataset]. https://catalog.data.gov/dataset/2018-pre-k-school-directory
    Explore at:
    Dataset updated
    Nov 29, 2024
    Dataset provided by
    data.cityofnewyork.us
    Description

    These data are collected to inform families applying to Pre-K for All of the programs available. This is a spreadsheet version of the exact data points printed in the borough-level directories and the online PDFs. Each record represents a school participating in Pre-K for All. The information for each school is collected by the Department of Early Childhood Education (Department of Education). The "Who Got Offers" section of the spreadsheet is calculated by the Office of Student Enrollment (Department of Education) based on results from Round 1 of the Fall 2017 admissions process. This spreadsheet is simply a different representation of the same material produced in the printed and widely distributed Pre-K directories. This spreadsheet should not be used to identify current programs, as the directory was printed in December 2017 and schools are subject to change. For the most updated list of Pre-K for All schools, use the UPK Sites Directory compiled by the Department of Early Childhood Education.

    Disclaimer: The following columns were added to this directory to meet the Geo-spatial Standards of Local Law 108 of 2015:

    • Postcode / Zip code
    • Latitude
    • Longitude
    • Community Board
    • Council District
    • Census tract
    • BIN
    • BBL
    • NTA

  6. Healthdirect - NHSD - Services Directory 2023 - Dataset - AURIN

    • data.aurin.org.au
    Updated Mar 5, 2025
    + more versions
    Cite
    (2025). Healthdirect - NHSD - Services Directory 2023 - Dataset - AURIN [Dataset]. https://data.aurin.org.au/dataset/healthdirect_nhsd_services_directory_2023
    Explore at:
    Dataset updated
    Mar 5, 2025
    Description

    The National Health Services Directory ('NHSD'), published by Healthdirect Australia, is a comprehensive and consolidated national directory of health services and related health service providers in both the public and private sectors across all Australian jurisdictions. The purpose of the NHSD is to provide consistent, authoritative, reliable and easily accessible information about health services to the public and to support health professionals with the delivery of healthcare. Service types in the NHSD are classified using the SNOMED CT-AU terminology standard; refer to the National Clinical Terminology Service for more information. This dataset was derived from service, location and organisation profiles accessed through the Healthdirect provided NHSD API and has been spatialised as a point dataset. This directory is presented as a snapshot in time as at August 2023. This dataset is available for access to academic users once they have agreed to the NHSD terms of use; other AURIN users can apply for access to the dataset here.

  7. CHN Retinotopic Mapping Dataset

    • openneuro.org
    Updated Apr 8, 2024
    + more versions
    Cite
    Kelly Chang; Ione Fine; Geoffrey M. Boynton (2024). CHN Retinotopic Mapping Dataset [Dataset]. http://doi.org/10.18112/openneuro.ds004698.v1.0.1
    Explore at:
    Dataset updated
    Apr 8, 2024
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Kelly Chang; Ione Fine; Geoffrey M. Boynton
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Center for Human Neuroscience (CHN) Retinotopic Mapping Dataset

    The Center for Human Neuroscience (CHN) Retinotopic Mapping Dataset collected at the University of Washington is part of "Improving the reliability and accuracy of population receptive field measures using a 'log-bar' stimulus" by Kelly Chang, Ione Fine, and Geoffrey M. Boynton.

    The full dataset comprises the raw, preprocessed (with fMRIPrep), and pRF-estimated data from 12 participants across 2 sessions.

    Dataset Organization

    • dataset
      This directory contains the raw, unprocessed data for each participant.

    • dataset/derivatives/fmriprep
      This directory contains the fMRIPrep processed data for each participant.

    • dataset/derivatives/freesurfer
      This directory contains the standard FreeSurfer processed data for each participant.

    • dataset/derivatives/prf-estimation
      This directory contains the pRF estimation data and results for each participant.

    • dataset/derivatives/prf-estimation/files
      This directory contains miscellaneous files used for pRF estimation or visualizations.

      • angle_lut.json: Custom polar angle lookup table for visualization with FreeSurfer's freeview.
      • eccen_lut.json: Custom eccentricity lookup table for visualization with FreeSurfer's freeview.
      • participants_hrf_paramters.json: Corresponding metadata for participants_hrf_paramters.tsv.
      • participants_hrf_paramters.tsv: Estimated HRF parameters used during pRF estimation by participant and hemisphere.
    • dataset/derivatives/prf-estimation/stimuli
      This directory contains the stimuli used in the experiment and stimulus apertures used in pRF estimation.

      • task-(fixed|log)bar_run-<n>: Name of the stimulus condition and run number.
      • *_desc-full_stim.mat: Stimulus images (uint8) at full resolution of 540 by 540 pixels and 6 Hz.
      • *_desc-down_aperture.mat: Stimulus aperture (binary) where 1s indicate the stimulus and 0s indicate the background, at a downsampled (down) resolution of 108 by 108 pixels and 1 Hz.
    • dataset/derivatives/prf-estimation/sub-<n>/anat
      This directory contains the participant's surface (inflated and sphere) and curvature files for visualization using FreeSurfer's freeview.

    • dataset/derivatives/prf-estimation/sub-<n>/func
      This directory contains the preprocessed and denoised functional data, sampled onto the participant's surface, used during pRF estimation.

    • dataset/derivatives/prf-estimation/sub-<n>/prfs
      This directory contains the estimated pRF parameter maps separated by which data was used during estimation.

      • ses-(01|02|all): Sessions used during pRF estimation, either Session 1, Session 2, or both.
      • task-(fixedbar|logbar|all): Stimuli type used during pRF estimation, either fixed-bar, log-bar, or both.

      Within the pRF estimate directories are the estimated pRF parameter maps (a minimal loading sketch follows this listing):

      • *_angle.mgz: Polar angle maps, degrees from (-180, 180). Negative values represent the left hemifield and positive values represent the right hemifield.
      • *_eccen.mgz: Eccentricity maps, visual degrees.
      • *_sigma.mgz: pRF size maps, visual degrees.
      • *_vexpl.mgz: Proportion of variance explained maps.
      • *_x0.mgz: x-coordinate maps, visual degrees, with origin (0,0) at screen center.
      • *_y0.mgz: y-coordinate maps, visual degrees, with origin (0,0) at screen center.

    • dataset/derivatives/prf-estimation/sub-<n>/rois
      This directory contains the roi (.label) files for each participant.

      • *_evc.label: Early visual cortex (EVC). A liberal ROI that covered V1, V2, and V3 used for pRF estimation.
      • *_fovea.label: Foveal confluence ROI.
      • *_v<n>.label: Corresponding visual area ROI files.
    • dataset/tutorials
      This directory contains tutorial scripts in MATLAB and Python to generate logarithmically distorted images from a directory of input images.

      • create_distorted_images.[m,ipynb]: Tutorial script that generates logarithmically distorted images when given an image input directory.
      • logarithmic_distortion_demo.[m,ipynb]: Tutorial script that demonstrates the logarithmic distortion warping on a single image.
      • fixed-bar: Sample image input directory for create_distorted_images.[m,ipynb].
      • log-bar: Sample image output directory for create_distorted_images.[m,ipynb].
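
    A minimal, hedged sketch of reading these pRF outputs in Python; the file names below are placeholders patterned on the listing above, not verified paths in the dataset.

    import nibabel as nib
    from nibabel.freesurfer.io import read_label

    # Load one estimated pRF parameter map (.mgz) and its variance-explained map.
    angle = nib.load("sub-01/prfs/ses-all/task-all/lh_angle.mgz").get_fdata().squeeze()
    vexpl = nib.load("sub-01/prfs/ses-all/task-all/lh_vexpl.mgz").get_fdata().squeeze()

    # Restrict to the early visual cortex ROI (.label files hold surface vertex indices).
    evc_vertices = read_label("sub-01/rois/lh_evc.label")
    good = vexpl[evc_vertices] > 0.1
    print(f"{good.sum()} EVC vertices exceed 10% variance explained")
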
  8. Directory of Standard Malaysian Glove (SMG) Certified Suppliers

    • cloud.csiss.gmu.edu
    .xlsx
    Updated Jul 1, 2019
    + more versions
    Cite
    Malaysia (2019). Directory of Standard Malaysian Glove (SMG) Certified Suppliers [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/directory-of-standard-malaysian-glove-smg-certified-suppliers
    Explore at:
    Available download formats: .xlsx
    Dataset updated
    Jul 1, 2019
    Dataset provided by
    Malaysia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Malaysia
    Description

    Directory of Standard Malaysian Glove (SMG) Certified Suppliers

  9. Data articles in journals

    • zenodo.org
    bin, csv, txt
    Updated Sep 21, 2023
    + more versions
    Cite
    Carlota Balsa-Sanchez; Carlota Balsa-Sanchez; Vanesa Loureiro; Vanesa Loureiro (2023). Data articles in journals [Dataset]. http://doi.org/10.5281/zenodo.7458466
    Explore at:
    Available download formats: bin, txt, csv
    Dataset updated
    Sep 21, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Carlota Balsa-Sanchez; Carlota Balsa-Sanchez; Vanesa Loureiro; Vanesa Loureiro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Last Version: 4

    Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

    Date of data collection: 2022/12/15

    General description: Publishing datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in data journals or in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
    File list:

    - data_articles_journal_list_v4.xlsx: full list of 140 academic journals in which data papers and/or software papers could be published
    - data_articles_journal_list_v4.csv: full list of 140 academic journals in which data papers and/or software papers could be published

    Relationship between files: both files have the same information. Two different formats are offered to improve reuse

    Type of version of the dataset: final processed version

    Versions of the files: 4th version
    - Information updated: number of journals, URL, document types associated with a specific journal, normalization of publishers, and simplification of document types
    - Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR), Scopus and Web of Science (WOS), Journal Master List.

    Version: 3

    Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

    Date of data collection: 2022/10/28

    General description: Publishing datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in data journals or in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
    File list:

    - data_articles_journal_list_v3.xlsx: full list of 124 academic journals in which data papers and/or software papers could be published
    - data_articles_journal_list_3.csv: full list of 124 academic journals in which data papers and/or software papers could be published

    Relationship between files: both files have the same information. Two different formats are offered to improve reuse

    Type of version of the dataset: final processed version

    Versions of the files: 3rd version
    - Information updated: number of journals, URL, document types associated with a specific journal, normalization of publishers, and simplification of document types
    - Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR).

    Erratum - Data articles in journals Version 3:

    Botanical Studies -- ISSN 1999-3110 -- JCR (JIF) Q2
    Data -- ISSN 2306-5729 -- JCR (JIF) n/a
    Data in Brief -- ISSN 2352-3409 -- JCR (JIF) n/a

    Version: 2

    Author: Francisco Rubio, Universitat Politècnica de València.

    Date of data collection: 2020/06/23

    General description: Publishing datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in data journals or in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
    File list:

    - data_articles_journal_list_v2.xlsx: full list of 56 academic journals in which data papers and/or software papers could be published
    - data_articles_journal_list_v2.csv: full list of 56 academic journals in which data papers and/or software papers could be published

    Relationship between files: both files have the same information. Two different formats are offered to improve reuse

    Type of version of the dataset: final processed version

    Versions of the files: 2nd version
    - Information updated: number of journals, URL, document types associated with a specific journal, normalization of publishers, and simplification of document types
    - Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Scimago Journal and Country Rank (SJR)

    Total size: 32 KB

    Version 1: Description

    This dataset contains a list of journals that publish data articles, code, software articles and database articles.

    The search strategy in DOAJ and Ulrichsweb was to search for the word "data" in journal titles.
    Acknowledgements:
    Xaquín Lores Torres for his invaluable help in preparing this dataset.

  10. Directory of Publishers in the Czech Republic

    • data.gov.cz
    xml
    Updated May 29, 2025
    Cite
    Národní knihovna České republiky (2025). Directory of Publishers in the Czech Republic [Dataset]. https://data.gov.cz/dataset?iri=https%3A%2F%2Fdata.gov.cz%2Fzdroj%2Fdatov%C3%A9-sady%2F00023221%2F1495242775
    Explore at:
    Available download formats: xml
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    Národní knihovna České republiky
    Area covered
    Czechia
    Description

    The dataset contains data on publishers participating in the ISBN (International Standard Book Numbering) system in the Czech Republic since 1989 and in the ISMN (International Standard Music Numbering) system in the Czech Republic since 1996. It also contains data on publishers who have not registered with either of the two aforementioned systems but these data are not updated. This dataset contains almost 20,000 records.

  11. Public School Characteristics - Current

    • catalog.data.gov
    • s.cnmilf.com
    • +2 more
    Updated Oct 21, 2024
    + more versions
    Cite
    National Center for Education Statistics (NCES) (2024). Public School Characteristics - Current [Dataset]. https://catalog.data.gov/dataset/public-school-characteristics-current-340b1
    Explore at:
    Dataset updated
    Oct 21, 2024
    Dataset provided by
    National Center for Education Statistics (https://nces.ed.gov/)
    Description

    The National Center for Education Statistics' (NCES) Education Demographic and Geographic Estimate (EDGE) program develops annually updated point locations (latitude and longitude) for public elementary and secondary schools included in the NCES Common Core of Data (CCD). The CCD program annually collects administrative and fiscal data about all public schools, school districts, and state education agencies in the United States. The data are supplied by state education agency officials and include basic directory and contact information for schools and school districts, as well as characteristics about student demographics, number of teachers, school grade span, and various other administrative conditions. CCD school and agency point locations are derived from reported information about the physical location of schools and agency administrative offices. The point locations and administrative attributes in this data layer represent the most current CCD collection. For more information about NCES school point data, see: https://nces.ed.gov/programs/edge/Geographic/SchoolLocations. For more information about these CCD attributes, as well as additional attributes not included, see: https://nces.ed.gov/ccd/files.asp.

    Notes:
    • -1 or M indicates that the data are missing.
    • -2 or N indicates that the data are not applicable.
    • -9 indicates that the data do not meet NCES data quality standards.

    Collections are available for the following years: 2022-23, 2021-22, 2020-21, 2019-20, 2018-19, 2017-18.

    All information contained in this file is in the public domain. Data users are advised to review NCES program documentation and feature class metadata to understand the limitations and appropriate use of these data.
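
    The reserved codes above matter when reading the attribute table programmatically. Below is a small, hedged pandas sketch (the file and column names are placeholders, not part of the NCES release) that converts those codes to missing values before analysis.

    import numpy as np
    import pandas as pd

    # Read everything as strings first so reserved codes like "M" or "N" survive intact.
    df = pd.read_csv("public_school_characteristics.csv", dtype=str)  # placeholder file name

    # Map the reserved codes (-1/M missing, -2/N not applicable, -9 below quality standards) to NaN.
    reserved = {"-1": np.nan, "M": np.nan, "-2": np.nan, "N": np.nan, "-9": np.nan}
    df["TOTAL_TEACHERS"] = pd.to_numeric(df["TOTAL_TEACHERS"].replace(reserved), errors="coerce")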

  12. Tajweed Dataset

    • kaggle.com
    Updated Apr 6, 2025
    Cite
    Ala'a Abdu Saleh Alawdi (2025). Tajweed Dataset [Dataset]. https://www.kaggle.com/datasets/alawdisoft/tajweed-dataset
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ala'a Abdu Saleh Alawdi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The provided code processes a Tajweed dataset, which appears to be a collection of audio recordings categorized by different Tajweed rules (Ikhfa, Izhar, Idgham, Iqlab). Let's break down the dataset's structure and the code's functionality:

    Dataset Structure:

    • Organized by Tajweed Rule and Sheikh: The dataset is structured into directories for each Tajweed rule (e.g., 'Ikhfa', 'Izhar'). Within each rule's directory, there are subdirectories representing different reciters (sheikhs). This hierarchical organization is crucial for creating a structured metadata file and for training machine learning models.
    • Audio Files: The audio files (presumably WAV or other supported formats) are stored within the sheikh's subdirectories. The original filenames are not standardized.
    • Multiple Sheikhs per Rule: The dataset includes multiple recitations for each rule from different sheikhs, offering diversity in pronunciation.
    • Google Drive Storage: The dataset is located on Google Drive, which requires mounting the drive to access the data within a Colab environment.

    Code Functionality:

    1. Initialization and Imports: The code begins with necessary imports (pandas, pydub) and mounts Google Drive. Pydub is used for audio file format conversion.

    2. Directory Listing: It initially checks if a specified directory exists (for example, Alaa_alhsri/Ikhfa) and lists its files, demonstrating basic file system access.

    3. Metadata Creation: The core of the script is the generation of metadata, which provides essential information about each audio file. The tajweed_paths dictionary maps each Tajweed rule to a list of paths, associating each path with the reciter's name.

      • Iterating through Paths: The code iterates through each Tajweed rule and its corresponding paths.
      • File Listing: Inside each directory, it iterates through the audio files.
      • Metadata Dictionary: For each audio file, it creates a metadata dictionary that includes:
        • global_id: A unique identifier for each audio file.
        • original_filename: The original filename of the audio file.
        • new_filename: A standardized filename that incorporates the Tajweed rule (label), sheikh's ID, audio number, and a global ID.
        • label: The Tajweed rule.
        • sheikh_id: A numerical identifier for each sheikh.
        • sheikh_name: The name of the reciter.
        • audio_number: A sequential number for the audio files within a specific sheikh and Tajweed rule combination.
        • original_path: Full path to the original audio file.
        • new_path: Full path to the intended location for the renamed and potentially converted audio file.
      • Pandas DataFrame: The metadata is collected in a list of dictionaries and then converted into a Pandas DataFrame for easier viewing and processing. This DataFrame is highly informative.
    4. File Renaming and Conversion:

      • File Renaming (commented out): the code can rename the audio files to the standardized format defined in new_filename and store them in the designated directory.
      • Audio Conversion to WAV: The script then converts any files in the specified directories to .wav format, creating standardized files in a new output_dataset directory. The new filenames are based on the rule, sheikh, and a counter.
    5. Metadata Export: Finally, the compiled metadata is saved as a CSV file (metadata.csv) in the output directory. This CSV file is crucial for training any machine learning model using this data.
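
    As a compact, hedged illustration of the pipeline just described (the directory layout, sheikh names, and output folder below are placeholders rather than the notebook's actual values), the metadata-plus-conversion steps might look like this in Python:

    import os
    import pandas as pd
    from pydub import AudioSegment

    # Placeholder mapping of Tajweed rule -> [(input directory, reciter name), ...]
    tajweed_paths = {
        "Ikhfa": [("Alaa_alhsri/Ikhfa", "Alaa al-Husari")],
        # "Izhar": [...], "Idgham": [...], "Iqlab": [...]
    }
    output_dir = "output_dataset"
    os.makedirs(output_dir, exist_ok=True)

    records, global_id = [], 0
    for label, paths in tajweed_paths.items():
        for sheikh_id, (path, sheikh_name) in enumerate(paths, start=1):
            for audio_number, filename in enumerate(sorted(os.listdir(path)), start=1):
                global_id += 1
                new_filename = f"{label}_sheikh{sheikh_id}_{audio_number:03d}_{global_id:05d}.wav"
                new_path = os.path.join(output_dir, new_filename)
                # Convert the source audio (whatever its original format) to WAV.
                AudioSegment.from_file(os.path.join(path, filename)).export(new_path, format="wav")
                records.append({
                    "global_id": global_id, "original_filename": filename,
                    "new_filename": new_filename, "label": label,
                    "sheikh_id": sheikh_id, "sheikh_name": sheikh_name,
                    "audio_number": audio_number,
                    "original_path": os.path.join(path, filename), "new_path": new_path,
                })

    pd.DataFrame(records).to_csv(os.path.join(output_dir, "metadata.csv"), index=False)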

  13. Directory of Standard Malaysian Rubber (SMR) Producer - Dataset - MAMPU

    • archive.data.gov.my
    Updated Sep 27, 2018
    Cite
    (2018). Directory of Standard Malaysian Rubber (SMR) Producer - Dataset - MAMPU [Dataset]. https://archive.data.gov.my/data/dataset/directory-of-standard-malaysian-rubber-smr-producer
    Explore at:
    Dataset updated
    Sep 27, 2018
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Directory of Standard Malaysian Rubber (SMR) Producer 2019

  14. NODC Standard Product: World Ocean Database 1998 version 1 (5 disc set)...

    • catalog.data.gov
    Updated Jul 1, 2025
    + more versions
    Cite
    (Point of Contact) (2025). NODC Standard Product: World Ocean Database 1998 version 1 (5 disc set) (NCEI Accession 0095340) [Dataset]. https://catalog.data.gov/dataset/nodc-standard-product-world-ocean-database-1998-version-1-5-disc-set-ncei-accession-0095340
    Explore at:
    Dataset updated
    Jul 1, 2025
    Dataset provided by
    (Point of Contact)
    Description

    The World Ocean Database 1998 (WOD98) comprises five CD-ROMs containing profile and plankton/biomass data in compressed format. WOD98-01 through WOD98-04 contain observed level data; WOD98-05 contains all the standard level data. World Ocean Database 1998 (WOD98) expands on World Ocean Atlas 1994 (WOA94) by including the additional variables nitrite, pH, alkalinity, chlorophyll, and plankton, as well as all available metadata and meteorology. WOD98 is an International Year of the Ocean product.

    WOD98-01: Observed Level Data, North Atlantic 30° N-90° N; WOD98-02: Observed Level Data, North Atlantic 0°-30° N and South Atlantic; WOD98-03: Observed Level Data, North Pacific 20° N-90° N; WOD98-04: Observed Level Data, North Pacific 0°-20° N, South Pacific, and Indian; WOD98-05: Standard Level Data for all Ocean Basins.

    Discs may be created by burning the appropriate .iso file(s) in the data/0-data/disc_image/ directory to blank CD-ROM media using standard CD-ROM authoring software. Software that was developed or provided with this NODC Standard Product may be included in the disc_image/ directory as part of a disc image, but executable software that was developed or provided with this NODC Standard Product has been excluded from the disc_contents/ directory.

  15. Open Data Portal Catalogue

    • open.canada.ca
    • datasets.ai
    • +1 more
    csv, json, jsonl, png +2
    Updated Jun 14, 2025
    Cite
    Treasury Board of Canada Secretariat (2025). Open Data Portal Catalogue [Dataset]. https://open.canada.ca/data/en/dataset/c4c5c7f1-bfa6-4ff6-b4a0-c164cb2060f7
    Explore at:
    Available download formats: csv, sqlite, json, png, jsonl, xlsx
    Dataset updated
    Jun 14, 2025
    Dataset provided by
    Treasury Board of Canada Secretariat (http://www.tbs-sct.gc.ca/)
    Treasury Board of Canada (https://www.canada.ca/en/treasury-board-secretariat/corporate/about-treasury-board.html)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    The open data portal catalogue is a downloadable dataset containing some key metadata for the general datasets available on the Government of Canada's Open Data portal. Resource 1 is generated using the ckanapi tool (external link). Resources 2 - 8 are generated using the Flatterer (external link) utility.

    Description of resources:

    1. Dataset is a JSON Lines (external link) file where the metadata of each Dataset/Open Information Record is one line of JSON. The file is compressed with GZip. The file is heavily nested and recommended for users familiar with working with nested JSON.
    2. Catalogue is an XLSX workbook where the nested metadata of each Dataset/Open Information Record is flattened into worksheets for each type of metadata.
    3. datasets metadata contains metadata at the dataset level. This is also referred to as the package in some CKAN documentation. This is the main table/worksheet in the SQLite database and XLSX output.
    4. Resources Metadata contains the metadata for the resources contained within each dataset.
    5. resource views metadata contains the metadata for the views applied to each resource, if a resource has a view configured.
    6. datastore fields metadata contains the DataStore information for CSV datasets that have been loaded into the DataStore. This information is displayed in the Data Dictionary for DataStore enabled CSVs.
    7. Data Package Fields contains a description of the fields available in each of the tables within the Catalogue, as well as the count of the number of records each table contains.
    8. data package entity relation diagram displays the title and format for each column, in each table in the Data Package, in the form of an ERD diagram. The Data Package resource offers a text-based version.
    9. SQLite Database is a .db database, similar in structure to Catalogue. This can be queried with database or analytical software tools for doing analysis.
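
    As a hedged example of working with Resource 1 described above, the sketch below streams the GZip-compressed JSON Lines catalogue; the local file name is a placeholder for whatever the resource was downloaded as, and the printed fields assume the usual CKAN package keys.

    import gzip
    import json

    # Each line of the file is one Dataset / Open Information Record as nested JSON.
    with gzip.open("od-do-canada.jsonl.gz", "rt", encoding="utf-8") as fh:  # placeholder name
        for line in fh:
            record = json.loads(line)
            print(record.get("id"), record.get("title"))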

  16. Natural Object Dataset: A large-scale fMRI dataset for human visual...

    • openneuro.org
    Updated Jul 8, 2023
    + more versions
    Cite
    Zhengxin Gong; Ming Zhou; Yuxuan Dai; Yushan Wen; Youyi Liu; Zonglei Zhen (2023). Natural Object Dataset: A large-scale fMRI dataset for human visual processing of naturalistic scenes [Dataset]. http://doi.org/10.18112/openneuro.ds004496.v2.1.1
    Explore at:
    Dataset updated
    Jul 8, 2023
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Zhengxin Gong; Ming Zhou; Yuxuan Dai; Yushan Wen; Youyi Liu; Zonglei Zhen
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Summary

    One ultimate goal of visual neuroscience is to understand how the brain processes visual stimuli encountered in the natural environment. Achieving this goal requires records of brain responses under massive amounts of naturalistic stimuli. Although the scientific community has put in a lot of effort to collect large-scale functional magnetic resonance imaging (fMRI) data under naturalistic stimuli, more naturalistic fMRI datasets are still urgently needed. We present here the Natural Object Dataset (NOD), a large-scale fMRI dataset containing responses to 57,120 naturalistic images from 30 participants. NOD strives for a balance between sampling variation between individuals and sampling variation between stimuli. This enables NOD to be utilized not only for determining whether an observation is generalizable across many individuals, but also for testing whether a response pattern is generalized to a variety of naturalistic stimuli. We anticipate that the NOD together with existing naturalistic neuroimaging datasets will serve as a new impetus for our understanding of the visual processing of naturalistic stimuli.

    Data record

    The data were organized according to the Brain-Imaging-Data-Structure (BIDS) Specification version 1.7.0 and can be accessed from the OpenNeuro public repository (accession number: XXX). In short, raw data of each subject were stored in “sub-

    Stimulus images The stimulus images for different fMRI experiments are deposited in separate folders: “stimuli/imagenet”, “stimuli/coco”, “stimuli/prf”, and “stimuli/floc”. Each experiment folder contains corresponding stimulus images, and the auxiliary files can be found within the “info” subfolder.

    Raw MRI data Each participant folder consists of several session folders: anat, coco, imagenet, prf, floc. Each session folder in turn includes “anat”, “func”, or “fmap” folders for corresponding modality data. The scan information for each session is provided in a TSV file.

    Preprocessed volume data from fMRIprep The preprocessed volume-based fMRI data are in subject's native space, saved as “sub-

    Preprocessed surface-based data from ciftify The preprocessed surface-based data are in standard fsLR space, saved as “sub-

    Brain activation data from surface-based GLM analyses The brain activation data are derived from GLM analyses on the standard fsLR space, saved as “sub-

  17. #2023 GeoOps Folder Structure (Folders, GDBs, and layer files) - Dataset -...

    • nationaldataplatform.org
    Updated Feb 28, 2024
    + more versions
    Cite
    (2024). #2023 GeoOps Folder Structure (Folders, GDBs, and layer files) - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/2023-geoops-folder-structure-folders-gdbs-and-layer-files
    Explore at:
    Dataset updated
    Feb 28, 2024
    Description

    2023 Updates to the National Incident Feature Service and Event Geodatabase

    For 2023, there are no schema updates and no major changes to GeoOps or the GISS Workflow! This is a conscious choice and is intended to provide a needed break for both users and administrators. Over the last 5 years, nearly every aspect of the GISS position has seen a major overhaul and while the advancements have been overwhelmingly positive, many of us are experiencing change fatigue. This is not to say there is no room for improvement. Many great suggestions were received throughout the season and in the GISS Survey, and they will be considered for inclusion in 2024. That there are no critical updates necessary also indicates that we have reached a level of maturity with the current state, and that is good news for everyone. Please continue to submit your ideas; they are appreciated and valuable insight, even if the change is not implemented. For information on 2023 AGOL updates please see the Create and Share Web Maps | NWCG page. There are three smaller changes worth noting this year:

    Standard Symbology is now the default on the NIFS
    For most workflows, the update will be seamless. All the Event Standard symbols are now supported in Field Maps and Map Viewer. Most users will now see the same symbols in all print and digital products. However, in AGOL some web apps do not support the complex line symbols. The simplified lines will still be present in the official Editing Apps (Operations, SITL, and GISS), and any custom apps built with the Web App Builder (WAB) interface. Experience Builder can be used for any new app creation. If you must use WAB or another app that cannot display the complex line symbology in the NIFS, please contact wildfireresponse@firenet.gov for guidance.

    Event Line now has Preconfigured Labels
    Labels on Event Line have historically been uncommon, but to speed their implementation when necessary, color-coded label classes have been added to the NIFS and the lyrx files provided in the GIS Folder Structure. They can be disabled or modified as needed, should they interfere with any of your workflows.

    "Restricted" Folder added to GeoOps Folder Structure
    At the base level within the 2023_Template, a 'restricted' folder is now included. This folder should be used for all data and products that contain sensitive, restricted, or controlled-unclassified information. This will aid the DOCL and any future FOIA liaisons in protecting this information. When using OneDrive, this folder can optionally be password protected. Reminder: Sensitive Data is not allowed to be hosted within the NIFC Org.

  18. NODC Standard Product: World Ocean Database 1998 version 1 (5 disc set)...

    • data.cnra.ca.gov
    • data.amerigeoss.org
    html
    Updated May 9, 2019
    Cite
    Ocean Data Partners (2019). NODC Standard Product: World Ocean Database 1998 version 1 (5 disc set) (NODC Accession 0095340) [Dataset]. https://data.cnra.ca.gov/dataset/nodc-standard-product-world-ocean-database-1998-version-1-5-disc-set-nodc-accession-0095340
    Explore at:
    Available download formats: html
    Dataset updated
    May 9, 2019
    Dataset authored and provided by
    Ocean Data Partners
    Description

    The World Ocean Database 1998 (WOD98) comprises five CD-ROMs containing profile and plankton/biomass data in compressed format. WOD98-01 through WOD98-04 contain observed level data; WOD98-05 contains all the standard level data.

    World Ocean Database 1998 (WOD98) expands on World Ocean Atlas 1994 (WOA94) by including the additional variables nitrite, pH, alkalinity, chlorophyll, and plankton, as well as all available metadata and meteorology. WOD98 is an International Year of the Ocean product.

    WOD98-01: Observed Level Data, North Atlantic 30° N-90° N; WOD98-02: Observed Level Data, North Atlantic 0°-30° N and South Atlantic; WOD98-03: Observed Level Data, North Pacific 20° N-90° N; WOD98-04: Observed Level Data, North Pacific 0°-20° N, South Pacific, and Indian; WOD98-05: Standard Level Data for all Ocean Basins.

    Copies of the World Ocean Atlas 1998 (WOA98) version 1 CD-ROMs are no longer available from the NODC Online Store. Duplicate discs may be created by burning the appropriate .iso file(s) in the data/0-data/disc_image/ directory to blank CD-ROM media using standard CD-ROM authoring software. Software that was developed or provided with this NODC Standard Product may be included in the disc_image/ directory as part of a disc image, but executable software that was developed or provided with this NODC Standard Product has been excluded from the disc_contents/ directory.

  19. Data from: A large-scale fMRI dataset for human action recognition

    • openneuro.org
    Updated Jun 21, 2023
    Cite
    Ming Zhou; Zhengxin Gong; Yuxuan Dai; Yushan Wen; Youyi Liu; Zonglei Zhen (2023). A large-scale fMRI dataset for human action recognition [Dataset]. http://doi.org/10.18112/openneuro.ds004488.v1.1.1
    Explore at:
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Ming Zhou; Zhengxin Gong; Yuxuan Dai; Yushan Wen; Youyi Liu; Zonglei Zhen
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Summary

    Human action recognition is one of our critical living abilities, allowing us to interact easily with the environment and others in everyday life. Although the neural basis of action recognition has been widely studied using a few categories of actions from simple contexts as stimuli, how the human brain recognizes diverse human actions in real-world environments still needs to be explored. Here, we present the Human Action Dataset (HAD), a large-scale functional magnetic resonance imaging (fMRI) dataset for human action recognition. HAD contains fMRI responses to 21,600 video clips from 30 participants. The video clips encompass 180 human action categories and offer a comprehensive coverage of complex activities in daily life. We demonstrate that the data are reliable within and across participants and, notably, capture rich representation information of the observed human actions. This extensive dataset, with its vast number of action categories and exemplars, has the potential to deepen our understanding of human action recognition in natural environments.

    Data record

    The data were organized according to the Brain-Imaging-Data-Structure (BIDS) Specification version 1.7.0 and can be accessed from the OpenNeuro public repository (accession number: ds004488). The raw data of each subject were stored in "sub-< ID>" directories. The preprocessed volume data and the derived surface-based data were stored in “derivatives/fmriprep” and “derivatives/ciftify” directories, respectively. The video clips stimuli were stored in “stimuli” directory.

    Video clips stimuli The video clips stimuli selected from HACS are deposited in the "stimuli" folder. Each of the 180 action categories holds a folder in which 120 unique video clips are stored.

    Raw data The data for each participant are distributed in three sub-folders, including the “anat” folder for the T1 MRI data, the “fmap” folder for the field map data, and the “func” folder for functional MRI data. The events file in “func” folder contains the onset, duration, trial type (category index) in specific scanning run.

    Preprocessed volume data from fMRIprep The preprocessed volume-based fMRI data are in subject's native space, saved as “sub-

    Preprocessed surface data from ciftify Under the “results” folder, the preprocessed surface-based data are saved in standard fsLR space, named as “sub-

  20. Data from: #PraCegoVer dataset

    • data.niaid.nih.gov
    Updated Jan 19, 2023
    + more versions
    Cite
    Sandra Avila (2023). #PraCegoVer dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5710561
    Explore at:
    Dataset updated
    Jan 19, 2023
    Dataset provided by
    Gabriel Oliveira dos Santos
    Sandra Avila
    Esther Luna Colombini
    Description

    Automatically describing images using natural sentences is an essential task for the inclusion of visually impaired people on the Internet. Although there are many datasets in the literature, most of them contain only English captions, whereas datasets with captions described in other languages are scarce.

    PraCegoVer arose on the Internet, stimulating users from social media to publish images, tag #PraCegoVer and add a short description of their content. Inspired by this movement, we have proposed the #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.

    #PraCegoVer has 533,523 image-caption pairs, with captions written in Portuguese, collected from more than 14 thousand different profiles. The average caption length in #PraCegoVer is 39.3 words, with a standard deviation of 29.7.

    Dataset Structure

    The #PraCegoVer dataset is composed of the main file dataset.json and a collection of compressed files named images.tar.gz.partX containing the images. The file dataset.json comprises a list of JSON objects with the attributes:

    user: anonymized user that made the post;

    filename: image file name;

    raw_caption: raw caption;

    caption: clean caption;

    date: post date.

    Each instance in dataset.json is associated with exactly one image in the images directory whose filename is pointed by the attribute filename. Also, we provide a sample with five instances, so the users can download the sample to get an overview of the dataset before downloading it completely.
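
    As a small, hedged sketch of reading that structure (the paths are placeholders for wherever dataset.json and the extracted images directory live), one could pair captions with image files like this:

    import json

    with open("dataset.json", encoding="utf-8") as fh:
        entries = json.load(fh)  # list of objects with user, filename, raw_caption, caption, date

    for entry in entries[:5]:
        image_path = f"images/{entry['filename']}"
        print(image_path, "->", entry["caption"][:60])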

    Download Instructions

    If you just want to have an overview of the dataset structure, you can download sample.tar.gz. But, if you want to use the dataset, or any of its subsets (63k and 173k), you must download all the files and run the following commands to uncompress and join the files:

    cat images.tar.gz.part* > images.tar.gz
    tar -xzvf images.tar.gz

    Alternatively, you can download the entire dataset from the terminal using the python script download_dataset.py available in PraCegoVer repository. In this case, first, you have to download the script and create an access token here. Then, you can run the following command to download and uncompress the image files:

    python download_dataset.py --access_token=
