100+ datasets found
  1. NIDDK Central Repository - fj8i-77zk - Archive Repository

    • healthdata.gov
    application/rdf+xml +5
    Updated Aug 18, 2023
    (2023). NIDDK Central Repository - fj8i-77zk - Archive Repository [Dataset]. https://healthdata.gov/dataset/NIDDK-Central-Repository-fj8i-77zk-Archive-Reposit/7phz-ieud
    Available download formats: application/rss+xml, csv, tsv, json, xml, application/rdf+xml
    Dataset updated
    Aug 18, 2023
    Description

    This dataset tracks the updates made on the dataset "NIDDK Central Repository" as a repository for previous versions of the data and metadata.

  2. Number of open source projects and versions worldwide 2023, by ecosystem

    • statista.com
    Updated Jul 1, 2025
    Statista (2025). Number of open source projects and versions worldwide 2023, by ecosystem [Dataset]. https://www.statista.com/statistics/1268650/worldwide-open-source-projects-versions-ecosystems/
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    Worldwide
    Description

    At the end of 2022, there were approximately *** million JavaScript open source projects in the Maven Central Repository and around ** million JavaScript project versions worldwide. While JavaScript is the largest ecosystem in the Maven Central Repository, Java, Python, and .NET also have thousands of available open source projects.

  3. 2026-06-26 - NIDDK Central Repository - CoreTrustSeal Requirements 2020-2022...

    • dataverse.nl
    pdf
    Updated Mar 26, 2024
    NIDDK Central Repository; NIDDK Central Repository (2024). 2026-06-26 - NIDDK Central Repository - CoreTrustSeal Requirements 2020-2022 [Dataset]. http://doi.org/10.34894/NOYYSF
    Available download formats: pdf (230725)
    Dataset updated
    Mar 26, 2024
    Dataset provided by
    DataverseNL
    Authors
    NIDDK Central Repository; NIDDK Central Repository
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CoreTrustSeal certification

  4. Central Park Follow Up - raes-ukcy - Archive Repository

    • healthdata.gov
    application/rdf+xml +5
    Updated Jul 26, 2023
    (2023). Central Park Follow Up - raes-ukcy - Archive Repository [Dataset]. https://healthdata.gov/dataset/Central-Park-Follow-Up-raes-ukcy-Archive-Repositor/iftq-java
    Available download formats: xml, csv, application/rss+xml, json, application/rdf+xml, tsv
    Dataset updated
    Jul 26, 2023
    Description

    This dataset tracks the updates made on the dataset "Central Park Follow Up" as a repository for previous versions of the data and metadata.

  5. Central Park - 89bt-rfpj - Archive Repository

    • healthdata.gov
    application/rdf+xml +5
    Updated Jul 25, 2023
    (2023). Central Park - 89bt-rfpj - Archive Repository [Dataset]. https://healthdata.gov/dataset/Central-Park-89bt-rfpj-Archive-Repository/jcv9-meqc
    Available download formats: json, csv, tsv, application/rss+xml, xml, application/rdf+xml
    Dataset updated
    Jul 25, 2023
    Description

    This dataset tracks the updates made on the dataset "Central Park" as a repository for previous versions of the data and metadata.

  6. Qualisign: Software Metrics and GoF Design Patterns of the Maven Central...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 24, 2020
    Aichberger, Johann (2020). Qualisign: Software Metrics and GoF Design Patterns of the Maven Central Repository [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3731871
    Dataset updated
    Sep 24, 2020
    Dataset authored and provided by
    Aichberger, Johann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains software metric and design pattern data for around 100,000 projects from the Maven Central repository. The data was collected and analyzed as part of my master's thesis "Mining Software Repositories for the Effects of Design Patterns on Software Quality" (https://www.overleaf.com/read/vnfhydqxmpvx, https://zenodo.org/record/4048275).

    The included qualisign.* files all contain the same data in different formats:
    - qualisign.sql: standard SQL format (exported using "pg_dump --inserts ...")
    - qualisign.psql: PostgreSQL plain format (exported using "pg_dump -Fp ...")
    - qualisign.csql: PostgreSQL custom format (exported using "pg_dump -Fc ...")

    create-tables.sql must be executed before importing any of the qualisign.* files. Once a qualisign.*sql file has been imported, create-views.sql can be executed to preprocess the data; it creates materialized views that are better suited to data analysis.
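    A minimal sketch of that load order, written in Python. It assumes the standard PostgreSQL client tools (`psql`, `pg_restore`) are installed and that a database named `qualisign` (a hypothetical name) already exists. Because qualisign.csql is a custom-format archive (`pg_dump -Fc`), it is loaded with `pg_restore` rather than `psql`; the plain-text files are run with `psql -f`. The function only builds the command lines, so each one can be inspected before being passed to `subprocess.run`.

```python
import shlex


def import_commands(dump_file: str, db: str = "qualisign") -> list[list[str]]:
    """Return the commands that load Qualisign in the documented order.

    `db` is a hypothetical database name; add host/user flags as needed.
    """
    # Step 1: the schema must exist before any data file is imported.
    steps = [["psql", "-d", db, "-f", "create-tables.sql"]]
    if dump_file.endswith(".csql"):
        # Custom-format archives (pg_dump -Fc) are restored with pg_restore.
        steps.append(["pg_restore", "-d", db, dump_file])
    else:
        # qualisign.sql and qualisign.psql are plain SQL scripts.
        steps.append(["psql", "-d", db, "-f", dump_file])
    # Step 3 (optional): build the materialized views used for analysis.
    steps.append(["psql", "-d", db, "-f", "create-views.sql"])
    return steps


if __name__ == "__main__":
    for cmd in import_commands("qualisign.psql"):
        print(shlex.join(cmd))
```

    Depending on whether the custom-format dump also contains table definitions, `pg_restore` may need an extra flag such as `--data-only`; the description above does not say, so check before running it.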

    Software metrics were calculated using CKJM extended: http://gromit.iiar.pwr.wroc.pl/p_inf/ckjm/

    Included software metrics are (21 total):
    - AMC: Average Method Complexity
    - CA: Afferent Coupling
    - CAM: Cohesion Among Methods
    - CBM: Coupling Between Methods
    - CBO: Coupling Between Objects
    - CC: Cyclomatic Complexity
    - CE: Efferent Coupling
    - DAM: Data Access Metric
    - DIT: Depth of Inheritance Tree
    - IC: Inheritance Coupling
    - LCOM: Lack of Cohesion of Methods (Chidamber and Kemerer)
    - LCOM3: Lack of Cohesion of Methods (Constantine and Graham)
    - LOC: Lines of Code
    - MFA: Measure of Functional Abstraction
    - MOA: Measure of Aggregation
    - NOC: Number of Children
    - NOM: Number of Methods
    - NOP: Number of Polymorphic Methods
    - NPM: Number of Public Methods
    - RFC: Response for Class
    - WMC: Weighted Methods per Class

    In the qualisign.* data, these metrics are only available on the class level. create-views.sql additionally provides averages of these metrics on the package and project levels.
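    The package- and project-level averages that create-views.sql provides can be approximated outside the database with pandas. This is a sketch only: the column names (`project`, `package`, `class`, `wmc`, `cbo`) are hypothetical stand-ins for the real schema in create-tables.sql, and the sketch averages directly over classes at both levels, which may differ from how the views weight packages within a project.

```python
import pandas as pd

# Hypothetical class-level rows; the real table and column names may differ.
classes = pd.DataFrame({
    "project": ["p1", "p1", "p1", "p2"],
    "package": ["p1.a", "p1.a", "p1.b", "p2.x"],
    "class":   ["A", "B", "C", "D"],
    "wmc":     [4, 8, 6, 10],
    "cbo":     [1, 3, 2, 5],
})
metrics = ["wmc", "cbo"]

# Average each metric over the classes in a package ...
package_avg = classes.groupby(["project", "package"], as_index=False)[metrics].mean()
# ... and over all classes in a project.
project_avg = classes.groupby("project", as_index=False)[metrics].mean()

print(package_avg)
print(project_avg)
```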

    Design patterns were detected using SSA: https://users.encs.concordia.ca/~nikolaos/pattern_detection.html

    Included design patterns are (15 total):
    - Adapter
    - Bridge
    - Chain of Responsibility
    - Command
    - Composite
    - Decorator
    - Factory Method
    - Observer
    - Prototype
    - Proxy
    - Singleton
    - State
    - Strategy
    - Template Method
    - Visitor

    The code to generate the dataset is available at: https://github.com/jaichberg/qualisign

    The code to perform quality analysis on the dataset is available at: https://github.com/jaichberg/qualisign-analysis

  7. Data from: Data sharing through an NIH central database repository: a...

    • datadryad.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Sep 2, 2016
    Joseph S. Ross; Jessica D. Ritchie; Emily Finn; Nihar R. Desai; Richard L. Lehman; Harlan M. Krumholz; Cary P. Gross (2016). Data sharing through an NIH central database repository: a cross-sectional survey of BioLINCC users [Dataset]. http://doi.org/10.5061/dryad.j38b7
    Available download formats: zip
    Dataset updated
    Sep 2, 2016
    Dataset provided by
    Dryad
    Authors
    Joseph S. Ross; Jessica D. Ritchie; Emily Finn; Nihar R. Desai; Richard L. Lehman; Harlan M. Krumholz; Cary P. Gross
    Time period covered
    Aug 31, 2016
    Description

    Dryad BioLINCC Survey Data 16-09-01: This is the deidentified data from the 2015 cross-sectional survey of investigators who requested and received access to clinical research data from BioLINCC between 2007 and 2014.

    READ ME Dryad BioLINCC Survey 16-09-01.txt

    Data Dictionary BioLINCC Survey 16-09-01: This file lists and describes the variables from the 2015 cross-sectional BioLINCC survey.

  8. CCMMercury System -.

    • datadiscoverystudio.org
    Updated Mar 1, 2017
    (2017). CCMMercury System -. [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/8a769563b5a0462eada9e85e2d19d094/html
    Dataset updated
    Mar 1, 2017
    Description

    The CCMMercury System is a correspondence tracking (or control) system which (1) provides a central repository for agency correspondence, (2) tracks and manages correspondence, and (3) tracks and manages correspondence letters.

  9. Views regarding the format of data and governance arrangements for a central...

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Catrin Tudur Smith; Kerry Dwan; Douglas G. Altman; Mike Clarke; Richard Riley; Paula R. Williamson (2023). Views regarding the format of data and governance arrangements for a central repository of IPD. [Dataset]. http://doi.org/10.1371/journal.pone.0097886.t001
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Catrin Tudur Smith; Kerry Dwan; Douglas G. Altman; Mike Clarke; Richard Riley; Paula R. Williamson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Responders could provide more than one reason, so the numbers do not add to 30. ¹3 responders recorded two formats. ²8 responders recorded two governance issues, 1 responder recorded three governance issues, 2 responders recorded four governance issues, and 1 responder recorded five governance issues.

  10. Joint Asset Recovery Database

    • data.wu.ac.at
    • data.europa.eu
    Updated Dec 12, 2013
    Home Office (2013). Joint Asset Recovery Database [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/ZTRiMzJkMDQtN2Q0MS00M2E1LWFkMzAtZmFlZDUxY2E1MDQ1
    Dataset updated
    Dec 12, 2013
    Dataset provided by
    Home Office
    Description

    A central repository of information relating to seizures of the Proceeds of Crime.

  11. Utilization of open source projects worldwide 2021, by ecosystem

    • statista.com
    Updated Jan 9, 2024
    Statista (2024). Utilization of open source projects worldwide 2021, by ecosystem [Dataset]. https://www.statista.com/statistics/1268859/worldwide-open-source-projects-utilization-share-ecosystems/
    Dataset updated
    Jan 9, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 31, 2021
    Area covered
    Worldwide
    Description

    At the end of July 2021, there were roughly 1.9 million JavaScript open source projects in the Maven Central Repository and 21 million JavaScript project versions worldwide. Although JavaScript was the largest ecosystem for open source projects at that time, it also had one of the lowest project utilization rates, at only 2 percent, whereas Java had the highest, at 15 percent.

  12. OverdoseFreePA Repository TAC, UPITT Pharmacy and PCCD

    • data.pa.gov
    application/rdf+xml +5
    Updated Jul 12, 2018
    Pennsylvania Overdose Reduction Technical Assistance Center (TAC), University of Pittsburgh School of Pharmacy (2018). OverdoseFreePA Repository TAC, UPITT Pharmacy and PCCD [Dataset]. https://data.pa.gov/w/nyv8-wsd2/33ch-zxdi?cur=Tt3IRx1pmm-&from=root
    Available download formats: tsv, xml, application/rdf+xml, application/rss+xml, csv, json
    Dataset updated
    Jul 12, 2018
    Dataset authored and provided by
    Pennsylvania Overdose Reduction Technical Assistance Center (TAC), University of Pittsburgh School of Pharmacy
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    OverdoseFreePA

    OverdoseFreePA is made possible by the Pennsylvania Commission on Crime and Delinquency and is directed and managed by the Pennsylvania Overdose Reduction Technical Assistance Center (TAC), University of Pittsburgh School of Pharmacy. The website is the result of collaboration with county and state partners across the Commonwealth of Pennsylvania.

    Our partnerships include:

    - Pennsylvania District Attorneys Association
    - Pennsylvania Medical Society
    - Pennsylvania Pharmacist Association
    - Pennsylvania Psychiatric Society
    - The Hospital and Healthsystem Association of Pennsylvania
    - Pennsylvania Dental Association
    - Drug Enforcement Administration 360 Strategy

    A growing number of Pennsylvania counties are ramping up overdose prevention, treatment, and recovery activities to address the opioid overdose epidemic. The participating counties are collaborating to develop resources that all Pennsylvanians can use to increase community awareness and knowledge of overdose and overdose prevention strategies, and to support initiatives aimed at decreasing drug overdoses and deaths within those counties. As a centralized resource and technical assistance hub, OverdoseFreePA serves as a central repository for these efforts, facilitating increased treatment and prevention efforts in these communities.

    Pennsylvania Opioid Overdose Reduction Technical Assistance Center (TAC)

    Pennsylvania, and the nation at large, is in the midst of an opioid overdose epidemic. The TAC’s vision is to lead Pennsylvania communities to zero overdoses. The TAC hopes to achieve this vision by providing concierge technical assistance in the form of data-driven recommendations and customized strategic planning to counties working to eliminate overdoses. The TAC strives to lead the field in identifying and sharing strategies to eliminate overdose through the central repository of OverdoseFreePA.

    Based out of the Program Evaluation and Research Unit (PERU) at the University of Pittsburgh’s School of Pharmacy, the TAC assists counties and communities in assessing needs, building capacity to address the needs, developing and implementing data driven plans with high quality outcomes, and sustaining initiatives to eliminate overdoses, both fatal and non-fatal, throughout Pennsylvania.

    More information: http://www.overdosefreepa.pitt.edu/who-we-are/

  13. Central Elementary - 9d7y-yhi8 - Archive Repository

    • healthdata.gov
    application/rdf+xml +5
    Updated Jul 26, 2023
    (2023). Central Elementary - 9d7y-yhi8 - Archive Repository [Dataset]. https://healthdata.gov/dataset/Central-Elementary-9d7y-yhi8-Archive-Repository/d84z-jwpn
    Available download formats: csv, tsv, application/rss+xml, xml, application/rdf+xml, json
    Dataset updated
    Jul 26, 2023
    Description

    This dataset tracks the updates made on the dataset "Central Elementary" as a repository for previous versions of the data and metadata.

  14. Observations of bullseye snakehead (Channa marulius) in Florida | gimi9.com

    • gimi9.com
    Observations of bullseye snakehead (Channa marulius) in Florida | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_observations-of-bullseye-snakehead-channa-marulius-in-florida/
    Area covered
    Florida
    Description

    This dataset contains information on the Bullseye Snakehead fish found only in southeastern Florida. It is a subset of a larger database, the Nonindigenous Aquatic Species Database (NAS). This information resource is an established central repository for spatially referenced biogeographic accounts of introduced aquatic species. The NAS website provides scientific reports, online/real-time queries, spatial data sets, distribution maps, fact sheets, and general information.

  15. Unified ICM/Unified CCE Databases

    • catalog.data.gov
    • datasets.ai
    • +2 more
    Updated Mar 8, 2025
    Social Security Administration (2025). Unified ICM/Unified CCE Databases [Dataset]. https://catalog.data.gov/dataset/unified-icm-unified-cce-databases
    Dataset updated
    Mar 8, 2025
    Dataset provided by
    Social Security Administration (http://ssa.gov/)
    Description

    Unified ICM/Unified CCE software uses information in the central database to determine how to route N8NN calls, including information about telephone system configuration and routing scripts. The local database also contains tables of real-time information that describe activity at the call centers. Historical information is stored in the central database.

  16. Nonindigenous Aquatic Species Database Asian Tiger Shrimp

    • data.amerigeoss.org
    Updated Jul 15, 2019
    ioos (2019). Nonindigenous Aquatic Species Database Asian Tiger Shrimp [Dataset]. https://data.amerigeoss.org/de/dataset/nonindigenous-aquatic-species-database-asian-tiger-shrimp
    Dataset updated
    Jul 15, 2019
    Dataset provided by
    ioos
    Description
    The Nonindigenous Aquatic Species Database (NAS) information resource is an established central repository for spatially referenced biogeographic accounts of introduced aquatic species. The NAS website provides scientific reports, online/real-time queries, spatial data sets, distribution maps, fact sheets, and general information.

  17. Asset database for the Central West subregion on 29 April 2015

    • data.gov.au
    • cloud.csiss.gmu.edu
    • +2 more
    Updated Nov 19, 2019
    Bioregional Assessment Program (2019). Asset database for the Central West subregion on 29 April 2015 [Dataset]. https://data.gov.au/data/dataset/5c3f9a56-7a48-4c26-a617-a186c2de5bf7
    Dataset updated
    Nov 19, 2019
    Dataset authored and provided by
    Bioregional Assessment Program
    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    This database is an initial Asset database for the Central West subregion on 29 April 2015. This dataset contains the spatial and non-spatial (attribute) components of the Central West subregion Asset List as one .mdb file, which is readable both as an MS Access database and as a personal geodatabase. Under the BA program, a spatial assets database is developed for each defined bioregional assessment project. The spatial elements that underpin the identification of water dependent assets are identified in the first instance by regional NRM organisations (via the WAIT tool) and supplemented with additional elements from national and state/territory government datasets. All reports received in association with the WAIT process for Central West are included in the zip file as part of this dataset. Elements are initially included in the preliminary assets database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). Elements are then grouped into assets, which are evaluated by project teams to determine whether they meet the second Materiality Test (M2). Assets meeting both Materiality Tests comprise the water dependent asset list. Descriptions of the assets identified in the Central West subregion are found in the "AssetList" table of the database. In this version of the database, only M1 has been assessed. Assets are the spatial features used by project teams to model scenarios under the BA program. Detailed attribution does not exist at the asset level. Asset attribution includes only the core set of BA-derived attributes reflecting the BA classification hierarchy, as described in Appendix A of "CEN_asset_database_doc_20150429.doc", located in the zip file as part of this dataset. The "Element_to_Asset" table contains the relationships and identifies the elements that were grouped to create each asset.

    Detailed information describing the database structure and content can be found in the document "CEN_asset_database_doc_20150429.doc" located in the zip file. Some of the source data used in the compilation of this dataset is restricted.

    Dataset History

    This is the initial asset database.

    The Bioregional Assessments methodology (Barrett et al., 2013) defines a water-dependent asset as a spatially distinct, geo-referenced entity contained within a bioregion with characteristics having a defined cultural indigenous, economic or environmental value, and that can be linked directly or indirectly to a dependency on water quantity and/or quality.

    Under the BA program, a spatial assets database is developed for each defined bioregional assessment project. The spatial elements that underpin the identification of water dependent assets are identified in the first instance by regional NRM organisations (via the WAIT tool) and supplemented with additional elements from national and state/territory government datasets. Elements are initially included in the database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). Elements are then grouped into assets, which are evaluated by project teams to determine whether they meet materiality test 2 (M2), i.e. whether the assets are considered to be water dependent.
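    Materiality Test 1 is essentially a spatial filter. The sketch below illustrates the idea with axis-aligned bounding boxes instead of real geometries, a deliberate simplification (the program itself works with ESRI geodatabases); the element IDs and coordinates are made up. Features are kept whole rather than clipped, and M2 is not computed because it is a judgement made by project teams.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box (a stand-in for a real feature geometry)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "BBox") -> bool:
        # True if the boxes overlap or touch, i.e. partly or wholly inside.
        return not (self.max_x < other.min_x or other.max_x < self.min_x
                    or self.max_y < other.min_y or other.max_y < self.min_y)


def materiality_test_1(elements: dict[str, BBox], extent: BBox) -> list[str]:
    """Return IDs of elements partly or wholly within the assessment extent.

    Elements are selected whole; nothing is clipped to the extent.
    """
    return sorted(eid for eid, box in elements.items() if box.intersects(extent))


extent = BBox(0, 0, 10, 10)
elements = {
    "E1": BBox(2, 2, 3, 3),      # wholly inside
    "E2": BBox(8, 8, 15, 15),    # straddles the boundary, kept in full
    "E3": BBox(20, 20, 25, 25),  # outside
}
print(materiality_test_1(elements, extent))  # ['E1', 'E2']
```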

    Elements may be represented by a single, discrete spatial unit (polygon, line or point), or by a number of spatial units occurring at more than one location (multipart polygons/lines or multipoints). Spatial features representing elements are not clipped to the preliminary assessment extent; features that extend beyond the boundary of the assessment extent have been included in full. To assist with an assessment of the relative importance of elements, area statements have been included as an attribute of the spatial data. Detailed attribute tables contain descriptions of the geographic features at the element level. Tables are organised by data source and can be joined to the spatial data on the "ElementID" field.

    Elements are grouped into Assets, which are the objects used by project teams to model scenarios under the BA program. Detailed attribution does not exist at the asset level. Asset attribution includes only the core set of BA-derived attributes reflecting the BA classification hierarchy.

    The "Element_to_asset" table contains the relationships and identifies the elements that were grouped to create each asset.
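    Resolving which elements make up each asset is a simple group-by over that link table. A sketch with pandas, using made-up rows; `AID` and `ElementID` follow the field names mentioned in this description, but the exact table layout is an assumption.

```python
import pandas as pd

# Hypothetical rows of the Element_to_Asset link table; real columns may differ.
element_to_asset = pd.DataFrame({
    "AID":       ["A1", "A1", "A2", "A3", "A3", "A3"],
    "ElementID": ["E1", "E2", "E3", "E4", "E5", "E6"],
})

# List the elements that were grouped to create each asset.
elements_per_asset = element_to_asset.groupby("AID")["ElementID"].apply(list)
print(elements_per_asset.to_dict())
# {'A1': ['E1', 'E2'], 'A2': ['E3'], 'A3': ['E4', 'E5', 'E6']}
```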

    Following delivery of the first pass asset list, project teams make a determination as to whether an asset (comprised of one or more elements) is water dependent, as assessed against the materiality tests detailed in the BA Methodology. These decisions are provided to ERIN by the project team leader and incorporated into the Assetlist table in the Asset database. The Asset database is then re-registered into the BA repository.

    The Asset database dataset (which is registered to the BA repository) contains separate spatial and non-spatial databases.

    Non-spatial (tabular) data is provided in an ESRI personal geodatabase (.mdb, doubling as an MS Access database) to store, query, and manage non-spatial data. This database can be accessed using either MS Access or ESRI GIS products. Non-spatial data has been provided in the Access database to simplify the querying process for BA project teams. Source datasets are highly variable and have different attributes, so separate tables are maintained in the Access database to enable the querying of thematic source layers.

    Spatial data is provided as an ESRI file geodatabase (.gdb) and can only be used in an ESRI GIS environment. Spatial data is represented as a series of spatial feature classes (point, line and polygon layers). Non-spatial attribution can be joined from the Access database using the AID and ElementID fields, which are common to both the spatial and non-spatial datasets. Spatial layers containing all the point-, line- and polygon-derived elements and assets have been created to simplify management of the Elementlist and Assetlist tables, which list all the elements and assets regardless of geometry type; i.e., the total number of features in the combined spatial layers (points, lines, polygons) for assets (and elements) is equal to the total number of non-spatial records of all the individual data sources.
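    The join and the record-count invariant described above can be sketched with pandas. The frames are hypothetical stand-ins: the real spatial layers live in the .gdb and would be read with ESRI or other GIS tooling, and the attribute values here are invented.

```python
import pandas as pd

# Hypothetical stand-ins for the point, line and polygon layers (geometry omitted).
points = pd.DataFrame({"ElementID": ["E1", "E2"], "geom_type": "point"})
lines = pd.DataFrame({"ElementID": ["E3"], "geom_type": "line"})
polygons = pd.DataFrame({"ElementID": ["E4", "E5"], "geom_type": "polygon"})
# Hypothetical stand-in for the non-spatial Elementlist table.
elementlist = pd.DataFrame({
    "ElementID": ["E1", "E2", "E3", "E4", "E5"],
    "source": ["wetlands", "wetlands", "rivers", "parks", "parks"],
})

# Combine the geometry-specific layers, then join attributes on the shared key.
features = pd.concat([points, lines, polygons], ignore_index=True)
joined = features.merge(elementlist, on="ElementID", how="left", validate="one_to_one")

# Invariant from the description: combined spatial features == non-spatial records.
assert len(features) == len(elementlist)
print(joined)
```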

    Dataset Citation

    Department of the Environment (2013) Asset database for the Central West subregion on 29 April 2015. Bioregional Assessment Derived Dataset. Viewed 08 February 2017, http://data.bioregionalassessments.gov.au/dataset/5c3f9a56-7a48-4c26-a617-a186c2de5bf7.

    Dataset Ancestors

  18. RecFIN Database

    • fisheries.noaa.gov
    Updated Apr 2, 2019
    Pacific States Marine Fisheries Commission (2019). RecFIN Database [Dataset]. https://www.fisheries.noaa.gov/inport/item/55990
    Dataset updated
    Apr 2, 2019
    Dataset provided by
    Pacific States Marine Fisheries Commission
    Description

    The Recreational Fisheries Information Network (RecFIN) database is a centralized repository for marine recreational fisheries data from California, Oregon, and Washington data collection programs.

  19. Replication package for: Altered Histories in Version Control System...

    • zenodo.org
    bin, zip
    Updated Jun 2, 2025
    Anonymous; Anonymous (2025). Replication package for: Altered Histories in Version Control System Repositories: Evidence from the Trenches [Dataset]. http://doi.org/10.5281/zenodo.15558282
    Available download formats: bin, zip
    Dataset updated
    Jun 2, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    # History Alterations - Replication Package


    This repository contains the complete replication package for the research article Altered Histories in Version Control System Repositories: Evidence from the Trenches. The package provides tools to detect, analyze, and categorize Git history alterations across software repositories, along with Jupyter notebooks to reproduce the analysis presented in the paper.

    ## πŸ” Overview

    This replication package enables researchers to reproduce the analysis of altered Git histories in software repositories archived by Software Heritage. The study investigates how and why Git histories are modified over time, providing insights into developer practices and repository maintenance patterns.

    Main Research Questions:

    - How prevalent are Git history alterations in open-source repositories?
    - What types of changes are most commonly made to Git histories?
    - What are the root causes of these alterations?
    - How do these practices vary across different types of repositories?
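    As an illustration only: one common way to flag a rewritten history is to check whether the branch head seen at an earlier archive visit is still reachable from the head seen at a later visit. If it is not, the branch was rewritten (for example by a force-push). The sketch below assumes that definition and uses a toy in-memory parent map in place of Software Heritage data; it is not necessarily the criterion used in the paper or by the altered-history tool. In a local clone, `git merge-base --is-ancestor <old> <new>` performs the same reachability check.

```python
from collections import deque


def is_ancestor(parents: dict[str, list[str]], old: str, new: str) -> bool:
    """Return True if commit `old` is reachable from commit `new`.

    `parents` maps each commit ID to its parent IDs (a toy stand-in for a Git DAG).
    """
    seen, queue = {new}, deque([new])
    while queue:  # breadth-first walk towards the root commits
        commit = queue.popleft()
        if commit == old:
            return True
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def history_altered(parents: dict[str, list[str]], head_before: str, head_after: str) -> bool:
    # A fast-forward keeps the old head in the new history; anything else rewrote it.
    return not is_ancestor(parents, head_before, head_after)


# c1 <- c2 <- c3 was archived first; a later visit sees either c4 (built on c3)
# or c2b (a rewritten commit on top of c1 that drops c2 and c3).
parents = {"c2": ["c1"], "c3": ["c2"], "c4": ["c3"], "c2b": ["c1"]}
print(history_altered(parents, "c3", "c4"))   # False: fast-forward
print(history_altered(parents, "c3", "c2b"))  # True: c3 is no longer reachable
```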

    ## πŸ“ Repository Structure

    ```
    ├── README.md                # This file
    ├── data/                    # Pre-computed datasets
    │   ├── ...
    ├── altered-history/         # Main analysis tool
    │   ├── src/                 # Rust source code
    │   ├── notebooks/           # Analysis notebooks
    │   │   ├── analysis.ipynb   # Main analysis notebook
    │   │   ├── build_analysis_dataset.ipynb
    │   │   └── utils_analysis.py  # Analysis utilities
    │   └── README.md
    ├── git-historian/           # History checking tool
    │   ├── src/                 # Rust source code
    │   └── README.md
    ├── modified-files/          # File modification analysis tool
    │   ├── src/                 # Rust source code
    │   ├── notebooks/           # Additional analysis notebooks
    │   │   ├── license_analysis.ipynb
    │   │   ├── license_categorization.py
    │   │   ├── secret-analysis.ipynb
    │   │   └── swh_license_files.py
    │   └── README.md
    ```

    ## 🚀 Quick Start

    ### Prerequisites

    - Rust (latest stable version)
    - Python 3.8+ with Jupyter
    - PostgreSQL (for database operations)
    - Git (for repository analysis)

    ### Installation

    1. Clone the repository:
    ```bash
    git clone <repository-url>
    cd altered-histories-tool-replication-pkg
    ```

    2. Unzip all zipped directories.

    3. Install Python dependencies:
    ```bash
    pip install pandas matplotlib seaborn jupyter plotly numpy
    ```

    4. Build the Rust tools (optional, for dataset generation):
    ```bash
    cd altered-history && cargo build --release && cd ..
    cd git-historian && cargo build --release && cd ..
    cd modified-files && cargo build --release && cd ..
    ```

    ## 📊 Reproducing the Analysis

    ### Option 1: Using Pre-computed Data (Recommended)

    The data/ directory contains pre-computed datasets that allow you to reproduce all analyses without running the computationally intensive data collection process.

    1. Open the main analysis notebook:
    ```bash
    cd altered-history/notebooks
    jupyter notebook analysis.ipynb
    ```

    2. Run all cells to reproduce the complete analysis.

    3. Explore additional analyses:

    Feel free to modify the notebooks to explore the dataframes further.
    ```bash
    # Build analysis dataset (shows data preparation)
    jupyter notebook build_analysis_dataset.ipynb

    # License-related analysis
    cd ../../modified-files/notebooks
    jupyter notebook license_analysis.ipynb

    # Security and secrets analysis
    jupyter notebook secret-analysis.ipynb
    ```

    ### Option 2: Regenerating the Dataset

    To reproduce the complete data collection and analysis pipeline:

    1. Download Software Heritage datasets (see individual tool READMEs)
    2. Configure database connections in each tool
    3. Run the analysis pipeline following the step-by-step instructions in each tool's README
    4. Process results using the provided notebooks

    Note: Complete dataset regeneration requires significant computational resources and time (potentially weeks for large datasets).

    ## 📋 Data

    The data/ directory contains several key datasets, including:

    - res.pkl: Main analysis results containing categorized alterations
    - stars_without_dup.pkl: Repository popularity metrics (GitHub stars)
    - visit_type.pkl: Classification of repository visit patterns
    - altered_histories_2024_08_23.dump: PostgreSQL database dump for git-historian tool
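    The `.pkl` files can be loaded outside the notebooks for quick inspection. The sketch below uses the standard `pickle` module and assumes the files are ordinary uncompressed pickles, most likely pandas DataFrames given that the notebooks work with dataframes; `pandas.read_pickle` would serve equally well. The PostgreSQL `.dump` file is not handled here, and `load_pickles` is a hypothetical helper name.

```python
import pickle
from pathlib import Path

# File names from the list above; all assumed to live in data/.
PICKLES = ("res.pkl", "stars_without_dup.pkl", "visit_type.pkl")


def load_pickles(data_dir="data", names=PICKLES):
    """Load each listed pickle that exists in `data_dir`, keyed by file stem.

    Unpickling pandas objects requires pandas to be importable. Only load
    pickles from a trusted source: unpickling can execute arbitrary code.
    """
    loaded = {}
    for name in names:
        path = Path(data_dir) / name
        if path.exists():
            with path.open("rb") as fh:
                loaded[path.stem] = pickle.load(fh)
    return loaded
```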

    ## 🛠️ Tools Description

    ### 1. altered-history

    Purpose: Detects and categorizes Git history alterations in Software Heritage archives.

    Key Features:

    - Three-step analysis pipeline (detection → root cause → categorization)
    - Parallel processing for large datasets
    - Comprehensive alteration taxonomy

    Usage: See altered-history/README.md for detailed instructions.
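    The README does not spell out how the detection step decides that a history was altered. One standard criterion, the same fast-forward test as `git merge-base --is-ancestor`, is that a branch tip recorded at an earlier archive visit is no longer reachable from the tip recorded at a later visit. The sketch below applies that test to an in-memory parent map. The snapshot format (branch name to commit id) and the function names are assumptions, not the tool's Rust implementation, and it covers detection only, not root-cause analysis or categorization.

```python
from collections import deque


def is_ancestor(parents, old, new):
    """Return True if commit `old` is reachable from `new` via parent links.

    `parents` maps each commit id to the list of its parent ids. A commit
    counts as its own ancestor, matching `git merge-base --is-ancestor`.
    """
    seen, queue = {new}, deque([new])
    while queue:  # breadth-first walk over the history of `new`
        commit = queue.popleft()
        if commit == old:
            return True
        for parent in parents.get(commit, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


def altered_branches(parents, earlier_visit, later_visit):
    """Branches whose earlier tip is no longer in the later tip's history.

    Visits are hypothetical dicts mapping branch names to tip commit ids.
    Branches deleted between visits are not reported by this check.
    """
    return sorted(
        branch
        for branch, old_tip in earlier_visit.items()
        if branch in later_visit
        and not is_ancestor(parents, old_tip, later_visit[branch])
    )
```

    For example, with `parents = {"a": [], "b": ["a"], "c": ["b"], "b2": ["a"]}`, a branch that moves from `b` to `c` is a fast-forward, while one rewritten from `b` to `b2` is reported as altered.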

    ### 2. git-historian

    Purpose: Checks individual repositories against the database of known alterations.

    Key Features:

    - PostgreSQL integration
    - Git hook integration for automated checking
    - Caching system for performance

    Usage: See git-historian/README.md for detailed instructions.
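    The check-with-cache pattern described for git-historian can be sketched in a few lines. Here an in-memory SQLite table stands in for the PostgreSQL database and `functools.lru_cache` stands in for the tool's caching system; the table name, column name and sample commit ids are all hypothetical.

```python
import sqlite3
from functools import lru_cache

# In-memory SQLite stands in for PostgreSQL; the schema below is
# hypothetical and not git-historian's real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE altered_commits (commit_id TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO altered_commits VALUES (?)",
    [("deadbeef",), ("cafef00d",)],  # made-up sample ids
)


@lru_cache(maxsize=100_000)
def is_known_altered(commit_id):
    """Look up one commit; cached answers skip the database on repeats."""
    row = conn.execute(
        "SELECT 1 FROM altered_commits WHERE commit_id = ?", (commit_id,)
    ).fetchone()
    return row is not None


def check_commits(commit_ids):
    """Return the subset of `commit_ids` recorded as altered."""
    return [c for c in commit_ids if is_known_altered(c)]
```

    In a Git hook, the ids passed to `check_commits` could come from `git rev-list` over the range being fetched or pushed; this is an assumption about how a hook might gather them.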

    ### 3. modified-files

    Purpose: Analyzes file-level modifications and their patterns.

    Key Features:

    - File modification tracking
    - License and security analysis
    - Integration with Software Heritage graph

    Usage: See modified-files/README.md for detailed instructions.
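    License analysis of modified files presumably begins by picking out license files among the modified paths. A simple file-name heuristic is sketched below; the pattern (LICENSE/LICENCE, COPYING or UNLICENSE, optionally followed by a `.ext` or `-suffix`) is an assumption and not the rules used in `license_categorization.py` or `swh_license_files.py`.

```python
import re
from pathlib import PurePosixPath

# Hypothetical heuristic: common license file base names, optionally
# followed by an extension (".md", ".txt") or a dash suffix ("-MIT").
LICENSE_NAME = re.compile(
    r"^(licen[cs]e|copying|unlicense)([.\-].*)?$", re.IGNORECASE
)


def is_license_file(path):
    """True if the file name (ignoring directories) looks like a license."""
    return bool(LICENSE_NAME.match(PurePosixPath(path).name))


def license_files(paths):
    """Filter a list of modified paths down to likely license files."""
    return [p for p in paths if is_license_file(p)]
```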

    ## 📋 Requirements

    ### System Requirements

    - Memory: Minimum 16 GB RAM (1.5 TB+ recommended for full dataset processing)
    - Storage: 600 GB+ free space for complete datasets
    - CPU: Multi-core processor recommended for parallel processing

    ## 🔄 Reproducibility Notes

    1. Deterministic Results: The analysis notebooks will produce identical results when run with the provided datasets.

    2. Versioning: All tools are pinned to specific versions to ensure reproducibility.

    3. Random Seeds: Where applicable, random seeds are fixed in the analysis code.
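    Fixing a seed makes any sampling step repeatable. The sketch below uses a local `random.Random` instance so that other code touching the global generator cannot change the result; the function name and seed value are hypothetical. With NumPy, `np.random.default_rng(seed)` plays the same role.

```python
import random

SEED = 42  # hypothetical value; use whatever seed the notebooks fix


def sample_repositories(repo_ids, k, seed=SEED):
    """Draw a reproducible sample: same inputs and seed give the same output."""
    rng = random.Random(seed)  # local generator, isolated from global state
    # Sorting first makes the result independent of input order (e.g. sets).
    return rng.sample(sorted(repo_ids), k)
```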

  20. Archive of Geosample Data and Information from the University of Rhode...

    • data.amerigeoss.org
    • datadiscoverystudio.org
    • +1 more
    html, jsp, ods
    Updated Jul 28, 2019
    United States[old] (2019). Archive of Geosample Data and Information from the University of Rhode Island (URI) Graduate School of Oceanography (GSO), Marine Geological Samples Laboratory (MGSL) [Dataset]. https://data.amerigeoss.org/fi/dataset/a69f5588-c4c1-48e8-b213-f8fd0ad6b4a0
    Available download formats: html, jsp, ods
    Dataset updated
    Jul 28, 2019
    Dataset provided by
    United States[old]
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Area covered
    Rhode Island
    Description

    The Marine Geological Samples Laboratory (MGSL) of the Graduate School of Oceanography (GSO), University of Rhode Island is a partner in the Index to Marine and Lacustrine Geological Samples (IMLGS) database, contributing information to the IMLGS to help researchers discover geological samples curated in their facility. The partner repository also sends some related data, documents, and imagery to NCEI for long-term archive, but the originating institution is the definitive source of information related to their sample collection. The MGSL serves as the central repository for dredge rocks, deep-sea cores, grabs and land-based geological samples collected by the Marine Geology and Geophysics group at GSO/URI. The facility is located on the Narragansett Bay Campus of the University of Rhode Island in Narragansett, R.I. A large part of the funding for curatorial activities in the MGSL is obtained from the Ocean Science Division of the National Science Foundation. The MGSL maintains a large collection of marine geological samples
