This dataset tracks the updates made on the dataset "NIDDK Central Repository" as a repository for previous versions of the data and metadata.
At the end of 2022, there were approximately *** million JavaScript open source projects in the Maven Central Repository and around ** million JavaScript project versions worldwide. While JavaScript is the largest ecosystem in the Maven Central Repository, Java, Python, and .NET also have thousands of available open source projects.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks the updates made on the dataset "Central Park Follow Up" as a repository for previous versions of the data and metadata.
This dataset tracks the updates made on the dataset "Central Park" as a repository for previous versions of the data and metadata.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset contains software metric and design pattern data for around 100,000 projects from the Maven Central repository. The data was collected and analyzed as part of my master's thesis "Mining Software Repositories for the Effects of Design Patterns on Software Quality" (https://www.overleaf.com/read/vnfhydqxmpvx, https://zenodo.org/record/4048275).
The included qualisign.* files all contain the same data in different formats:
- qualisign.sql: standard SQL format (exported using "pg_dump --inserts ..."),
- qualisign.psql: PostgreSQL plain format (exported using "pg_dump -Fp ..."),
- qualisign.csql: PostgreSQL custom format (exported using "pg_dump -Fc ...").
create-tables.sql has to be executed before importing one of the qualisign.* files. Once qualisign.*sql has been imported, create-views.sql can be executed to preprocess the data, thereby creating materialized views that are more appropriate for data analysis purposes.
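The import order described above can be sketched as a small helper that only builds the commands and does not run them. This is a minimal sketch under assumptions: the database name "qualisign", the use of psql for the plain-text dumps, and the --data-only flag for pg_restore (chosen because the tables already exist after create-tables.sql) are not stated in the dataset description.

```python
from pathlib import Path


def import_commands(dump_file: str, database: str = "qualisign") -> list[list[str]]:
    """Return the commands, in order, for loading one qualisign.* dump.

    The database name and the exact client flags are assumptions.
    """
    suffix = Path(dump_file).suffix
    if suffix in (".sql", ".psql"):
        # Plain-text dumps can be replayed with psql.
        load = ["psql", "-d", database, "-f", dump_file]
    elif suffix == ".csql":
        # Custom-format dumps are read by pg_restore; --data-only is an
        # assumption based on the tables being created beforehand.
        load = ["pg_restore", "--data-only", "-d", database, dump_file]
    else:
        raise ValueError(f"unrecognised dump format: {dump_file}")
    return [
        ["psql", "-d", database, "-f", "create-tables.sql"],  # step 1: schema
        load,                                                  # step 2: data
        ["psql", "-d", database, "-f", "create-views.sql"],   # step 3: views
    ]


for command in import_commands("qualisign.csql"):
    print(" ".join(command))
```

Only one of the three qualisign.* files needs to be imported, since they hold the same data.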
Software metrics were calculated using CKJM extended: http://gromit.iiar.pwr.wroc.pl/p_inf/ckjm/
Included software metrics are (21 total):
- AMC: Average Method Complexity
- CA: Afferent Coupling
- CAM: Cohesion Among Methods
- CBM: Coupling Between Methods
- CBO: Coupling Between Objects
- CC: Cyclomatic Complexity
- CE: Efferent Coupling
- DAM: Data Access Metric
- DIT: Depth of Inheritance Tree
- IC: Inheritance Coupling
- LCOM: Lack of Cohesion of Methods (Chidamber and Kemerer)
- LCOM3: Lack of Cohesion of Methods (Constantine and Graham)
- LOC: Lines of Code
- MFA: Measure of Functional Abstraction
- MOA: Measure of Aggregation
- NOC: Number of Children
- NOM: Number of Methods
- NOP: Number of Polymorphic Methods
- NPM: Number of Public Methods
- RFC: Response for Class
- WMC: Weighted Methods per Class
In the qualisign.* data, these metrics are only available on the class level. create-views.sql additionally provides averages of these metrics on the package and project levels.
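As an illustration of how class-level metrics might roll up to package and project averages, here is a minimal sketch. The row layout, the sample values, and the choice to average directly over classes at both levels are assumptions; the actual definitions live in create-views.sql.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical class-level rows: (project, package, class, LOC, WMC).
rows = [
    ("proj-a", "com.a.core", "Foo", 120, 10),
    ("proj-a", "com.a.core", "Bar", 80, 6),
    ("proj-a", "com.a.util", "Baz", 40, 2),
]


def average_by(rows, key_len):
    """Average LOC and WMC over all classes sharing the first key_len columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[:key_len]].append(row[3:])
    return {
        key: tuple(mean(column) for column in zip(*values))
        for key, values in groups.items()
    }


package_avg = average_by(rows, 2)  # keyed by (project, package)
project_avg = average_by(rows, 1)  # keyed by (project,)
```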
Design patterns were detected using SSA: https://users.encs.concordia.ca/~nikolaos/pattern_detection.html
Included design patterns are (15 total):
- Adapter
- Bridge
- Chain of Responsibility
- Command
- Composite
- Decorator
- Factory Method
- Observer
- Prototype
- Proxy
- Singleton
- State
- Strategy
- Template Method
- Visitor
The code to generate the dataset is available at: https://github.com/jaichberg/qualisign
The code to perform quality analysis on the dataset is available at: https://github.com/jaichberg/qualisign-analysis
Dryad BioLINCC Survey Data 16-09-01
This is the deidentified data from the 2015 cross-sectional survey of investigators who requested and received access to clinical research data from BioLINCC between 2007 and 2014.
READ ME Dryad BioLINCC Survey 16-09-01.txt
Data Dictionary BioLINCC Survey 16-09-01
This file lists and describes the variables from the 2015 cross-sectional BioLINCC survey.
The CCMMercury System is a correspondence tracking (or control) system which (1) provides a central repository for agency correspondence, (2) tracks and manages correspondence, and (3) tracks and manages correspondence letters.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Responders could provide more than one reason, so the numbers do not add to 30. 13 responders recorded two formats. 28 responders recorded two governance issues, 1 responder recorded three governance issues, 2 responders recorded four governance issues, and 1 responder recorded five governance issues.
A central repository of information relating to seizures of the Proceeds of Crime.
At the end of July 2021, there were roughly 1.9 million JavaScript open source projects in the Maven Central Repository and 21 million JavaScript project versions worldwide. While JavaScript was the largest ecosystem for open source projects at that time, it also had one of the lowest ecosystem project utilization rates, at only 2 percent. Java, by contrast, had the highest ecosystem project utilization rate, at 15 percent.
U.S. Government Works https://www.usa.gov/government-works
OverdoseFreePA
OverdoseFreePA is made possible by the Pennsylvania Commission on Crime and Delinquency, and is directed and managed by the Pennsylvania Overdose Reduction Technical Assistance Center (TAC), University of Pittsburgh School of Pharmacy. The website is a result of collaboration with county and state partners across the Commonwealth of Pennsylvania.
Our partnerships include:
Pennsylvania District Attorneys Association
Pennsylvania Medical Society
Pennsylvania Pharmacist Association
Pennsylvania Psychiatric Society
The Hospital and Healthsystem Association of Pennsylvania
Pennsylvania Dental Association
Drug Enforcement Administration 360 Strategy
A growing number of Pennsylvania counties are ramping up overdose prevention, treatment, and recovery activities to address the opioid overdose epidemic. The counties involved are collaborating to develop resources that can be used by all Pennsylvanians to increase community awareness and knowledge of overdose and overdose prevention strategies, as well as to support initiatives aimed at decreasing drug overdoses and deaths within the participating counties. As a centralized resource and technical assistance hub, OverdoseFreePA is a central repository for these efforts to facilitate increased treatment and prevention efforts in these communities.
Pennsylvania Opioid Overdose Reduction Technical Assistance Center (TAC)
Pennsylvania, and the nation at large, are in the midst of an opioid overdose epidemic. The TAC's vision is to lead Pennsylvania communities to zero overdoses. The TAC hopes to achieve this vision by providing concierge technical assistance in the form of data-driven recommendations and customized strategic planning to counties working to eliminate overdoses. The TAC strives to lead the field in identifying and sharing strategies to eliminate overdose through the central repository of OverdoseFreePA.
Based out of the Program Evaluation and Research Unit (PERU) at the University of Pittsburgh's School of Pharmacy, the TAC assists counties and communities in assessing needs, building capacity to address the needs, developing and implementing data-driven plans with high-quality outcomes, and sustaining initiatives to eliminate overdoses, both fatal and non-fatal, throughout Pennsylvania.
More information: http://www.overdosefreepa.pitt.edu/who-we-are/
This dataset tracks the updates made on the dataset "Central Elementary" as a repository for previous versions of the data and metadata.
This dataset contains information on the Bullseye Snakehead fish found only in southeastern Florida. It is a subset of a larger database, the Nonindigenous Aquatic Species Database (NAS). This information resource is an established central repository for spatially referenced biogeographic accounts of introduced aquatic species. The NAS website provides scientific reports, online/real-time queries, spatial data sets, distribution maps, fact sheets, and general information.
Unified ICM/Unified CCE software uses information in the central database to determine how to route N8NN calls, including information about telephone system configuration and routing scripts. The local database also contains tables of real-time information that describe activity at the call centers. Historical information is stored in the central database.
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
This database is an initial Asset database for the Central West subregion on 29 April 2015. This dataset contains the spatial and non-spatial (attribute) components of the Central West subregion Asset List as one .mdb file, which is readable as an MS Access database and a personal geodatabase. Under the BA program, a spatial assets database is developed for each defined bioregional assessment project. The spatial elements that underpin the identification of water dependent assets are identified in the first instance by regional NRM organisations (via the WAIT tool) and supplemented with additional elements from national and state/territory government datasets. All reports received associated with the WAIT process for Central West are included in the zip file as part of this dataset. Elements are initially included in the preliminary assets database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). Elements are then grouped into assets which are evaluated by project teams to determine whether they meet the second Materiality Test (M2). Assets meeting both Materiality Tests comprise the water dependent asset list. Descriptions of the assets identified in the Central West subregion are found in the "AssetList" table of the database. In this version of the database only M1 has been assessed. Assets are the spatial features used by project teams to model scenarios under the BA program. Detailed attribution does not exist at the asset level. Asset attribution includes only the core set of BA-derived attributes reflecting the BA classification hierarchy, as described in Appendix A of "CEN_asset_database_doc_20150429.doc", located in the zip file as part of this dataset. The "Element_to_Asset" table contains the relationships and identifies the elements that were grouped to create each asset.
Detailed information describing the database structure and content can be found in the document "CEN_asset_database_doc_20150429.doc" located in the zip file. Some of the source data used in the compilation of this dataset is restricted.
This is the initial asset database.
The Bioregional Assessments methodology (Barrett et al., 2013) defines a water-dependent asset as a spatially distinct, geo-referenced entity contained within a bioregion with characteristics having a defined cultural indigenous, economic or environmental value, and that can be linked directly or indirectly to a dependency on water quantity and/or quality.
Under the BA program, a spatial assets database is developed for each defined bioregional assessment project. The spatial elements that underpin the identification of water dependent assets are identified in the first instance by regional NRM organisations (via the WAIT tool) and supplemented with additional elements from national and state/territory government datasets. Elements are initially included in the database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). Elements are then grouped into assets which are evaluated by project teams to determine whether they meet materiality test 2 (M2) - assets considered to be water dependent.
Elements may be represented by a single, discrete spatial unit (polygon, line or point), or a number of spatial units occurring at more than one location (multipart polygons/lines or multipoints). Spatial features representing elements are not clipped to the preliminary assessment extent - features that extend beyond the boundary of the assessment extent have been included in full. To assist with an assessment of the relative importance of elements, area statements have been included as an attribute of the spatial data. Detailed attribute tables contain descriptions of the geographic features at the element level. Tables are organised by data source and can be joined to the spatial data on the "ElementID" field.
Elements are grouped into Assets, which are the objects used by project teams to model scenarios under the BA program. Detailed attribution does not exist at the asset level. Asset attribution includes only the core set of BA-derived attributes reflecting the BA classification hierarchy.
The "Element_to_asset" table contains the relationships and identifies the elements that were grouped to create each asset.
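The element-to-asset relationship and the join on ElementID can be illustrated with a minimal sketch. The AID and ElementID field names come from the description; the sample identifiers, the "source" attribute, and the pairing of elements with source datasets are hypothetical and only illustrate the shape of the tables.

```python
from collections import defaultdict

# Hypothetical rows from the "Element_to_asset" table: (AID, ElementID).
element_to_asset = [(1, 101), (1, 102), (2, 103)]

# Hypothetical element-level attributes keyed by ElementID.
element_attributes = {
    101: {"source": "NSW Wetlands"},
    102: {"source": "Ramsar Wetlands of Australia"},
    103: {"source": "NSW Wetlands"},
}


def elements_by_asset(pairs):
    """Group ElementIDs under the asset (AID) they were combined into."""
    grouped = defaultdict(list)
    for aid, element_id in pairs:
        grouped[aid].append(element_id)
    return dict(grouped)


def asset_sources(pairs, attributes):
    """Join on ElementID to list the data sources behind each asset."""
    return {
        aid: sorted({attributes[element_id]["source"] for element_id in ids})
        for aid, ids in elements_by_asset(pairs).items()
    }


print(elements_by_asset(element_to_asset))
print(asset_sources(element_to_asset, element_attributes))
```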
Following delivery of the first pass asset list, project teams make a determination as to whether an asset (comprised of one or more elements) is water dependent, as assessed against the materiality tests detailed in the BA Methodology. These decisions are provided to ERIN by the project team leader and incorporated into the Assetlist table in the Asset database. The Asset database is then re-registered into the BA repository.
The Asset database dataset (which is registered to the BA repository) contains separate spatial and non-spatial databases.
Non-spatial (tabular data) is provided in an ESRI personal geodatabase (.mdb - doubling as a MS Access database) to store, query, and manage non-spatial data. This database can be accessed using either MS Access or ESRI GIS products. Non-spatial data has been provided in the Access database to simplify the querying process for BA project teams. Source datasets are highly variable and have different attributes, so separate tables are maintained in the Access database to enable the querying of thematic source layers.
Spatial data is provided as an ESRI file geodatabase (.gdb), and can only be used in an ESRI GIS environment. Spatial data is represented as a series of spatial feature classes (point, line and polygon layers). Non-spatial attribution can be joined from the Access database using the AID and ElementID fields, which are common to both the spatial and non-spatial datasets. Spatial layers containing all the point-, line- and polygon-derived elements and assets have been created to simplify management of the Elementlist and Assetlist tables, which list all the elements and assets, regardless of the spatial data geometry type; i.e., the total number of features in the combined spatial layers (points, lines, polygons) for assets (and elements) is equal to the total number of non-spatial records of all the individual data sources.
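The stated consistency between spatial and non-spatial records can be checked with a few lines once the identifiers have been exported from both databases. This is a minimal sketch with hypothetical identifiers; it does not read .gdb or .mdb files, and the function name is illustrative.

```python
def feature_counts_match(spatial_layers, nonspatial_ids):
    """Check that the combined point, line and polygon layers hold exactly
    the same identifiers as the non-spatial list (same count, same IDs)."""
    spatial_ids = [fid for layer in spatial_layers.values() for fid in layer]
    return sorted(spatial_ids) == sorted(nonspatial_ids)


# Hypothetical AID values per geometry type, and the Assetlist table IDs.
asset_layers = {"point": [1, 2], "line": [3], "polygon": [4, 5]}
assetlist_ids = [1, 2, 3, 4, 5]

print(feature_counts_match(asset_layers, assetlist_ids))
```

The same check applies to elements by substituting ElementID values and the Elementlist table.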
Department of the Environment (2013) Asset database for the Central West subregion on 29 April 2015. Bioregional Assessment Derived Dataset. Viewed 08 February 2017, http://data.bioregionalassessments.gov.au/dataset/5c3f9a56-7a48-4c26-a617-a186c2de5bf7.
Derived From Macquarie Marshes Vegetation 1991-2008 VIS_ID 3920
Derived From NSW Office of Water GW licence extract linked to spatial locations NIC v2 (28 February 2014)
Derived From NSW Office of Water Surface Water Entitlements Locations v1_Oct2013
Derived From Travelling Stock Route Conservation Values
Derived From NSW Wetlands
Derived From Communities of National Environmental Significance Database - RESTRICTED - Metadata only
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas
Derived From Birds Australia - Important Bird Areas (IBA) 2009
Derived From Environmental Asset Database - Commonwealth Environmental Water Office
Derived From NSW Office of Water Surface Water Offtakes - NIC v1 20131024
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas (including WA)
Derived From Ramsar Wetlands of Australia
Derived From Native Vegetation Management (NVM) - Manage Benefits
Derived From Key Environmental Assets - KEA - of the Murray Darling Basin
Derived From National Heritage List Spatial Database (NHL) (v2.1)
Derived From Climate Change Corridors (Dry Habitat) for North East NSW
Derived From Great Artesian Basin and Laura Basin groundwater recharge areas
Derived From NSW Office of Water combined geodatabase of regulated rivers and water sharing plan regions
Derived From [New South Wales NSW Regional CMA Water Asset
The Recreational Fisheries Information Network (RecFIN) database is a centralized repository for marine recreational fisheries data from California, Oregon, and Washington data collection programs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
</div>
<div>├── README.md # This file</div>
<div>├── data/ # Pre-computed datasets</div>
<div>│   └── ...</div>
<div>├── altered-history/ # Main analysis tool</div>
<div>│   ├── src/ # Rust source code</div>
<div>│   ├── notebooks/ # Analysis notebooks</div>
<div>│   │   ├── analysis.ipynb # Main analysis notebook</div>
<div>│   │   ├── build_analysis_dataset.ipynb</div>
<div>│   │   └── utils_analysis.py # Analysis utilities</div>
<div>│   └── README.md</div>
<div>├── git-historian/ # History checking tool</div>
<div>│   ├── src/ # Rust source code</div>
<div>│   └── README.md</div>
<div>├── modified-files/ # File modification analysis tool</div>
<div>│   ├── src/ # Rust source code</div>
<div>│   ├── notebooks/ # Additional analysis notebooks</div>
<div>│   │   ├── license_analysis.ipynb</div>
<div>│   │   ├── license_categorization.py</div>
<div>│   │   ├── secret-analysis.ipynb</div>
<div>│   │   └── swh_license_files.py</div>
<div>│   └── README.md</div>
<div>
bash</div>
<div>git clone <repository-url></div>
<div>cd altered-histories-tool-replication-pkg</div>
<div>
bash</div>
<div>pip install pandas matplotlib seaborn jupyter plotly numpy</div>
<div>
bash</div>
<div>cd altered-history && cargo build --release && cd ..</div>
<div>cd git-historian && cargo build --release && cd ..</div>
<div>cd modified-files && cargo build --release && cd ..</div>
<div>
The data/ directory contains pre-computed datasets that allow you to reproduce all analyses without running the computationally intensive data collection process.
bash</div>
<div>cd altered-history/notebooks</div>
<div>jupyter notebook analysis.ipynb</div>
<div>
bash</div>
<div># Build analysis dataset (shows data preparation)</div>
<div>jupyter notebook build_analysis_dataset.ipynb</div>
<div> </div>
<div># License-related analysis</div>
<div>cd ../../modified-files/notebooks</div>
<div>jupyter notebook license_analysis.ipynb</div>
<div> </div>
<div># Security and secrets analysis</div>
<div>jupyter notebook secret-analysis.ipynb</div>
<div>
The data/ directory contains several key datasets including:
res.pkl: Main analysis results containing categorized alterations
stars_without_dup.pkl: Repository popularity metrics (GitHub stars)
visit_type.pkl: Classification of repository visit patterns
altered_histories_2024_08_23.dump: PostgreSQL database dump for git-historian tool
See altered-history/README.md for detailed instructions.
See git-historian/README.md for detailed instructions.
See modified-files/README.md for detailed instructions.
U.S. Government Works https://www.usa.gov/government-works
The Marine Geological Samples Laboratory (MGSL) of the Graduate School of Oceanography (GSO), University of Rhode Island is a partner in the Index to Marine and Lacustrine Geological Samples (IMLGS) database, contributing information to the IMLGS to help researchers discover geological samples curated in their facility. The partner repository also sends some related data, documents, and imagery to NCEI for long-term archive, but the originating institution is the definitive source of information related to their sample collection. The MGSL serves as the central repository for dredge rocks, deep-sea cores, grabs and land-based geological samples collected by the Marine Geology and Geophysics group at GSO/URI. The facility is located on the Narragansett Bay Campus of the University of Rhode Island in Narragansett, R.I. A large part of the funding for curatorial activities in the MGSL is obtained from the Ocean Science Division of the National Science Foundation. The MGSL maintains a large collection of marine geological samples