Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This fileset provides supporting data and corpora for the empirical study described in: Rafael S. Gonçalves and Mark A. Musen. The variable quality of metadata about biological samples used in biomedical experiments. Scientific Data, in press (2019).

Description of files

Analysis spreadsheet files:
- ncbi-biosample-metadata-study.xlsx contains data to support the analysis of the quality of metadata in the NCBI BioSample.
- ebi-biosamples-metadata-study.xlsx contains data to support the analysis of the quality of metadata in the EBI BioSamples.

Validation data files:
- ncbi-biosample-validation-data.tar.gz is an archive containing the validation data for the analysis of the entire NCBI BioSample dataset.
- ncbi-biosample-packaged-validation-data.tar.gz is an archive containing the validation data for the analysis of the subset of metadata records in the NCBI BioSample that use a BioSample package definition.
- ebi-ncbi-shared-records-validation-data.tar.gz is an archive containing the validation data for the analysis of the set of metadata records that exist both in EBI BioSamples and NCBI BioSample.

Corpus files:
- ebi-biosamples-corpus.xml.gz corresponds to the EBI BioSamples corpus.
- ncbi-biosample-corpus.xml.gz corresponds to the NCBI BioSample corpus.
- ncbi-biosample-packaged-records-corpus.tar.gz corresponds to the NCBI BioSample metadata records that declare a package definition.
- ebi-ncbi-shared-records-corpus.tar.gz corresponds to the corpus of metadata records that exist both in NCBI BioSample and EBI BioSamples.
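The XML corpora are gzip-compressed, so they can be streamed rather than unpacked. The following is a minimal sketch, not part of the study's own tooling, that counts records in a corpus such as ncbi-biosample-corpus.xml.gz. It assumes that each record is wrapped in a `<BioSample>` element with no XML namespace; that is an assumption about the dump layout, so pass a different `tag` if the EBI corpus or a namespaced file wraps records differently.

```python
import gzip
import xml.etree.ElementTree as ET


def count_records(path: str, tag: str = "BioSample") -> int:
    """Count elements named `tag` in a gzip-compressed XML corpus.

    The default tag is an assumption about the NCBI dump layout; namespaced
    files report tags as "{namespace}Name" and need the full form here.
    """
    count = 0
    with gzip.open(path, "rb") as handle:
        # iterparse streams the XML, so the corpus is never fully loaded
        for _event, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag == tag:
                count += 1
                elem.clear()  # drop the finished record's children and text
    return count
```

For example, `count_records("ncbi-biosample-corpus.xml.gz")` returns the number of `<BioSample>` elements in the file. Streaming with `iterparse` keeps memory use low for corpora of this size.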
Since 2021, local authorities and their public establishments have been required to complete a Social Database (BDS), which gathers a variety of data (see the link to the Order of 10 December 2021 below) on their human resources (employment, recruitment, career paths, training, remuneration, health and safety at work, work organisation, improvement of working conditions and quality of life, social action and social protection, social dialogue, discipline, etc.). The Department then uses these data to draw up its Single Social Report (RSU), which in turn serves as the basis for the Management Guidelines through which public employers formalise or update their multiannual human resources management strategy.

The summary of the social database proposed here is the automatic summary produced in November 2023 with the Management Centres' application www.bs.donnees-sociales, from the data as at 31 December 2022 transmitted by the local authority to the Departmental Management Centre of Seine-Maritime (see metadata). Together with the summary of the Single Social Report produced by the Seine-Maritime Department and the CSV files from the database, also published on this site, it constitutes the Single Social Report.

Metadata

Additional resources
* Légifrance website: https://www.legifrance.gouv.fr/loda/id/JORFTEXT000044930851/ The website of the Public Service for the Dissemination of Law offers for download the consolidated version of the Order of 10 December 2021, which defines the indicators contained in the social database (BDS, see definition in metadata).
* Examples of Single Social Reports: https://www.paris.fr/pages/le-rapport-social-unique-22259 and https://donnees-sociales.fr/dossier-de-presse-2-2/ For comparison, various RSUs (see definition in metadata) can be found online in PDF format. The attached links point to the RSUs of the City of Paris and to the annual national summary published by the Management Centres on the données-sociales.fr site (see definition in the metadata).
* INSEE website: https://www.insee.fr/information/2407785 The website of the National Institute of Statistics and Economic Studies (INSEE) has a page dedicated to the All-Employees Base (BTS), built from the Nominative Social Declarations (DSN) that every employer must file under the Social Security Code and the General Tax Code. Since 2009, its scope has covered the three branches of the civil service (State, territorial and hospital) as well as employees of private individual employers, thus covering all employees in the French economy. This database provides extensive employment data (number of FTE jobs, gross and net pay, job qualification, type of employment contract, hours worked by gender and qualification, etc.), some of which are available free of charge.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Peatland Decomposition Database (PDD) stores data from published litterbag experiments related to peatlands. Currently, the database focuses on northern peatlands and Sphagnum litter and peat, but it also contains data from some vascular plant litterbag experiments. Currently, the database contains entries from 34 studies, 2,160 litterbag experiments, and 7,297 individual samples with 117,841 measurements for various attributes (e.g. relative mass remaining, N content, holocellulose content, mesh size). The aim is to provide a harmonized data source that can be useful to re-analyse existing data and to plan future litterbag experiments.
The Peatland Productivity and Decomposition Parameter Database (PPDPD) (Bona et al. 2018) is similar to the Peatland Decomposition Database (PDD) in that both contain data from peatland litterbag experiments. The two databases differ in that they partly contain different data, that PPDPD additionally contains information on vegetation productivity, which PDD does not, and that PDD provides more information and metadata on litterbag experiments, as well as measurement errors.
Compared to version 1.0.0, this version has a new structure for table experimental_design_format, contains additional metadata on the experimental design (these were omitted in version 1.0.0), and contains the scripts that were used to import the data into the database.
Data for the database were collected from published litterbag studies by extracting published data from figures, tables, or other data sources, and by contacting the authors of the studies to obtain raw data. All data processing was done with R (R version 4.2.0 (2022-04-22)) (R Core Team 2022).
Studies were identified via a Scopus search (2022-12-17) with the search string (TITLE-ABS-KEY ( peat* AND ( "litter bag" OR "decomposition rate" OR "decay rate" OR "mass loss")) AND NOT ("tropic*")). These studies were further screened to exclude those that do not contain litterbag data or that reuse data from studies already considered. Additional studies with litterbag experiments in northern peatlands that we were aware of, but that were not identified by the literature search, were added to the list of publications. For studies not older than 10 years, we contacted the authors to obtain raw data; however, this was successful in only a few cases. To date, the database focuses on Sphagnum litterbag experiments, and data from not all of the studies identified by the literature search have been included yet.
Data from figures were extracted using the package ‘metaDigitise’ (1.0.1) (Pick, Nakagawa, and Noble 2018). Data from tables were extracted manually.
Data from the following studies are currently included: Farrish and Grigal (1985), Bartsch and Moore (1985), Farrish and Grigal (1988), Vitt (1990), Hogg, Lieffers, and Wein (1992), Sanger, Billett, and Cresser (1994), Hiroki and Watanabe (1996), Szumigalski and Bayley (1996), Prevost, Belleau, and Plamondon (1997), Arp, Cooper, and Stednick (1999), Robbert A. Scheffer and Aerts (2000), R. A. Scheffer, Van Logtestijn, and Verhoeven (2001), Limpens and Berendse (2003), Waddington, Rochefort, and Campeau (2003), Asada, Warner, and Banner (2004), Thormann, Bayley, and Currah (2001), Trinder, Johnson, and Artz (2008), Breeuwer et al. (2008), Trinder, Johnson, and Artz (2009), Bragazza and Iacumin (2009), Hoorens, Stroetenga, and Aerts (2010), Straková et al. (2010), Straková et al. (2012), Orwin and Ostle (2012), Lieffers (1988), Manninen et al. (2016), Johnson and Damman (1991), Bengtsson, Rydin, and Hájek (2018a), Bengtsson, Rydin, and Hájek (2018b), Asada and Warner (2005), Bengtsson, Granath, and Rydin (2017), Bengtsson, Granath, and Rydin (2016), Hagemann and Moroni (2015), Hagemann and Moroni (2016), B. Piatkowski et al. (2021), B. T. Piatkowski et al. (2021), Mäkilä et al. (2018), Golovatskaya and Nikonova (2017), Golovatskaya and Nikonova (2017).
The database is a ‘MariaDB’ database and the database schema was designed to store data and metadata following the Ecological Metadata Language (EML) (Jones et al. 2019). Descriptions of the tables are shown in Tab. 1.
The database contains general metadata relevant for litterbag experiments (e.g., geographical, temporal, and taxonomic coverage, mesh sizes, experimental design). However, it does not contain detailed descriptions of sample handling, sample preprocessing methods, or sites, because there are currently no discipline-specific metadata and reporting standards.
Table 1: Description of the individual tables in the database.
Name | Description |
---|---|
attributes | Defines the attributes of the database and the values in column attribute_name in table data. |
citations | Stores BibTeX entries for references and data sources. |
citations_to_datasets | Links entries in table citations with entries in table datasets. |
custom_units | Stores custom units. |
data | Stores measured values for samples, for example remaining masses. |
datasets | Lists the individual datasets. |
experimental_design_format | Stores information on the experimental design of litterbag experiments. |
measurement_scales, measurement_scales_date_time, measurement_scales_interval, measurement_scales_nominal, measurement_scales_ordinal, measurement_scales_ratio | Define data value types. |
missing_value_codes | Defines how missing values are encoded. |
samples | Stores information on individual samples. |
samples_to_samples | Links samples to other samples, for example litter samples collected in the field to litter samples collected during the incubation of the litterbags. |
units, unit_types | Store information on measurement units. |
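To illustrate how the samples, samples_to_samples, and data tables fit together, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for MariaDB. Only the table names and the attribute_name column come from Table 1. The other column names (sample_id, label, value, sample_id_from, sample_id_to) and the attribute name mass_relative_mass are hypothetical placeholders, not the actual PDD schema.

```python
import sqlite3

# sqlite3 stands in for MariaDB. Apart from the table names and the
# attribute_name column, every column name here is a hypothetical placeholder.
SCHEMA = """
CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE samples_to_samples (sample_id_from INTEGER, sample_id_to INTEGER);
CREATE TABLE data (sample_id INTEGER, attribute_name TEXT, value REAL);
"""


def linked_values(conn: sqlite3.Connection, source_id: int, attribute: str) -> dict:
    """Return {label: value} of `attribute` for samples linked from `source_id`.

    Example: a litter sample collected in the field (source_id) linked to the
    samples retrieved from litterbags during incubation.
    """
    rows = conn.execute(
        """
        SELECT s.label, d.value
        FROM samples_to_samples AS lnk
        JOIN samples AS s ON s.sample_id = lnk.sample_id_to
        JOIN data AS d ON d.sample_id = s.sample_id
        WHERE lnk.sample_id_from = ? AND d.attribute_name = ?
        ORDER BY s.label
        """,
        (source_id, attribute),
    )
    return dict(rows.fetchall())


def demo() -> dict:
    """Build a toy in-memory database and query it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO samples VALUES (?, ?)",
                     [(1, "field"), (2, "bag_t1"), (3, "bag_t2")])
    conn.executemany("INSERT INTO samples_to_samples VALUES (?, ?)", [(1, 2), (1, 3)])
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)",
                     [(2, "mass_relative_mass", 0.9),
                      (3, "mass_relative_mass", 0.8),
                      (3, "n_content", 0.01)])
    result = linked_values(conn, 1, "mass_relative_mass")
    conn.close()
    return result
```

Calling `demo()` returns `{'bag_t1': 0.9, 'bag_t2': 0.8}`. Keeping measurements in long format (one row per sample and attribute) lets new attributes be added without changing the table structure. This appears to be the design that table data follows.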
Description of the values in column attribute_name in table data.
Name | Definition | Example value | Unit | Measurement scale | Number type | Minimum value | Maximum value | String format |
---|---|---|---|---|---|---|---|---|
4_hydroxyacetophenone_mass_absolute | A numeric value representing the content of 4-hydroxyacetophenone, as described in Straková et al. (2010). | 0.26 | g | ratio | real | 0 | Inf | NA |
4_hydroxyacetophenone_mass_relative_mass | A numeric value representing the content of 4-hydroxyacetophenone, as described in Straková et al. (2010). | 0.26 | g/g | ratio | real | 0 | 1 | NA |
4_hydroxybenzaldehyde_mass_absolute | A numeric value representing the content of 4-hydroxybenzaldehyde, as described in Straková et al. (2010). | 0.26 | g | ratio | real | 0 | Inf | NA |
4_hydroxybenzaldehyde_mass_relative_mass | A numeric value representing the content of 4-hydroxybenzaldehyde, as described in Straková et al. (2010). | 0.26 | g/g | ratio | real | 0 | 1 | NA |
4_hydroxybenzoic_acid_mass_absolute | A numeric value representing the content of 4-hydroxybenzoic acid, as described in Straková et al. (2010). | 0.26 | g | ratio | real | 0 | Inf | NA |
4_hydroxybenzoic_acid_mass_relative_mass | A numeric value representing the content of 4-hydroxybenzoic acid, as described in Straková et al. (2010). | 0.26 | g/g | ratio | real | 0 | 1 | NA |
abbreviation | In table custom_units: A string representing an abbreviation for the custom unit. | gC | NA | nominal | NA | NA | NA | NA |
acetone_extractives_mass_absolute | A numeric value representing the content of acetone extractives, as described in Straková et al. (2010). | 0.26 | g | ratio | real | 0 | Inf | NA |
acetone_extractives_mass_relative_mass | A numeric value representing the content of acetone extractives, as described in Straková et al. (2010). | 0.26 | g/g | ratio | real | 0 | 1 | NA |
acetosyringone_mass_absolute | A numeric value representing the content of acetosyringone, as described in Straková et al. (2010). | 0.26 | g | ratio | real | 0 | Inf | NA |
acetosyringone_mass_relative_mass | A numeric value representing the content of acetosyringone, as described in Straková et al. (2010). | 0.26 | g/g | ratio | real | 0 | 1 | NA |
acetovanillone_mass_absolute | A numeric value representing the content of acetovanillone, as described in Straková et al. |
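The Minimum value and Maximum value columns can be used to range-check entries in table data. The following is a minimal sketch that assumes inclusive bounds and treats NA as "no bound". The `AttributeDef` class and `is_valid` function are hypothetical helpers, and only three rows of the table above are encoded here.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeDef:
    name: str
    measurement_scale: str           # e.g. "ratio" or "nominal"
    minimum: Optional[float] = None  # None where the table says NA
    maximum: Optional[float] = None  # math.inf where the table says Inf


ATTRIBUTES = {
    a.name: a
    for a in [
        AttributeDef("4_hydroxyacetophenone_mass_absolute", "ratio", 0.0, math.inf),
        AttributeDef("4_hydroxyacetophenone_mass_relative_mass", "ratio", 0.0, 1.0),
        AttributeDef("abbreviation", "nominal"),
    ]
}


def is_valid(attribute_name: str, value) -> bool:
    """Check a value against its attribute definition (bounds assumed inclusive)."""
    attr = ATTRIBUTES[attribute_name]
    if attr.measurement_scale == "nominal":
        return isinstance(value, str)
    if not isinstance(value, (int, float)) or math.isnan(value):
        return False
    low = -math.inf if attr.minimum is None else attr.minimum
    high = math.inf if attr.maximum is None else attr.maximum
    return low <= value <= high
```

For example, the example value 0.26 passes for both the absolute (g) and relative (g/g) attributes. However, 1.5 passes only for the absolute one, because relative masses are bounded by 1.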
https://www.etalab.gouv.fr/licence-ouverte-open-licence
Since 2021, local authorities and their public establishments have been required to complete a Social Database (BDS), which gathers a variety of data (see the Order of 10 December 2021 under Additional resources) on their human resources (employment, recruitment, career paths, training, remuneration, health and safety at work, work organisation, improvement of working conditions and quality of life, social action and social protection, social dialogue, discipline, etc.).
The Department then uses these data to draw up its Single Social Report, which in turn serves as the basis for the Management Guidelines through which public employers formalise or update their multiannual human resources management strategy.
The summary of the social database proposed here is the automatic summary produced in November 2023 with the Management Centres' application www.bs.donnees-sociales, from the data as at 31 December 2022 transmitted by the local authority to the Departmental Management Centre of Seine-Maritime (see metadata).
Together with the summary of the Single Social Report produced by the Seine-Maritime Department and the CSV files from the database, also published on this site, it constitutes the Single Social Report.
Metadata
Additional resources
The website of the Public Service for the Dissemination of Law offers for download the consolidated version of the Order of 10 December 2021, which defines the indicators contained in the social database (BDS, see definition in metadata).
For comparison, various RSUs (see definition in metadata) can be found online in PDF format. The attached links point to the RSUs of the City of Paris and to the annual national summary published by the Management Centres on the données-sociales.fr site (see definition in the metadata).
The website of the National Institute of Statistics and Economic Studies (INSEE) has a page dedicated to the All-Employees Base (BTS), built from the Nominative Social Declarations (DSN) that every employer must file under the Social Security Code and the General Tax Code.
Since 2009, its scope has covered the three branches of the civil service (State, territorial and hospital) as well as employees of private individual employers, thus covering all employees in the French economy.
This database provides extensive employment data (number of FTE jobs, gross and net pay, job qualification, type of employment contract, hours worked by gender and qualification, etc.), some of which are available free of charge.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a comprehensive collection of lexical items from various languages within the Carib linguistic family. It is structured to facilitate computational historical linguistics analysis, offering detailed information on language characteristics, word forms, and cognacy judgments. The data is curated to support research in linguistic typology, historical linguistics, and related fields.
Data Structure
The dataset is presented in a TSV (Tab-Separated Values) format, ensuring easy integration with common data analysis tools. Each lexical item in the dataset is detailed with multiple linguistic attributes, including phonological transcriptions, morphological analysis, and cognacy information. The following table summarizes the fields included in the dataset:
Field Name | Data Type | Description |
---|---|---|
ID | string | Unique identifier for each dataset entry. |
ID_lang | string | Unique identifier for the language within the dataset. |
Glottocode | string | Code uniquely identifying the language in the Glottolog database. |
Glottolog_Name | string | Name of the language as recorded in the Glottolog database. |
ISO639P3code | string | ISO 639-3 code for the language. |
ID_param | string | Unique identifier for the linguistic parameter or concept within the dataset. |
Concepticon_ID | integer | Identifier for the concept in the Concepticon database. |
Concepticon_Gloss | string | Gloss or definition of the concept from the Concepticon database. |
Value | string | Value of the linguistic data point, typically a word or phrase in the language. |
Form | string | Phonetic or phonological transcription of the linguistic data point. |
Segments | string | Further phonetic or phonological breakdown of the form. |
Source | string | Reference to the source or citation where the data was obtained. |
Morphemes | string | Morphological breakdown of the form. |
SimpleCognate | integer | Cognacy judgment, indicating whether the form is cognate with forms of the same meaning in related languages. |
PartialCognates | string | Partial cognacy coding, detailing the cognacy of individual segments or morphemes. |
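The following is a minimal sketch of loading the TSV with the Python standard library, converting the two integer-typed fields, and grouping languages into cognate sets. It assumes that forms for the same concept which share a SimpleCognate value are judged cognate. That reading of the field is an assumption, and the file name passed to `open` is up to the user.

```python
import csv
from collections import defaultdict

INTEGER_FIELDS = ("Concepticon_ID", "SimpleCognate")


def load_rows(handle):
    """Read TSV rows, converting integer-typed fields; empty cells become None."""
    rows = []
    # QUOTE_NONE keeps quote characters in transcriptions as literal text
    for row in csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        for field in INTEGER_FIELDS:
            cell = row.get(field) or ""
            row[field] = int(cell) if cell else None
        rows.append(row)
    return rows


def cognate_sets(rows):
    """Group language IDs by (concept gloss, SimpleCognate value)."""
    groups = defaultdict(set)
    for row in rows:
        if row["SimpleCognate"] is not None:
            groups[(row["Concepticon_Gloss"], row["SimpleCognate"])].add(row["ID_lang"])
    return dict(groups)
```

Grouping by the pair (gloss, cognate value) rather than by the value alone keeps the sketch correct even if cognate numbering restarts for each concept.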
Intended Use
This dataset is intended for researchers and linguists specializing in the Carib linguistic family. It provides valuable insights into the lexical similarities and differences across the languages within this family, supporting studies on language evolution, relationships, and structure.
Additional Resources
Metadata for Validation: This dataset comes with comprehensive metadata following the Frictionless Data standard, ensuring that the data structure and types are accurately described for validation purposes. This metadata aids in maintaining the integrity and usability of the data across various computational platforms and research projects.
CLDF Version Available: For researchers using the Cross-Linguistic Data Formats (CLDF), a version of this dataset that follows the CLDF specification is available as a zipped file for easier distribution and handling.
https://datacatalog.worldbank.org/public-licenses?fragment=cc
The Global Financial Development Database is an extensive dataset of financial system characteristics for 214 economies. It contains annual data, starting from 1960. It was last updated in September 2022 and contains data through 2021 for 108 indicators, capturing various aspects of financial institutions and markets. Please be advised that the latest release changes the methodology used to compute some of the indicators; these indicators are highlighted in blue in the tab "Metadata".
The Global Financial Development Database is based on a “4x2 framework”. Specifically, it includes measures of (1) depth, (2) access, (3) efficiency, and (4) stability of financial systems. Each of these characteristics is captured both for (1) financial institutions (for example banks and insurance companies), and (2) financial markets (such as stock markets and bond markets). The database builds on, updates, and extends previous efforts, in particular the data collected for the World Bank database “Financial Development and Structure”.
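The 4x2 framework can be read as a grid of eight cells (four characteristics times two sectors). The following is a minimal sketch that enumerates the grid and counts how many indicators fall into each cell. The helper `coverage` is hypothetical, and any indicator-to-cell assignment you feed it should be taken from the database's own definitions.

```python
from collections import Counter
from itertools import product

CHARACTERISTICS = ("depth", "access", "efficiency", "stability")
SECTORS = ("institutions", "markets")
CELLS = list(product(CHARACTERISTICS, SECTORS))  # the 8 cells of the 4x2 grid


def coverage(indicator_cells):
    """Count indicators per cell; raise on a cell outside the framework."""
    counts = Counter({cell: 0 for cell in CELLS})
    for name, cell in indicator_cells.items():
        if cell not in counts:
            raise ValueError(f"{name}: {cell} is not a 4x2 cell")
        counts[cell] += 1
    return counts
```

For instance, `coverage({"private credit to GDP": ("depth", "institutions")})` returns a Counter with 1 in that cell and 0 in the other seven. This makes gaps in a chosen set of indicators easy to spot.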
For more information on the Global Financial Development Database, the 4x2 framework, and the underlying theoretical and empirical literature, see chapter 1 of the 2013 Global Financial Development Report, and Martin Cihák, Asli Demirgüç-Kunt, Erik Feyen, and Ross Levine. 2012. “Benchmarking Financial Systems around the World.” Policy Research Working Paper 6175, World Bank, Washington, DC. (A version of the paper also appeared in the Journal of Financial Perspectives.)
The World Bank is not responsible for the quality or accuracy of the information reported in the database. The data set may contain errors and omissions. For a description of the various indicators, please refer to the concepts and definitions reported in the definition and sources tab, which also identifies the original source of the data. Users are advised to consult the accompanying metadata and to contact the original data providers directly for specific inquiries on data points and series.
Since 1 January 2021, local and regional authorities and public institutions have been required to draw up an annual Single Social Report (RSU) for the previous year. This report compiles human resources indicators on the following themes: employment, recruitment, career paths, training, pay, occupational health and safety, etc. This file is the summary of the RSU 2022, produced by the Department from the Social Database (BDS) data as at 31 December 2022. Together with the summary of the database produced by the Centre Départemental de Gestion de la Seine-Maritime (see metadata) and the CSV files from the database, also published on this site, it constitutes the Single Social Report. The RSU then serves as the basis for the Management Guidelines, which enable public employers to formalise or update their multiannual human resources management strategy.

Metadata

Additional resources
* Légifrance website: https://www.legifrance.gouv.fr/loda/id/JORFTEXT000044930851/ The website of the Public Service for the Dissemination of Law offers for download the consolidated version of the Order of 10 December 2021, which defines the indicators contained in the social database (BDS, see definition in metadata).
* Examples of Single Social Reports: https://www.paris.fr/pages/le-rapport-social-unique-22259 and https://donnees-sociales.fr/dossier-de-presse-2-2/ For comparison, various RSUs (see definition in metadata) can be found online in PDF format. The attached links point to the RSUs of the City of Paris and to the annual national summary published by the Management Centres on the données-sociales.fr site (see definition in the metadata).
* INSEE website: https://www.insee.fr/information/2407785 The website of the National Institute of Statistics and Economic Studies (INSEE) has a page dedicated to the All-Employees Base (BTS), built from the Nominative Social Declarations (DSN) that every employer must file under the Social Security Code and the General Tax Code. Since 2009, its scope has covered the three branches of the civil service (State, territorial and hospital) as well as employees of private individual employers, thus covering all employees in the French economy. This database provides extensive employment data (number of FTE jobs, gross and net pay, job qualification, type of employment contract, hours worked by gender and qualification, etc.), some of which are available free of charge.
The Intelligent Building Agents (IBA) project is part of the Embedded Intelligence in Buildings Program in the Engineering Laboratory at the National Institute of Standards and Technology (NIST). A key part of the IBA project is the IBA Laboratory (IBAL), a unique facility consisting of a mixed system of off-the-shelf equipment, including chillers and air handling units, controlled by a data acquisition system and capable of supporting building system optimization research under realistic and reproducible operating conditions.

The database contains the values of approximately 300 sensors/actuators in the IBAL, including both sensor measurements and control actions, as well as approximately 850 process data points, which typically relate to control settings and decisions. Each sensor/actuator has associated metadata. The metadata, sensors/actuators, and process data are defined on the "metadata", "sensors", and "parameters" tabs of the definitions file. Data are collected every 10 s.

The database contains two dashboards: 1) Experiments, to select data from individual experiments, and 2) Measurements, to select individual sensor/actuator and parameter data.

The Experiments Dashboard contains three sections. The "Experiment Data Plot" section shows plots of the sensor/actuator data selected in the second section, "Experiment/Metadata". Both scaled and raw data are plotted (see the metadata file for the conversion from raw to scaled data). Underneath the plots is a "Download CSV" button; selecting it automatically generates a CSV file of the plotted data. In "Experiment/Metadata", first select an experiment from the table on the left. A specific experiment or type of experiment can be found by entering terms in the search box; for example, searching for "Charge" brings up experiments in which the ice thermal storage tank is charged. The table of experiments also lists the duration of each experiment in minutes. Once an experiment is selected, specific sensor/actuator data points can be selected from the "Measurements" table on the right. These data can be filtered by subsystem (e.g., primary loop, secondary loop, Chiller1) and/or measurement type (e.g., pressure, flow, temperature), and are then shown in the plots at the top. The final section, "Process", contains the process data, grouped by subsystem. These data are not plotted but can be downloaded with the "Download CSV" button in the "Process" section.

The Measurements Dashboard also contains three sections. The "Date Range" section is used to select the time range of the data. The "All Measurements" section is used to select specific sensor/actuator data; as in the Experiments Dashboard, these data can be filtered by subsystem and/or measurement type. The scaled and raw values of the selected data are then plotted in the "Historical Data Plot" section, and the "Download CSV" button underneath the plots downloads the selected data.
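Because data are collected every 10 s, a downloaded CSV can be checked for missing samples. The following is a minimal sketch with several assumptions: the CSV has ISO 8601 timestamps in a column whose name you supply, and the raw-to-scaled conversion is linear. The real conversion is given in the metadata file, so `gain` and `offset` are placeholders, and all function names are hypothetical.

```python
import csv
from datetime import datetime, timedelta

SAMPLE_PERIOD = timedelta(seconds=10)  # IBAL data are collected every 10 s


def load_series(handle, time_col: str, value_col: str):
    """Read one column of a downloaded CSV as (timestamps, values)."""
    times, values = [], []
    for row in csv.DictReader(handle):
        times.append(datetime.fromisoformat(row[time_col]))
        values.append(float(row[value_col]))
    return times, values


def find_gaps(times, period=SAMPLE_PERIOD, tolerance=timedelta(seconds=1)):
    """Return consecutive timestamp pairs whose spacing deviates from `period`."""
    return [
        (prev, nxt)
        for prev, nxt in zip(times, times[1:])
        if abs((nxt - prev) - period) > tolerance
    ]


def raw_to_scaled(raw: float, gain: float, offset: float) -> float:
    """Hypothetical linear conversion; take the real rule from the metadata file."""
    return gain * raw + offset
```

The one-second tolerance is an arbitrary choice meant to absorb small timing jitter in the acquisition system. Tighten or loosen it to suit the data.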
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The current and future consequences of anthropogenic impacts such as climate change and habitat loss on ecosystems will be better understood and therefore addressed if diverse ecological data from multiple environmental contexts are more effectively shared. Re-use requires that data are readily available to the scientific scrutiny of the research community. A number of repositories to store shared data have emerged in different ecological domains and developments are underway to define common data and metadata standards. Nevertheless, the goal is far from being achieved and many challenges still need to be addressed. The definition of best practices for data sharing and re-use can benefit from the experience accumulated by pilot collaborative projects. The Euromammals bottom-up initiative has pioneered collaborative science in spatial animal ecology since 2007. It involves more than 150 institutes to address scientific, management and conservation questions regarding terrestrial mammal species in Europe using data stored in a shared database. In this manuscript we present some key lessons that we have learnt from the process of making shared data and knowledge accessible to researchers and we stress the importance of data management for data quality assurance. We suggest putting in place a pro-active data review before data are made available in shared repositories via robust technical support and users’ training in data management and standards. We recommend pursuing the definition of common data collection protocols, data and metadata standards, and shared vocabularies with direct involvement of the community to boost their implementation. We stress the importance of knowledge sharing, in addition to data sharing. We show the crucial relevance of collaborative networking with pro-active involvement of data providers in all stages of the scientific process. 
Our main message is that for data-sharing collaborative efforts to obtain substantial and durable scientific returns, the goals should not only consist in the creation of e-infrastructures and software tools but primarily in the establishment of a network and community trust. This requires moderate investment, but over long-term horizons.
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Covid19Kerala.info-Data is a consolidated multi-source open dataset of metadata from the COVID-19 outbreak in the Indian state of Kerala. It is created and maintained by volunteers of 'Collective for Open Data Distribution-Keralam' (CODD-K), a nonprofit consortium of individuals formed for the distribution and longevity of open datasets. Covid19Kerala.info-Data covers a set of correlated temporal and spatial metadata on SARS-CoV-2 infections and prevention measures in Kerala. Static snapshot releases of this dataset are produced manually from a live database maintained as a set of publicly accessible Google Sheets. This dataset is made available under the Open Data Commons Attribution License v1.0 (ODC-BY 1.0).
Schema and Data Package
A data package with a schema definition is accessible at https://codd-k.github.io/covid19kerala.info-data/datapackage.json. The data package and schema are based on the Frictionless Data Data Package specification.
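The following is a minimal sketch of reading such a descriptor with the Python standard library. It lists the fields of each resource with their types, and a field without a declared type falls back to "string", as the Frictionless Table Schema specification prescribes. Schemas given as a path instead of inline are skipped here. The descriptor text can be fetched from the URL above, for example with `urllib.request.urlopen(url).read().decode()`; the code assumes no particular resource names.

```python
import json


def describe_package(descriptor_text: str) -> dict:
    """Map each resource (by name, else path) to its (field name, type) pairs.

    Follows the Data Package layout: a top-level "resources" list whose entries
    may carry an inline Table Schema under "schema" with a "fields" list.
    """
    package = json.loads(descriptor_text)
    summary = {}
    for resource in package.get("resources", []):
        schema = resource.get("schema", {})
        if not isinstance(schema, dict):  # schema given as a path/URL: not resolved here
            schema = {}
        key = resource.get("name", resource.get("path"))
        summary[key] = [(f["name"], f.get("type", "string")) for f in schema.get("fields", [])]
    return summary
```

Because the function reads only "resources", "schema" and "fields", it works on any conforming descriptor. It does not validate the data itself.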
Temporal and Spatial Coverage
This dataset covers COVID-19 outbreak and related data from the state of Kerala, India, from January 31, 2020 till the date of the publication of this snapshot. The dataset shall be maintained throughout the entirety of the COVID-19 outbreak.
The spatial coverage of the data lies within the geographical boundaries of the state of Kerala, which comprises 14 administrative subdivisions. The state is further divided into Local Self-Government (LSG) bodies. References to this spatial information are included in the relevant data facets. Spatial information on regions outside Kerala is included where available, but only as a reference to the possible origins of infection clusters or to the movements of individuals.
Longevity and Provenance
Snapshot releases of the dataset are published in a designated GitHub repository maintained by the CODD-K team. Periodic snapshots of the live database will be released at regular intervals. The repository's commit logs will serve as a record of provenance, and the repository will be archived at the end of the project lifecycle to ensure the longevity of the dataset.
Data Stewardship
CODD-K expects all administrators, managers, and users of its datasets to manage, access, and use them in a manner consistent with the consortium's need for security and confidentiality and with the relevant legal frameworks in all geographies, especially Kerala and India. As a responsible steward maintaining this dataset and making it accessible, CODD-K disclaims all liability for damages, if any, caused by inaccuracies in the dataset.
License
This dataset is made available by the CODD-K consortium under ODC-BY 1.0 license. The Open Data Commons Attribution License (ODC-By) v1.0 ensures that users of this dataset are free to copy, distribute and use the dataset to produce works and even to modify, transform and build upon the database, as long as they attribute the public use of the database or works produced from the same, as mentioned in the citation below.
Disclaimer
Covid19Kerala.info-Data is provided as-is under the ODC-BY 1.0 license. Although every attempt is made to ensure that the data are error-free and up to date, the CODD-K consortium does not bear any responsibility for inaccuracies in the dataset or for any losses, monetary or otherwise, that users of this dataset may incur.
This file, from the Single Social Report of the Department of Seine-Maritime, gives the number of staff holding a paid functional post on 31/12/2022, broken down by status of origin (official or contract staff), job context, sector (administrative, technical, and fire and rescue), grade of secondment and gender. Functional posts are permanent management posts in local authorities and their public establishments, the exhaustive list of which is set out in Article 53 of the Law of 26 January 1984. They may be held by civil servants seconded to these posts (Article 53 of the Law of 26 January 1984). An official may be seconded to a functional post:
* from one authority or establishment to another, returning to the authority or establishment of origin at the end of the secondment;
* or within the same authority or establishment, either where the staff member previously held a post corresponding to their grade there, or where they were recruited there by transfer prior to secondment.
Certain functional posts may also be held by contract staff, under the conditions laid down in Article 47 of the Law of 26 January 1984 referred to above. The contract is then concluded for a fixed term of up to three years, renewable for periods of up to three years, and may not be converted into a contract of indefinite duration.
Clarification: Indicator 1.1.0 counts officials and contract staff in permanent functional posts who were paid on 31 December of the year (1 person = 1 unit).
Metadata: link to metadata
Additional resources:
* Légifrance website: https://www.legifrance.gouv.fr/loda/id/JORFTEXT000044930851/ The website of the Public Service for the Dissemination of Law offers for download the consolidated version of the Order of 10 December 2021, which frames the indicators contained in the Social Database (BDS, see definition in metadata).
* Examples of Single Social Reports: https://www.paris.fr/pages/le-rapport-social-unique-22259 and https://donnees-sociales.fr/dossier-de-presse-2-2/ For comparison, various RSUs (see definition in metadata) can be found online, downloadable in .pdf format. The attached links refer to the RSU of the city of Paris and to the annual national summary available on the management centres' site données-sociales.fr (see definition in the metadata).
* INSEE website: https://www.insee.fr/information/2407785 The website of the National Institute of Statistics and Economic Studies (INSEE) offers a page dedicated to the All-Employees Base (BTS), based on the Nominative Social Declarations (DSN) that every employer must file under the Social Security Code and the General Tax Code. Since 2009, its scope has been extended to the three branches of the civil service (State, local government and hospital) and to the employees of private employers, thus covering all employees in the French economy. This database provides extensive employment data (number of FTE jobs, gross and net pay, job qualification, type of employment contract, hours worked by gender and qualification, etc.), some of which are available free of charge.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Abstract: We present version 2 of a database for Central American volcanic rocks and related components. The Rutgers University Central American Geochemical data (RU_CAGeochem) covers the active volcanoes related to the Cocos-Caribbean convergent plate boundary, which extends from Guatemala to Costa Rica in Central America. The RU prefix signifies that the data and/or samples come primarily from the long-term Central American research project started at Dartmouth College in 1970 and continued at Rutgers University from 1974 to the present. The database is decidedly uneven because of the impressive improvement in analytical techniques over the period of sample and data production. Further complications arise because most of the sampling and analysis were part of the educational process for many different undergraduate and graduate students using different types of instruments. This note presents, as a reasonably coherent whole, geochemical data and metadata for about 1325 samples collected by at least 40 students and colleagues. Many new Sr, Nd and Pb isotopic ratios are included here, but most of the new data are metadata that provide greatly improved descriptions of the locations and status of the samples, as well as estimates of data quality. Version 2 includes updated, more precise geocoordinates and additional descriptive metadata (Lcode, Unit and Relative position) that allow the definition of units (Lcode, Unit) and stratigraphic position. Other Description: Carr, M. J., Feigenson, M. D., Bolge, L. L., Walker, J. A. and Gazel, E. (2014), RU_CAGeochem, a database and sample repository for Central American volcanic rocks at Rutgers University. Geoscience Data Journal.
An Excel template with data elements and conventions corresponding to the openLCA unit process data model. It includes the LCA Commons data and metadata guidelines and definitions.
Resources in this dataset:
Resource Title: READ ME - data dictionary. File Name: lcaCommonsSubmissionGuidelines_FINAL_2014-09-22.pdf
Resource Title: US Federal LCA Commons Life Cycle Inventory Unit Process Template. File Name: FedLCA_LCI_template_blank EK 7-30-2015.xlsx
Resource Description: Instructions: This template should be used for life cycle inventory (LCI) unit process development and is associated with an openLCA plugin that imports these data into an openLCA database. See www.openLCA.org to download the latest release of openLCA for free and to access the available plugins.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The common vampire bat (Desmodus rotundus) is a hematophagous bat species found across North, Central, and South America. Desmodus rotundus is one of only three mammal species with an exclusively sanguivorous diet (i.e., blood). The species has a large distributional range and preys on a wide range of vertebrate species. Desmodus rotundus is a known reservoir of the rabies virus and contributes to the continued spread of this pathogen across Latin America. Nevertheless, little is known about the historical distribution of D. rotundus across its range. Historical occurrence data are critical for assessing the past and current distributions of this species, and are necessary for a plethora of other ecological, biogeographic, and epidemiological studies. This is a dataset of D. rotundus historical occurrences, including >37,000 locality reports across the Americas, intended to facilitate spatiotemporal studies of the species.
Data and metadata definitions: The following table provides standardized definitions of each occurrence metadata field, based on the Darwin Core Archive format. Each piece of metadata for each occurrence is recorded under the listed column headers.
Validation Code: This file contains code for using and cleaning the Desmodus rotundus record database. This code was also used for the Technical Validation process of the final D. rotundus dataset.
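The validation code itself is not reproduced in this description. As an illustration only, the sketch below shows the kind of occurrence cleaning such code typically performs, assuming the records use the Darwin Core terms decimalLatitude, decimalLongitude and year; the function name clean_occurrences and the duplicate rule (same rounded coordinates and year) are hypothetical choices, not the authors' method.

```python
import csv
import io


def clean_occurrences(rows):
    """Drop records without usable coordinates, then drop duplicates (hypothetical helper).

    Each row is a dict keyed by Darwin Core terms; only decimalLatitude,
    decimalLongitude and year are assumed to be present.
    """
    seen = set()
    cleaned = []
    for row in rows:
        try:
            lat = float(row["decimalLatitude"])
            lon = float(row["decimalLongitude"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric coordinates
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            continue  # coordinates outside the valid range
        key = (round(lat, 4), round(lon, 4), row.get("year", ""))
        if key in seen:
            continue  # same locality reported again for the same year
        seen.add(key)
        cleaned.append(row)
    return cleaned


sample = io.StringIO(
    "decimalLatitude,decimalLongitude,year\n"
    "-22.9,-43.2,1950\n"
    "-22.9,-43.2,1950\n"
    ",-43.2,1951\n"
    "95.0,-43.2,1952\n"
)
print(len(clean_occurrences(csv.DictReader(sample))))  # 1
```

Real cleaning pipelines usually add further checks (for example, coordinates falling in the sea or on country centroids); these are left out to keep the sketch short.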
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
The dataset contains 10,000 replicates of AWRA model pre-processing outputs (streamflow Qtot and baseflow Qb), used for calculating additional coal resources development impacts on hydrological response variables in 30 simulation nodes (Zhang et al., 2016).
References
Zhang Y Q, Viney N R, Peeters L J M, Wang B, Yang A, Li L T, McVicar T R, Marvanek S P, Rachakonda P K, Shi X G, Pagendam D E and Singh R M (2016) Surface water numerical modelling for the Gloucester subregion. Product 2.6.1 for the Gloucester subregion from the Northern Sydney Basin Bioregional Assessment. Department of the Environment, Bureau of Meteorology, CSIRO and Geoscience Australia, Australia. http://data.bioregionalassessments.gov.au/product/NSB/GLO/2.6.1.
This pre-processing data is used for estimating AWRA post-processing streamflow outputs under CRDP and baseline conditions, respectively.
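To make the comparison concrete, the sketch below assumes, as the description suggests, that the impact of additional coal resource development on a hydrological response variable is the difference between its CRDP and baseline values for the same replicate, summarized across replicates. The helper name impact_summary and the choice of the 5th, 50th and 95th percentiles are illustrative assumptions, not the Programme's documented procedure.

```python
import statistics


def impact_summary(crdp, baseline):
    """Summarize CRDP-minus-baseline differences across replicates (hypothetical helper).

    crdp[i] and baseline[i] hold one hydrological response variable
    (for example, annual streamflow) from replicate i under each scenario.
    """
    if len(crdp) != len(baseline):
        raise ValueError("each replicate needs both a CRDP and a baseline value")
    diffs = [c - b for c, b in zip(crdp, baseline)]
    # 99 cut points; the 'inclusive' method keeps every percentile within the data range.
    pct = statistics.quantiles(diffs, n=100, method="inclusive")
    return {"p5": pct[4], "p50": pct[49], "p95": pct[94]}


print(impact_summary([10, 9, 8, 7, 6], [10, 10, 10, 10, 10]))
```

Pairing values by replicate, rather than comparing the two scenarios' distributions independently, keeps the parameter set of each run fixed so that the difference isolates the effect of the development scenario.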
The dataset contains all the files and scripts needed to execute the 10,000 runs on the Linux platform of the CSIRO High Performance Cluster computers.
The AWRA-L model version 4.5 has been used for all BA surface water simulations. The application was developed in C#. All executable and class library (dll) files can be found at \OSM-07-CDC.it.csiro.au\OSM_CBR_LW_BA_working\Disciplines\SurfaceWater\Modelling\AWRA-LG\Bin. The executable "BACalibrationAndSimulationApp.exe" generates global definition files, which define the input and output data and the locations of the input time series. The executable "SimulateModel.exe" runs simulations based on the global definition files and outputs the required variables (Qtot, Qb, Dd) in NetCDF format. All simulation runs were executed on local Windows 7 workstations.
The AWRA preprocessing data are the inputs for estimating AWRA post-processing model outputs (GUID: http://data.bioregionalassessments.gov.au/dataset/15ca8f9d-84b4-4395-87db-ab4ff15b9f07).
The dataset was uploaded to \lw-osm-01-cdc.it.csiro.au\OSM_CBR_LW_BAModelRuns_app\GLO\AWRA_ScalingChange_rerun on 03 September 2016.
This dataset was further used to compute daily streamflow post-processing outputs under both CRDP and baseline conditions.
Bioregional Assessment Programme (XXXX) GLO AWRA Model Pre-Processing Data v01. Bioregional Assessment Derived Dataset. Viewed 18 July 2018, http://data.bioregionalassessments.gov.au/dataset/51079bcc-96a8-409d-a951-3671fbbad6a2.
Derived From Standard Instrument Local Environmental Plan (LEP) - Heritage (HER) (NSW)
Derived From NSW Office of Water GW licence extract linked to spatial locations - GLO v5 UID elements 27032014
Derived From GLO SW Receptors 20150828 withRivers&CatchmentAreas
Derived From Groundwater Economic Assets GLO 20150326
Derived From Gloucester digitised coal mine boundaries
Derived From Groundwater Dependent Ecosystems supplied by the NSW Office of Water on 13/05/2014
Derived From NSW Office of Water GW licence extract linked to spatial locations GLOv4 UID 14032014
Derived From Communities of National Environmental Significance Database - RESTRICTED - Metadata only
Derived From GLO SW receptor total catchment areas V01
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas
Derived From Asset database for the Gloucester subregion on 12 September 2014
Derived From GEODATA 9 second DEM and D8: Digital Elevation Model Version 3 and Flow Direction Grid 2008
Derived From National Groundwater Information System (NGIS) v1.1
Derived From GLO Receptors 20150518
Derived From Groundwater Entitlement Data GLO NSW Office of Water 20150320 PersRemoved
Derived From Natural Resource Management (NRM) Regions 2010
Derived From Groundwater Entitlement Data Gloucester - NSW Office of Water 20150320
Derived From New South Wales NSW Regional CMA Water Asset Information WAIT tool databases, RESTRICTED Includes ALL Reports
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas (including WA)
Derived From EIS Gloucester Coal 2010
Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
Derived From GEODATA TOPO 250K Series 3
Derived From Asset database for the Gloucester subregion on 28 May 2015
Derived From NSW Catchment Management Authority Boundaries 20130917
Derived From Geological Provinces - Full Extent
Derived From Geofabric Surface Cartography - V2.1
Derived From NSW Office of Water GW licence extract linked to spatial locations GLOv3 12032014
Derived From EIS for Rocky Hill Coal Project 2013
Derived From Bioregional Assessment areas v03
Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
Derived From National Heritage List Spatial Database (NHL) (v2.1)
Derived From Asset database for the Gloucester subregion on 8 April 2015
Derived From Gloucester - Additional assets from local councils
Derived From NSW Office of Water combined geodatabase of regulated rivers and water sharing plan regions
Derived From Asset database for the Gloucester subregion on 29 August 2014
Derived From Collaborative Australian Protected Areas Database (CAPAD) 2010 - External Restricted
Derived From Groundwater Modelling Report for Stratford Coal Mine
Derived From Directory of Important Wetlands in Australia (DIWA) Spatial Database (Public)
Derived From NSW Office of Water Groundwater Licence Extract Gloucester - Oct 2013
Derived From New South Wales NSW - Regional - CMA - Water Asset Information Tool - WAIT - databases
Derived From Freshwater Fish Biodiversity Hotspots
Derived From NSW Office of Water Groundwater licence extract linked to spatial locations GLOv2 19022014
Derived From GLO climate data stats summary
Derived From Australia - Species of National Environmental Significance Database
Derived From
GIS Layer Boundary Geometry:
GIS Format Data Files: Ideally, Tax Year Parcel data should be provided in a shapefile (please include the .shp, .shx, .dbf, .prj, and .xml component files) or file geodatabase format. An empty shapefile and file geodatabase schema are available for download at:
ftp://ftp.agrc.utah.gov/UtahSGID_Vector/UTM12_NAD83/CADASTRE/LIR_ParcelSchema.zip
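Before a shapefile is submitted, it can help to confirm that all the requested component files travel with it. The sketch below is a hypothetical pre-submission check, not an AGRC tool; it assumes the components share the .shp file's base name and accepts the metadata either as name.xml or as name.shp.xml, the form ArcGIS usually writes.

```python
import tempfile
from pathlib import Path

# Core shapefile components requested alongside the metadata file.
REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf", ".prj")


def missing_components(shp_path):
    """Return the component extensions missing next to a .shp file (hypothetical helper)."""
    shp = Path(shp_path)
    missing = [s for s in REQUIRED_SUFFIXES if not shp.with_suffix(s).exists()]
    # ArcGIS usually stores shapefile metadata as <name>.shp.xml; accept <name>.xml too.
    xml_candidates = (shp.with_name(shp.name + ".xml"), shp.with_suffix(".xml"))
    if not any(p.exists() for p in xml_candidates):
        missing.append(".xml")
    return missing


# Usage: a submission that forgot the .prj file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("parcels.shp", "parcels.shx", "parcels.dbf", "parcels.shp.xml"):
        (Path(tmp) / name).touch()
    print(missing_components(Path(tmp) / "parcels.shp"))  # ['.prj']
```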
At the request of a county, AGRC will provide technical assistance to extract, transform, and load parcel and assessment information into the GIS layer format.
Geographic Coverage: Tax year parcel polygons should cover the area of each county for which assessment information is created and digital parcels are available. Full coverage may not yet be available for every county. The county may provide either parcels that have been adjusted to remove gaps and overlaps for administrative tax purposes, or parcels that retain the expected discrepancies arising from the legally described boundaries or from the process of digital conversion. The diversity of topological approaches will be noted in the metadata.
One Tax Parcel Record Per Unique Tax Notice: Some counties produce an annual tax year parcel GIS layer with one parcel polygon per tax notice. In some cases, adjacent parcel polygons that compose a single taxed property must be merged into a single polygon. This is the goal for the statewide layer but may not be possible in all counties. AGRC will provide technical support to counties, where needed, to merge GIS parcel boundaries into the best format to match with the annual assessment information.
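A first step toward one record per tax notice is finding the notices that are still split across several polygons. The sketch below only identifies those candidates from the attribute records; the geometric merge itself (a dissolve) would still be done in GIS software. It assumes, hypothetically, that the tax notice is identified by PARCEL_ID; a county using a different key can pass its own field name.

```python
from collections import Counter


def notices_needing_merge(records, key="PARCEL_ID"):
    """List tax-notice keys that appear on more than one polygon record.

    `records` is an iterable of attribute dicts, one per polygon; `key` is the
    attribute assumed to identify the tax notice (hypothetical default).
    """
    counts = Counter(rec[key] for rec in records)
    return sorted(k for k, n in counts.items() if n > 1)


polygons = [{"PARCEL_ID": "A"}, {"PARCEL_ID": "A"}, {"PARCEL_ID": "B"}]
print(notices_needing_merge(polygons))  # ['A']
```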
Standard Coordinate System: Parcels will be loaded into Utah’s statewide coordinate system, Universal Transverse Mercator coordinates (NAD83, Zone 12 North). However, boundaries stored in other industry standard coordinate systems will be accepted if they are both defined within the data file(s) and documented in the metadata (see below).
Descriptive Attributes:
Database Field/Column Definitions: The table below indicates the field names and definitions for attributes requested for each Tax Parcel Polygon record.
FIELD NAME | FIELD TYPE | LENGTH | DESCRIPTION | EXAMPLE
SHAPE (expected) | Geometry | n/a | The boundary of an individual parcel, or of merged parcels, that corresponds with a single county tax notice | ex. polygon boundary in UTM NAD83 Zone 12 N or other industry standard coordinates, including state plane systems
COUNTY_NAME | Text | 20 | County name, including spaces | ex. BOX ELDER
COUNTY_ID (expected) | Text | 2 | County ID number | ex. Beaver = 1, Box Elder = 2, Cache = 3, ..., Weber = 29
ASSESSOR_SRC (expected) | Text | 100 | Website URL; will point to the County Assessor in almost all cases | ex. webercounty.org/assessor
BOUNDARY_SRC (expected) | Text | 100 | Website URL; will point to the County Recorder in almost all cases | ex. webercounty.org/recorder
DISCLAIMER (added by State) | Text | 50 | Disclaimer URL | ex. gis.utah.gov...
CURRENT_ASOF (expected) | Date | - | Date as of which the parcels are current | ex. 01/01/2016
PARCEL_ID (expected) | Text | 50 | County-designated unique ID number for individual parcels | ex. 15034520070000
PARCEL_ADD (expected, where available) | Text | 100 | Parcel's street address; usually the address at recordation | ex. 810 S 900 E #304 (example for a condo)
TAXEXEMPT_TYPE (expected) | Text | 100 | Primary category of granted tax exemption | ex. None, Religious, Government, Agriculture, Conservation Easement, Other Open Space, Other
TAX_DISTRICT (expected, where applicable) | Text | 10 | Code the county uses to identify a unique combination of property tax levying entities | ex. 17A
TOTAL_MKT_VALUE (expected) | Decimal | - | Total market value of the parcel's land, structures, and other improvements as determined by the Assessor for the most current tax year | ex. 332000
LAND_MKT_VALUE (expected) | Decimal | - | Market value of the parcel's land as determined by the Assessor for the most current tax year | ex. 80600
PARCEL_ACRES (expected) | Decimal | - | Parcel size in acres | ex. 20.360
PROP_CLASS (expected) | Text | 100 | Property class: Residential, Commercial, Industrial, Mixed, Agricultural, Vacant, Open Space, Other | ex. Residential
PRIMARY_RES (expected) | Text | 1 | Whether the property is a primary residence: 'Y' (yes), 'N' (no), or 'U' (unknown) | ex. Y
HOUSING_CNT (expected, where applicable) | Text | 10 | Number of housing units; can be a single number or a range such as '5-10' | ex. 1
SUBDIV_NAME (optional) | Text | 100 | Subdivision name, if applicable | ex. Highland Manor Subdivision
BLDG_SQFT (expected, where applicable) | Integer | - | Square footage of primary building(s) | ex. 2816
BLDG_SQFT_INFO (expected, where applicable) | Text | 100 | Note on how building square footage is counted by the County | ex. Only finished above and below grade areas are counted.
FLOORS_CNT (expected, where applicable) | Decimal | - | Number of floors as reported in county records | ex. 2
FLOORS_INFO (expected, where applicable) | Text | 100 | Note on how floors are counted by the County | ex. Only above grade floors are counted
BUILT_YR (expected, where applicable) | Short | - | Estimated year of initial construction of primary buildings | ex. 1968
EFFBUILT_YR (optional, where applicable) | Short | - | 'Effective year built' of primary buildings, factoring in updates after construction | ex. 1980
CONST_MATERIAL (optional, where applicable) | Text | 100 | Construction material types; values for this field are expected to vary greatly by county | ex. Wood Frame, Brick, etc.
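Several rules in the field table above can be checked mechanically before records are loaded. The sketch below is a hypothetical validator covering a few of them: maximum lengths for selected Text fields, the 'Y'/'N'/'U' codes of PRIMARY_RES, the number-or-range format of HOUSING_CNT, and COUNTY_ID values from 1 (Beaver) to 29 (Weber). It is an illustration under those assumptions, not part of the State's loading process.

```python
import re

# Maximum lengths for some Text fields, copied from the field table.
TEXT_LIMITS = {
    "COUNTY_NAME": 20,
    "COUNTY_ID": 2,
    "PARCEL_ID": 50,
    "TAX_DISTRICT": 10,
    "PRIMARY_RES": 1,
    "HOUSING_CNT": 10,
}
PRIMARY_RES_CODES = {"Y", "N", "U"}
# A single number such as '1' or a range such as '5-10'.
HOUSING_PATTERN = re.compile(r"\d+(-\d+)?")


def validate_parcel_record(rec):
    """Return a list of problems found in one parcel attribute record (hypothetical helper)."""
    problems = []
    for field, limit in TEXT_LIMITS.items():
        value = rec.get(field)
        if value is not None and len(str(value)) > limit:
            problems.append(f"{field} longer than {limit} characters")
    res = rec.get("PRIMARY_RES")
    if res is not None and res not in PRIMARY_RES_CODES:
        problems.append("PRIMARY_RES must be 'Y', 'N' or 'U'")
    housing = rec.get("HOUSING_CNT")
    if housing is not None and not HOUSING_PATTERN.fullmatch(str(housing)):
        problems.append("HOUSING_CNT must be a number or a range such as '5-10'")
    county_id = rec.get("COUNTY_ID")
    if county_id is not None and not (str(county_id).isdigit() and 1 <= int(county_id) <= 29):
        problems.append("COUNTY_ID must be between 1 and 29")
    return problems


good = {"COUNTY_NAME": "BOX ELDER", "COUNTY_ID": "2", "PARCEL_ID": "15034520070000",
        "PRIMARY_RES": "Y", "HOUSING_CNT": "1"}
bad = {"COUNTY_ID": "30", "PRIMARY_RES": "Yes", "HOUSING_CNT": "5 to 10"}
print(validate_parcel_record(good))  # []
print(len(validate_parcel_record(bad)))  # 4
```

Returning a list of problems rather than raising on the first error lets a county see every issue in a record at once.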
Contact: Sean Fernandez, Cadastral Manager (email: sfernandez@utah.gov; office phone: 801-209-9359)
TEMPO-Online provides the following functions and services: free access to statistical information, and export of tables in .csv and .xls formats as well as printing.
What is the content of TEMPO-Online?
The National Institute of Statistics offers a statistical database, TEMPO-Online, that gives access to a wide range of information. The database contains:
- approximately 1100 statistical indicators, divided into socio-economic fields and sub-fields;
- metadata associated with the statistical indicators (definition, first and last year of the time series, last period of data loading, statistical methodology, last update);
- detailed indicators at the level of groups and/or sub-groups of statistical characteristics (e.g. the total number of employees at the end of the year by employee category, activities of the national economy (sections), sex, area and county);
- time series from 1990 to the present, with monthly, quarterly, semi-annual and annual frequency, at national, development region, county and commune level.
Search by keywords
Keyword search finds various objects (tables with statistical variables divided into time series). The search returns results based on the matrix code and on keywords in the title or in the definition of a matrix, and displays them as a list of matching objects. To search by keyword, use the search section in the menu bar on the left.
Tables
The tables that result from a query have a flexible structure. For instance, users can select the variables and attributes through the query interface according to their needs. The resulting table can be saved in .csv and .xls formats or printed.
Note: to access tables at locality level (which are very large), users have to select each county with its localities, so that access is faster and technical blocks are avoided.
description: The Runway Incursion database contains records of events involving the incorrect presence of an aircraft, vehicle or person on the protected area of a surface designated for the landing and take off of aircraft. Runway incursion events are reported by the respective air traffic control tower. The data reflect the ICAO definition of a runway incursion as well as the related severity categories. The runway incursion database is maintained by the FAA Office of Runway Safety.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Trusted Research Environments (TREs) enable the analysis of sensitive data under strict security assertions that protect the data, through technical, organizational and legal measures, from (accidentally) leaking outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, their building blocks and the slight technical variations between them. To shed light on these issues, we give an overview of existing, publicly described TREs and a bibliography linking to their system descriptions. We further analyze their technical characteristics, especially their commonalities and variations, and provide insight into their data type characteristics and availability. Our literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data themselves, predominantly via secure remote access. Statistical offices provide the majority of the sensitive data records included in this study.
We performed a literature study covering 47 TREs worldwide using scholarly databases (Scopus, Web of Science, IEEE Xplore, Science Direct), a computer science library (dblp.org), Google and grey literature focusing on retrieving the following source material:
The goal of this literature study is to discover existing TREs and analyze their characteristics and data availability, in order to give an overview of the available infrastructure for sensitive data research, as many European initiatives have emerged in recent months.
This dataset consists of five comma-separated values (.csv) files describing our inventory:
Additionally, a MariaDB (10.5 or higher) schema definition file (.sql) is needed, which models the database schema:
The analysis was performed in a Jupyter Notebook, which can be found in our source code repository: https://gitlab.tuwien.ac.at/martin.weise/tres/-/blob/master/analysis.ipynb
This file, from the Single Social Report, itself fed by the Social Database (see metadata below), concerns the tenured and trainee officials of the Seine-Maritime Department who hold a permanent full-time or non-full-time post and were paid for at least one day in 2022 (excluding overtime and/or additional hours). It gives the number of staff in ETPR (1 ETPR = 1 unit).
The Paid Full-Time Equivalent (ETPR) is proportional to a staff member's activity, measured by their working time and by their period of activity over the year. It does not take into account overtime and/or additional hours worked by the staff member. The calculation basis for a full-time staff member (35 hours per week) in active employment throughout the year is the total number of hours paid, i.e. 1 820 hours for a year (35 hours × 52 weeks). Paid periods such as leave and absences are included in this calculation basis. The number of paid hours counted for a staff member is the number of annual hours accumulated on the last day of the year or on the staff member's last day of work.
Special case of staff working in the cultural sector: an artistic teaching assistant working 20 hours a week (full-time reference) corresponds to 1 ETPR (base of 35 paid hours), and an artistic teacher working 16 hours a week (full-time reference) also corresponds to 1 ETPR.
Examples:
- a full-time paid staff member present all year corresponds to 1 ETPR, i.e. 1 820 hours;
- a part-time staff member (80 %) present all year corresponds to 0.8 ETPR;
- a non-full-time staff member (25 hours per week) present for 4 months of the year corresponds to 0.24 ETPR; calculation: (25 hours / 35) × (4 months / 12);
- a part-time staff member (80 %) who returns to full time on 1 June of the year corresponds to about 0.9 ETPR; calculation: (0.8 × 5 months / 12) + (1 × 7 months / 12) = 11/12 ≈ 0.92.
Data are broken down by sector (filière), hierarchical category (see metadata) and gender.
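The worked examples above prorate a staff member's working-time share by the months during which it applied. The sketch below reproduces that arithmetic, assuming each period is given as a hypothetical (quota, months) pair; the Order itself counts paid hours against a 1 820-hour full year, so this month-based helper (named etpr here for illustration) only mirrors the examples rather than the official hour-based calculation.

```python
def etpr(segments):
    """Paid full-time equivalent of one staff member over a year (hypothetical helper).

    `segments` is a list of (quota, months) pairs: quota is the working-time
    share (0.8 for 80 %, 25 / 35 for 25 hours a week on a 35-hour basis) and
    months is how long that quota applied during the year.
    """
    if sum(months for _, months in segments) > 12:
        raise ValueError("segments cover more than 12 months")
    return sum(quota * months / 12 for quota, months in segments)


print(round(etpr([(1.0, 12)]), 2))            # 1.0
print(round(etpr([(0.8, 12)]), 2))            # 0.8
print(round(etpr([(25 / 35, 4)]), 2))         # 0.24
print(round(etpr([(0.8, 5), (1.0, 7)]), 2))   # 0.92
```

The last case gives 11/12 ≈ 0.92, which the examples round to 0.9.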
Metadata: link to metadata
Additional resources:
* Légifrance website: https://www.legifrance.gouv.fr/loda/id/JORFTEXT000044930851/ The website of the Public Service for the Dissemination of Law offers for download the consolidated version of the Order of 10 December 2021, which frames the indicators contained in the Social Database (BDS, see definition in metadata).
* Examples of Single Social Reports: https://www.paris.fr/pages/le-rapport-social-unique-22259 and https://donnees-sociales.fr/dossier-de-presse-2-2/ For comparison, various RSUs (see definition in metadata) can be found online, downloadable in .pdf format. The attached links refer to the RSU of the city of Paris and to the annual national summary available on the management centres' site données-sociales.fr (see definition in the metadata).
* INSEE website: https://www.insee.fr/information/2407785 The website of the National Institute of Statistics and Economic Studies (INSEE) offers a page dedicated to the All-Employees Base (BTS), based on the Nominative Social Declarations (DSN) that every employer must file under the Social Security Code and the General Tax Code. Since 2009, its scope has been extended to the three branches of the civil service (State, local government and hospital) and to the employees of private employers, thus covering all employees in the French economy. This database provides extensive employment data (number of FTE jobs, gross and net pay, job qualification, type of employment contract, hours worked by gender and qualification, etc.), some of which are available free of charge.