100+ datasets found
  1. Trusted Research Environments: Analysis of Characteristics and Data Availability

    • researchdata.tuwien.ac.at
    bin, csv
    Updated Jun 25, 2024
    Cite
    Martin Weise; Martin Weise; Andreas Rauber; Andreas Rauber (2024). Trusted Research Environments: Analysis of Characteristics and Data Availability [Dataset]. http://doi.org/10.48436/cv20m-sg117
    Explore at:
    Available download formats: bin, csv
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    TU Wien
    Authors
    Martin Weise; Martin Weise; Andreas Rauber; Andreas Rauber
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Trusted Research Environments (TREs) enable analysis of sensitive data under strict security assertions that protect the data with technical, organizational, and legal measures from (accidentally) being leaked outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, the descriptions of their building blocks, and their slight technical variations. To shed light on these problems, we give an overview of existing, publicly described TREs and a bibliography linking to their system descriptions. We further analyze their technical characteristics, especially their commonalities and variations, and provide insight into their data-type characteristics and availability. Our literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data themselves, predominantly via secure remote access. Statistical offices provide the majority of the sensitive data records included in this study.

    Methodology

    We performed a literature study covering 47 TREs worldwide using scholarly databases (Scopus, Web of Science, IEEE Xplore, ScienceDirect), a computer science library (dblp.org), Google, and grey literature, focusing on retrieving the following source material:

    • Peer-reviewed articles where available,
    • TRE websites,
    • TRE metadata catalogs.

    The goal of this literature study is to discover existing TREs and analyze their characteristics and data availability, giving an overview of the available infrastructure for sensitive-data research as many European initiatives have been emerging in recent months.

    Technical details

    This dataset consists of five comma-separated values (.csv) files describing our inventory:

    • countries.csv: Table of countries with columns id (number), name (text) and code (text, in ISO 3166-1 alpha-3 encoding, optional)
    • tres.csv: Table of TREs with columns id (number), name (text), countryid (number, referring to column id of table countries), structureddata (bool, optional), datalevel (one of [1=de-identified, 2=pseudonymized, 3=anonymized], optional), outputcontrol (bool, optional), inceptionyear (date, optional), records (number, optional), datatype (one of [1=claims, 2=linked records], optional), statistics_office (bool), size (number, optional), source (text, optional), comment (text, optional)
    • access.csv: Table of access modes of TREs with columns id (number), suf (bool, optional), physical_visit (bool, optional), external_physical_visit (bool, optional), remote_visit (bool, optional)
    • inclusion.csv: Table of TREs included in the literature study with columns id (number), included (bool), exclusion reason (one of [peer review, environment, duplicate], optional), comment (text, optional)
    • major_fields.csv: Table of data categorization into the major research fields with columns id (number), life_sciences (bool, optional), physical_sciences (bool, optional), arts_and_humanities (bool, optional), social_sciences (bool, optional).

    Additionally, a MariaDB (10.5 or higher) schema definition .sql file models the schema for the database:

    • schema.sql: Schema definition file to create the tables and views used in the analysis.
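
    For orientation, a minimal sketch of what such a schema could look like, derived from the column lists above; the shipped schema.sql is authoritative, and the exact SQL types chosen here are assumptions:

        -- Sketch only: names follow the CSV column lists, types are assumed
        CREATE TABLE countries (
          id   INT PRIMARY KEY,
          name TEXT NOT NULL,
          code CHAR(3)                       -- ISO 3166-1 alpha-3, optional
        );

        CREATE TABLE tres (
          id                INT PRIMARY KEY,
          name              TEXT NOT NULL,
          countryid         INT NOT NULL,
          structureddata    BOOLEAN,         -- optional
          datalevel         TINYINT,         -- 1=de-identified, 2=pseudonymized, 3=anonymized
          outputcontrol     BOOLEAN,
          inceptionyear     DATE,
          records           BIGINT,
          datatype          TINYINT,         -- 1=claims, 2=linked records
          statistics_office BOOLEAN NOT NULL,
          size              INT,
          source            TEXT,
          comment           TEXT,
          FOREIGN KEY (countryid) REFERENCES countries (id)
        );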

    The analysis was done in a Jupyter notebook, which can be found in our source code repository: https://gitlab.tuwien.ac.at/martin.weise/tres/-/blob/master/analysis.ipynb

  2. Integrated Library System (ILS) Data Dictionary

    • catalog.data.gov
    • cos-data.seattle.gov
    • +1more
    Updated Jul 12, 2025
    Cite
    data.seattle.gov (2025). Integrated Library System (ILS) Data Dictionary [Dataset]. https://catalog.data.gov/dataset/integrated-library-system-ils-data-dictionary-cf2d9
    Explore at:
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    data.seattle.gov
    Description

    Lookup table of Horizon item and borrower codes. The source of this data is the set of code definition tables in Horizon, such as bstat, btype, collection, and itype. This dataset is useful for understanding the codes used in some of Seattle Public Library's other open datasets. These codes (namely "ItemType" and "ItemCollection") are used systematically in the cataloging of items within the Integrated Library System (ILS), Horizon (SirsiDynix).

  3. Data Dictionary For "RC PermitDetailByAddress" Table

    • hub.arcgis.com
    • data-santarosa.opendata.arcgis.com
    • +1more
    Updated Feb 13, 2020
    Cite
    City of Santa Rosa (2020). Data Dictionary For "RC PermitDetailByAddress" Table [Dataset]. https://hub.arcgis.com/documents/56c20ea61e31432c9214b79a20269a78
    Explore at:
    Dataset updated
    Feb 13, 2020
    Dataset authored and provided by
    City of Santa Rosa
    Description

    Data Dictionary providing background and insights into the composition and structure of data made available in the table named "RC_Recovery_Progress_SharedOpendata_PermitDetailByAddress". That table can be retrieved from the City's Open Data site (https://data-santarosa.opendata.arcgis.com/search?tags=PED) or directly at https://santarosa.maps.arcgis.com/home/item.html?id=4ecf7a61f10847a3b019ce39e7f1cc96

  4. CFFGKBS Ver. 5 (All Tables) Data Definition - Updated Jan 2020

    • data-staging.niaid.nih.gov
    Updated Aug 17, 2020
    Cite
    Nur Marahaini Mohd Nizar; Ayman Salama; Ebrahim Jahanshiri; Siti Sarah Mohd Sinin; Anil Shekar Tharmandram; Yuveena Gopalan (2020). CFFGKBS Ver. 5 (All Tables) Data Definition - Updated Jan 2020 [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_3988136
    Explore at:
    Dataset updated
    Aug 17, 2020
    Dataset provided by
    Crops For the Future
    Authors
    Nur Marahaini Mohd Nizar; Ayman Salama; Ebrahim Jahanshiri; Siti Sarah Mohd Sinin; Anil Shekar Tharmandram; Yuveena Gopalan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A compilation of data definitions for the Global Knowledge Base for Underutilised Crops.

  5. Open Data Dictionary Template Individual

    • catalog.data.gov
    • hub.arcgis.com
    Updated Feb 4, 2025
    Cite
    Office of the Chief Technology Officer (2025). Open Data Dictionary Template Individual [Dataset]. https://catalog.data.gov/dataset/open-data-dictionary-template-individual
    Explore at:
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    Office of the Chief Technology Officer
    Description

    This template covers section 2.5 Resource Fields: Entity and Attribute Information of the Data Discovery Form cited in the Open Data DC Handbook (2022). It completes documentation elements that are required for publication. Each field column (attribute) in the dataset needs a description clarifying the contents of the column. Data originators are encouraged to enter the code values (domains) of the column to help end-users translate the contents of the column where needed, especially when lookup tables do not exist.

  6. PCWEBF21 Parcel Table Field Dictionary

    • data-dcpw.opendata.arcgis.com
    • hub.arcgis.com
    Updated May 3, 2018
    Cite
    Douglas County MN Survey & GIS (2018). PCWEBF21 Parcel Table Field Dictionary [Dataset]. https://data-dcpw.opendata.arcgis.com/documents/d6e41ff0d3954a6991d99db5a2a366de
    Explore at:
    Dataset updated
    May 3, 2018
    Dataset authored and provided by
    Douglas County MN Survey & GIS
    Description

    Use this data dictionary to identify what field names mean in the PCWEBF21 Parcel/Tax Information Table from the Tax System.

  7. Medical Service Study Area Data Dictionary

    • data.chhs.ca.gov
    • data.ca.gov
    • +4more
    Updated Sep 5, 2024
    Cite
    Department of Health Care Access and Information (2024). Medical Service Study Area Data Dictionary [Dataset]. https://data.chhs.ca.gov/dataset/medical-service-study-area-data-dictionary
    Explore at:
    Available download formats: kml, zip, html, arcgis geoservices rest api, geojson, csv
    Dataset updated
    Sep 5, 2024
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description
    • Statefp (Number): US Census Bureau unique identifier of the state
    • Countyfp (Number): US Census Bureau unique identifier of the county
    • Countynm (Text): County name
    • Tractce (Number): US Census Bureau unique identifier of the census tract
    • Geoid (Number): US Census Bureau unique identifier of the state + county + census tract
    • Aland (Number): US Census Bureau defined land area of the census tract
    • Awater (Number): US Census Bureau defined water area of the census tract
    • Asqmi (Number): Area in square miles, calculated from Aland
    • MSSAid (Text): ID of the Medical Service Study Area (MSSA) the census tract belongs to
    • MSSAnm (Text): Name of the Medical Service Study Area (MSSA) the census tract belongs to
    • Definition (Text): Type of MSSA; possible values are urban, rural, and frontier
    • TotalPovPop (Number): US Census Bureau total population for whom poverty status is determined of the census tract, taken from the 2020 ACS 5-year table S1701
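    Read as a schema, the dictionary above maps naturally onto a table definition; a hypothetical SQL sketch (types are assumptions based on the Number/Text annotations, not the published layer's actual types):

        CREATE TABLE mssa_census_tracts (
          statefp     INT,            -- state FIPS identifier
          countyfp    INT,            -- county FIPS identifier
          countynm    TEXT,           -- county name
          tractce     INT,            -- census tract identifier
          geoid       BIGINT,         -- state + county + tract identifier
          aland       BIGINT,         -- land area of the tract
          awater      BIGINT,         -- water area of the tract
          asqmi       DECIMAL(12,4),  -- square miles, derived from aland
          mssaid      TEXT,           -- MSSA the tract belongs to
          mssanm      TEXT,           -- MSSA name
          definition  TEXT,           -- urban, rural, or frontier
          totalpovpop INT             -- population with determined poverty status
        );
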
  8. SQL Integrity Journey: Unleashing Data Constraints

    • kaggle.com
    zip
    Updated Oct 9, 2023
    Cite
    Radha Gandhi (2023). SQL Integrity Journey: Unleashing Data Constraints [Dataset]. https://www.kaggle.com/datasets/radhagandhi/sql-integrity-journey-unleashing-data-constraints
    Explore at:
    Available download formats: zip (13817 bytes)
    Dataset updated
    Oct 9, 2023
    Authors
    Radha Gandhi
    Description

    Title: Practical Exploration of SQL Constraints: Building a Foundation in Data Integrity

    Introduction: Welcome to my data analysis project, which focuses on mastering SQL constraints, a pivotal aspect of database management. This project centers on hands-on experience with SQL's Data Definition Language (DDL) commands, emphasizing constraints such as PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, and DEFAULT. I aim to demonstrate my foundational understanding of enforcing data integrity and maintaining a structured database environment.

    Purpose: The primary purpose of this project is to showcase my proficiency in implementing and managing SQL constraints for robust data governance. By delving into the realm of constraints, you'll gain insight into my SQL skills and how I utilize constraints to ensure data accuracy, consistency, and reliability within relational databases.

    What to Expect: The exercises highlight my command of the following key constraint types:

    • NOT NULL: ensures the presence of essential data in a column.
    • PRIMARY KEY: ensures unique identification of records for data integrity.
    • FOREIGN KEY: establishes relationships between tables to maintain referential integrity.
    • UNIQUE: guarantees the uniqueness of values within specified columns.
    • CHECK: implements custom conditions to validate data entries.
    • DEFAULT: sets default values for columns to enhance data reliability.

    Each exercise is accompanied by clear and concise SQL scripts, explanations of the intended outcomes, and practical insights into the application of these constraints. The exercises are:

    3.1 Enforcing a NOT NULL constraint while creating a new table.
    3.2 Enforcing a NOT NULL constraint on an existing column.
    3.3 Enforcing a PRIMARY KEY constraint while creating a new table.
    3.4 Enforcing a PRIMARY KEY constraint on an existing column.
    3.5 Enforcing a FOREIGN KEY constraint while creating a new table.
    3.6 Enforcing a FOREIGN KEY constraint on an existing column.
    3.7 Enforcing UNIQUE constraints while creating a new table.
    3.8 Enforcing a UNIQUE constraint on an existing table.
    3.9 Enforcing a CHECK constraint in a new table.
    3.10 Enforcing a CHECK constraint in an existing table.
    3.11 Enforcing a DEFAULT constraint in a new table.
    3.12 Enforcing a DEFAULT constraint in an existing table.
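
    The dataset's own scripts live in the attached zip; as a flavor of the constraint types listed above, a minimal standalone sketch (MySQL-style syntax; table and column names are hypothetical, not taken from the dataset):

        CREATE TABLE departments (
          dept_id   INT PRIMARY KEY,               -- PRIMARY KEY: unique row identity
          dept_name VARCHAR(50) NOT NULL UNIQUE    -- NOT NULL and UNIQUE combined
        );

        CREATE TABLE employees (
          emp_id    INT PRIMARY KEY,
          emp_name  VARCHAR(100) NOT NULL,              -- NOT NULL: value required
          dept_id   INT,
          salary    DECIMAL(10,2) CHECK (salary >= 0),  -- CHECK: validate entries
          status    VARCHAR(10) DEFAULT 'active',       -- DEFAULT: fallback value
          FOREIGN KEY (dept_id) REFERENCES departments (dept_id)  -- referential integrity
        );

        -- Enforcing a constraint on an existing column (the 3.2/3.4-style exercises):
        ALTER TABLE employees MODIFY dept_id INT NOT NULL;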

  9. Superstore

    • kaggle.com
    zip
    Updated Oct 3, 2022
    Cite
    Ibrahim Elsayed (2022). Superstore [Dataset]. https://www.kaggle.com/datasets/ibrahimelsayed182/superstore
    Explore at:
    Available download formats: zip (167457 bytes)
    Dataset updated
    Oct 3, 2022
    Authors
    Ibrahim Elsayed
    Description

    Context

    A superstore in the USA; the data contains about 10,000 rows.

    Data Dictionary

    Attributes, with definitions where given and example values:

    • Ship Mode (example: Second Class)
    • Segment: segment category (example: Consumer)
    • Country (example: United States)
    • City (example: Los Angeles)
    • State (example: California)
    • Postal Code (example: 90032)
    • Region (example: West)
    • Category: category of product (example: Technology)
    • Sub-Category (example: Phones)
    • Sales: number of sales (example: 114.9)
    • Quantity (example: 3)
    • Discount (example: 0.45)
    • Profit (example: 14.1694)

    Acknowledgements

    All thanks to The Sparks Foundation for making this dataset.

    Inspiration

    Get the data and try to draw insights from it. Good luck ❤️

    Don't forget to Upvote😊🥰

  10. Data from: (Table S1 c) Definition of planktic 14C plateaus for sediment core KNR159-5-36GGC

    • search.dataone.org
    • doi.pangaea.de
    Updated Jan 8, 2018
    + more versions
    Cite
    Balmer, Sven; Sarnthein, Michael; Mudelsee, Manfred; Grootes, Pieter Meiert (2018). (Table S1 c) Definition of planktic 14C plateaus for sediment core KNR159-5-36GGC [Dataset]. http://doi.org/10.1594/PANGAEA.863637
    Explore at:
    Dataset updated
    Jan 8, 2018
    Dataset provided by
    PANGAEA Data Publisher for Earth and Environmental Science
    Authors
    Balmer, Sven; Sarnthein, Michael; Mudelsee, Manfred; Grootes, Pieter Meiert
    Time period covered
    Oct 14, 1998
    Area covered
    Description

    No description is available. Visit https://dataone.org/datasets/3e90b1ba2075e41e917c8edbffd4c761 for complete metadata about this dataset.

  11. Number of licensed day care center slots per 1,000 children aged 0-5 years

    • data.ca.gov
    • data.chhs.ca.gov
    • +3more
    pdf, xlsx, zip
    Updated Nov 7, 2025
    + more versions
    Cite
    California Department of Public Health (2025). Number of licensed day care center slots per 1,000 children aged 0-5 years [Dataset]. https://data.ca.gov/dataset/number-of-licensed-day-care-center-slots-per-1000-children-aged-0-5-years
    Explore at:
    Available download formats: pdf, xlsx, zip
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This table contains data on the number of licensed day care center slots (facility capacity) per 1,000 children aged 0-5 years in California, its regions, counties, cities, towns, and census tracts. The table contains 2015 data, and includes type of facility (day care center or infant center).

    Access to child care has become a critical support for working families. Many working families find high-quality child care unaffordable, and the increasing cost of child care can be crippling for low-income families and single parents. These barriers can impact parental choices of child care. Increased availability of child care facilities can positively impact families by providing more choices of child care in terms of price and quality. Estimates for this indicator are provided for the total population, and are not available by race/ethnicity. More information on the data table and a data dictionary can be found in the Data and Resources section.

    The licensed day care centers table is part of a series of indicators in the Healthy Communities Data and Indicators Project (HCI) of the Office of Health Equity. The goal of HCI is to enhance public health by providing data, a standardized set of statistical measures, and tools that a broad array of sectors can use for planning healthy communities and evaluating the impact of plans, projects, policy, and environmental changes on community health. The creation of healthy social, economic, and physical environments that promote healthy behaviors and healthy outcomes requires coordination and collaboration across multiple sectors, including transportation, housing, education, agriculture, and others. Statistical metrics, or indicators, are needed to help local, regional, and state public health and partner agencies assess community environments and plan for healthy communities that optimize public health. More information on HCI can be found here: https://www.cdph.ca.gov/Programs/OHE/CDPH%20Document%20Library/Accessible%202%20CDPH_Healthy_Community_Indicators1pager5-16-12.pdf

    The format of the licensed day care centers table is based on the standardized data format for all HCI indicators. As a result, this data table contains certain variables used in the HCI project (e.g., indicator ID and indicator definition). Some of these variables may contain the same value for all observations.

  12. Labour force; international definition 1996-2013

    • data.europa.eu
    atom feed, json
    Updated Oct 12, 2025
    Cite
    (2025). Labour force; international definition 1996-2013 [Dataset]. https://data.europa.eu/data/datasets/1415-beroepsbevolking-internationale-definitie-1996-2013
    Explore at:
    Available download formats: atom feed, json
    Dataset updated
    Oct 12, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this table you will find annual figures on the composition of the Dutch labour force according to the international definition.

    The Dutch definition of the labour force differs from the definition that applies internationally as a standard, that of the International Labour Organisation (ILO). As a result, the size and composition of the labour force differ. First, the Dutch definition uses a threshold of 12 hours for the number of hours per week that someone works or wants to work; the international definition has no such threshold. Secondly, the unemployed labour force is defined differently. According to the international definition, someone should be able to start a job within two weeks. In the Dutch definition, in certain cases a period of three months applies within which someone can start working or must have engaged in search activities.

    Data available from 1996/1998 to 2011/2013

    Status of the figures: figures based on the EBB are always final.

    Changes as of 26 February 2015: none; this table has been discontinued.

    Changes as of 1 April 2014: the figures for 2013 have been added to this table. Not all data on occupation for 2012 and 2013 are available yet; as soon as these data become available, they will be added to this table. The data on level of education from 2012 onwards are provisional.

    When are new figures coming? This table has been discontinued; the update of 1 April 2014 was the last. New, revised tables on the labour force were published on 26 February 2015. This revision of the labour force statistics has two parts: the definitions have been adapted to the internationally agreed definitions, and data collection has been improved, with Statistics Netherlands being the first statistical office in Europe to survey via the internet. For more information on the revision, see the link to the press release in paragraph 3.

  13. The Semantic Data Dictionary – An Approach for Describing and Annotating Data

    • scidb.cn
    Updated Oct 17, 2020
    Cite
    Sabbir M. Rashid; James P. McCusker; Paulo Pinheiro; Marcello P. Bax; Henrique Santos; Jeanette A. Stingone; Amar K. Das; Deborah L. McGuinness (2020). The Semantic Data Dictionary – An Approach for Describing and Annotating Data [Dataset]. http://doi.org/10.11922/sciencedb.j00104.00060
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 17, 2020
    Dataset provided by
    Science Data Bank
    Authors
    Sabbir M. Rashid; James P. McCusker; Paulo Pinheiro; Marcello P. Bax; Henrique Santos; Jeanette A. Stingone; Amar K. Das; Deborah L. McGuinness
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the 17 tables and two figures of the paper. Table 1 is a subset of explicit entries identified in NHANES demographics data. Table 2 is a subset of implicit entries identified in NHANES demographics data. Table 3 is a subset of NHANES demographic Codebook entries. Table 4 presents a subset of explicit entries identified in SEER. Table 5 is a subset of the Dictionary Mapping for the MIMIC-III Admission table. Table 6 shows a high-level comparison of semantic data dictionaries, traditional data dictionaries, approaches involving mapping languages, and general data integration tools. Table A1 shows namespace prefixes and IRIs for relevant ontologies. Table B1 shows the Infosheet specification. Table B2 shows the Infosheet metadata supplement. Table B3 shows the Dictionary Mapping specification. Table B4 is a Codebook specification. Table B5 is a Timeline specification. Table B6 is a Properties specification. Table C1 shows the NHANES demographics Infosheet. Table C2 shows NHANES demographic implicit entries. Table C3 shows NHANES demographic explicit entries. Table C4 presents expanded NHANES demographic Codebook entries.

    Figure 1 is a conceptual diagram of the Dictionary Mapping that allows for a representation model that aligns with existing scientific ontologies. The Dictionary Mapping is used to create a semantic representation of data columns. Each box, along with the "Relation" label, corresponds to a column in the Dictionary Mapping table. Blue rounded boxes correspond to columns that contain resource URIs, while white boxes refer to entities that are generated on a per-row/column basis. The actual cell value in concrete columns is, if there is no Codebook for the column, mapped to the "has value" object of the column object, which is generally either an attribute or an entity. Figure 2 presents (a) a conceptual diagram of the Codebook, which can be used to assign ontology classes to categorical concepts. Unlike other mapping approaches, the use of the Codebook allows for the annotation of cell values, rather than just columns. (b) A conceptual diagram of the Timeline, which can be used to represent complex time-associated concepts, such as time intervals.

  14. Data from: Research table for visualization of research performance of the German National Library of Science and Technology, German National Library of Medicine and German National Library of Economics via different communication channels

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Weinberg, Tabea (2020). Research table for visualization of research performance of the German National Library of Science and Technology, German National Library of Medicine and German National Library of Economics via different communication channels [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_321388
    Explore at:
    Dataset updated
    Jan 24, 2020
    Authors
    Weinberg, Tabea
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A collection of data for a (practice) assignment during my Bachelor studies. Since there wasn't much time, the data collection isn't as complete as it could be. Also, all text is in German.

    Definitions of the variables used for data collection can be found in my finished thesis, "How is research performance of the German National Library of Science and Technology, German National Library of Medicine and German National Library of Economics made visible via different communication channels: analysis of information of publications on official social media channels and news on official websites" (which may be published on Zenodo).

    Data was collected from the following official communication channels: Facebook, Twitter, blogs, Google+, YouTube, Flickr, LinkedIn, ResearchGate, SlideShare, repositories, and news on websites. For some channels not all data could be collected. Research period: 01.08.2016 to 01.02.2017 (half a year).

  15. Data from: (Table S1 a) Definition of planktic 14C plateaus for sediment core GeoB1711-4

    • doi.pangaea.de
    • search.dataone.org
    html, tsv
    Updated Aug 10, 2016
    Cite
    Michael Sarnthein; Manfred Mudelsee; Pieter Meiert Grootes; Sven Balmer (2016). (Table S1 a) Definition of planktic 14C plateaus for sediment core GeoB1711-4 [Dataset]. http://doi.org/10.1594/PANGAEA.863635
    Explore at:
    Available download formats: html, tsv
    Dataset updated
    Aug 10, 2016
    Dataset provided by
    PANGAEA
    Authors
    Michael Sarnthein; Manfred Mudelsee; Pieter Meiert Grootes; Sven Balmer
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Time period covered
    Jan 11, 1992
    Area covered
    Variables measured
    Plateau, ΔΔ14C, Age, dated, Reservoir age, Depth, top/min, Age, maximum/old, Depth, bottom/max, Age, minimum/young, Standard deviation, Fraction modern carbon, and 1 more
    Description

    Plateau No. 1-3: By comparison with Suigetsu Plateau 3, the short length of Plateau 3 may be explained by a ~200-yr drop in local 14C reservoir age over Plateau 3. Its actual range, 156.5-160.5 cm, is given by a dotted line in Fig. 2. Plateau No. 4-6a: Below Plateau 6a a 14C jump of 2000 yr reflects a hiatus at 210-212 cm depth on top of sediments older than Plateau 7 (23 cal. ka).

  16. Data from: Generalizable EHR-R-REDCap pipeline for a national multi-institutional rare tumor patient registry

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jan 9, 2022
    Cite
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller (2022). Generalizable EHR-R-REDCap pipeline for a national multi-institutional rare tumor patient registry [Dataset]. http://doi.org/10.5061/dryad.rjdfn2zcm
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 9, 2022
    Dataset provided by
    Massachusetts General Hospital
    Harvard Medical School
    Authors
    Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.

    Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.

    Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.

    Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR data is successfully transformed and bulk-loaded/imported into a REDCap-based national registry, enabling real-world data analysis and interoperability.

    Methods eLAB Development and Source Code (R statistical software):

    eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).

    eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.

    Functions were written to remap EHR bulk lab data pulls/queries from several sources, including Clarity/Crystal reports or an institutional EDW such as the Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R Markdown (https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.

    The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).

    Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.
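
    eLAB itself implements this remapping in R, but the key-value lookup idea is easy to picture in relational terms; a hypothetical SQL sketch (all table names, subtype strings, and DD codes here are invented for illustration, not the registry's actual dictionary):

        -- Raw EHR pull: one row per lab result, subtype names vary by site
        CREATE TABLE raw_labs (
          lab_name        VARCHAR(100),
          collection_date DATE,
          result_value    DECIMAL(10,3)
        );

        -- Key-value lookup: raw subtype -> registry data dictionary code
        CREATE TABLE lab_lookup (
          raw_name VARCHAR(100) PRIMARY KEY,
          dd_code  VARCHAR(50) NOT NULL
        );

        INSERT INTO lab_lookup (raw_name, dd_code) VALUES
          ('Potassium',           'potassium'),
          ('Potassium-External',  'potassium'),
          ('Potassium(POC)',      'potassium'),
          ('Potassium,whole-bld', 'potassium');

        -- Remap subtypes to one code; labs absent from the lookup are filtered out
        SELECT l.dd_code, r.collection_date, r.result_value
        FROM raw_labs r
        JOIN lab_lookup l ON l.raw_name = r.lab_name;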

    Data Dictionary (DD)

    EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.

    Study Cohort

    This study was approved by the MGB IRB. Search of the EHR was performed to identify patients diagnosed with MCC between 1975-2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016-2019 (N= 176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.

    Statistical Analysis

    OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.

  17. Data from: Table 4

    • hepdata.net
    Updated Oct 11, 1996
    Cite
    (1996). Table 4 [Dataset]. http://doi.org/10.17182/hepdata.47800.v1/t4
    Explore at:
    Dataset updated
    Oct 11, 1996
    Description

    Transverse momentum PTOUT w.r.t. the Sphericity axis. For the first table, the Sphericity axis definition is from seen charged particles corrected...

  18. Mastering the Essentials: Hands-On DDL Command Prac

    • kaggle.com
    zip
    Updated Sep 25, 2023
    Cite
    Radha Gandhi (2023). Mastering the Essentials:Hands-On DDL Command Prac [Dataset]. https://www.kaggle.com/datasets/radhagandhi/1practical-exercise-in-ddl-commands/code
    Explore at:
    Available download formats: zip (7378 bytes)
    Dataset updated
    Sep 25, 2023
    Authors
    Radha Gandhi
    Description

    The Practical Exercise in SQL Data Definition Language (DDL) Commands is a hands-on project designed to help you gain a deep understanding of fundamental DDL commands in SQL, including:

    • CREATE TABLE,
    • ALTER TABLE (ADD, RENAME, DROP),
    • TRUNCATE TABLE.

    This project aims to enhance your proficiency in using SQL to create, modify, and manage database structures effectively.

    1.1 DDL - CREATE TABLE
    1.2 DDL - ALTER TABLE (ADD)
    1.3 DDL - ALTER (RENAME COLUMN NAME)
    1.4 DDL - ALTER (RENAME TABLE NAME)
    1.5 DDL - ALTER (DROP COLUMN FROM TABLE)
    1.6 DDL - ALTER (DROP TABLE)
    1.7 DDL - TRUNCATE TABLE
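
    The exercise scripts themselves ship in the zip; as a compact illustration of the command set above, a minimal sketch (MySQL-style syntax, hypothetical table):

        CREATE TABLE staff (                                       -- 1.1 create
          staff_id  INT,
          full_name VARCHAR(100)
        );

        ALTER TABLE staff ADD COLUMN hire_date DATE;               -- 1.2 add a column
        ALTER TABLE staff RENAME COLUMN full_name TO staff_name;   -- 1.3 rename a column
        ALTER TABLE staff RENAME TO employees;                     -- 1.4 rename the table
        ALTER TABLE employees DROP COLUMN hire_date;               -- 1.5 drop a column

        TRUNCATE TABLE employees;   -- 1.7 remove all rows, keep the structure
        DROP TABLE employees;       -- 1.6 drop the table entirely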

  19. Park, Beach, Open Space, or Coastline Access

    • data.ca.gov
    • data.chhs.ca.gov
    • +3more
    csv, html, pdf, xlsx +1
    Updated Nov 7, 2025
    Cite
    California Department of Public Health (2025). Park, Beach, Open Space, or Coastline Access [Dataset]. https://data.ca.gov/dataset/park-beach-open-space-or-coastline-access
    Explore at:
    Available download formats: xlsx, pdf, zip, html, csv
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This table contains data on access to parks, measured as the percent of the population within ½ mile of a park, beach, open space, or coastline, for California, its regions, counties, county subdivisions, cities, towns, and census tracts. More information on the data table and a data dictionary can be found in the Data and Resources section.

    As communities become increasingly more urban, parks and the protection of green and open spaces within cities increase in importance. Parks and natural areas buffer pollutants and contribute to the quality of life by providing communities with social and psychological benefits such as leisure, play, sports, and contact with nature. Parks are critical to human health by providing spaces for health and wellness activities.

    The access to parks table is part of a series of indicators in the Healthy Communities Data and Indicators Project (HCI) of the Office of Health Equity. The goal of HCI is to enhance public health by providing data, a standardized set of statistical measures, and tools that a broad array of sectors can use for planning healthy communities and evaluating the impact of plans, projects, policy, and environmental changes on community health. The creation of healthy social, economic, and physical environments that promote healthy behaviors and healthy outcomes requires coordination and collaboration across multiple sectors, including transportation, housing, education, agriculture, and others. Statistical metrics, or indicators, are needed to help local, regional, and state public health and partner agencies assess community environments and plan for healthy communities that optimize public health. The format of the access to parks table is based on the standardized data format for all HCI indicators. As a result, this data table contains certain variables used in the HCI project (e.g., indicator ID and indicator definition). Some of these variables may contain the same value for all observations.

  20. Course participants (old definition); industries 2011

    • data.europa.eu
    atom feed, json
    + more versions
    Cite
    Course participants (old definition); industries 2011 [Dataset]. https://data.europa.eu/88u/dataset/3489-cursusdeelnemers-oude-definitie-bedrijfstakken-2011
    Explore at:
    Available download formats: atom feed, json
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This table provides data on the number and proportion of persons in the working population who participated in at least one work-related course, and the number of persons in the working population who did not take part in work-related courses. Also presented are the number and proportion of the working population for whom at least one work-related course was (partially) paid by the employer.

    This information is broken down by industry (sector) of the company in which the person is employed.

    In 2013, figures from the regular, periodically recurring AES will be published for the first time. The frequency had not yet been definitively established at the time of publication of these tables (see 4. Sources and methods).

    Data available for 2011

    Status of the figures: The figures in this table are final.

    Changes as of 23 October 2015: None, this table has been discontinued.

    When are new figures coming? No longer applicable.

    This table is followed by Course Participants; industries, 2011. See paragraph 3.
