Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Registry of Open Data on AWS contains publicly available datasets that are available for access from AWS resources. Note that datasets in this registry are available via AWS resources, but they are not provided by AWS; these datasets are owned and maintained by a variety of government organizations, researchers, businesses, and individuals. This dataset contains derived forms of the data in https://github.com/awslabs/open-data-registry that have been transformed for ease of use with machine interfaces. Currently, only the ndjson form of the registry is populated here.
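Since only the ndjson form is populated, a consumer would typically read it one JSON object per line. The following is a minimal sketch of that pattern, not the registry's own tooling; the sample records and their `Name` and `Tags` fields are hypothetical placeholders rather than the actual schema.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of NDJSON input."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON on line {line_number}: {exc}") from exc


# Hypothetical sample data; the real registry records may use different fields.
SAMPLE = (
    '{"Name": "Example dataset A", "Tags": ["genomics"]}\n'
    "\n"
    '{"Name": "Example dataset B", "Tags": ["weather"]}\n'
)

records = list(iter_ndjson(SAMPLE.splitlines()))
names = [record["Name"] for record in records]
print(names)
```

The same generator can be pointed at an open file handle, which keeps memory use flat for large registry exports.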
The purpose of the Fiscal Service Data Registry is to promote the common identification, use and sharing of data/information across the federal government.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The ont-open-data registry provides reference sequencing data from Oxford Nanopore Technologies to support: 1) exploration of the characteristics of nanopore sequence data; 2) assessment and reproduction of performance benchmarks; and 3) development of tools and methods. The deposited data showcase DNA sequences from a representative subset of sequencing chemistries. The datasets correspond to publicly available reference samples (e.g. Genome In A Bottle reference cell lines). Raw data are provided with metadata and scripts to describe sample and data provenance.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MultiCoNER 1 is a large multilingual dataset (11 languages) for Named Entity Recognition. It is designed to represent some of the contemporary challenges in NER, including low-context scenarios (short and uncased text), syntactically complex entities such as movie titles, and long-tail entity distributions. MultiCoNER 2 is a large multilingual dataset (12 languages) for fine-grained Named Entity Recognition. Its fine-grained taxonomy contains 36 NE classes, representing real-world challenges for NER where, apart from the surface form, context plays a critical role in distinguishing between the different fine-grained types (e.g. Scientist vs. Athlete). Furthermore, the test data of MultiCoNER 2 contains noisy instances, where the noise has been applied to both the context tokens and the entity tokens. The noise includes character-level typing errors based on keyboard layouts in the different languages.
https://data.csiro.au/dap/ws/v2/licences/1161
The CSIRO Linked Data Registry provides a service for management of, and public access to, codes, codelists, vocabularies, ontologies and other reference resources authorized or adopted by CSIRO.
It is based on the UK Government Linked Data design for a Linked Data registry developed by Epimorphics.
https://woudc.org/en/data/data-use-policy
Connection to datasets in the WOUDC Data Registry Search Index.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Data released under the Department of Energy's (DOE) Open Energy Data Initiative (OEDI). The Open Energy Data Initiative aims to improve and automate access of high-value energy data sets across the U.S. Department of Energy’s programs, offices, and national laboratories. OEDI aims to make data actionable and discoverable by researchers and industry to accelerate analysis and advance innovation.
The dataset collection consists of one or more dataset tables sourced from the website of Helsingin kaupunkiympäristön toimiala in Finland.
https://www.usa.gov/government-works
This dataset contains up-to-date metadata on Work Zone feeds that meet the Work Zone Data Exchange (WZDx) specifications and are registered with USDOT ITS DataHub. The current work zone data from each feed can be accessed through its respective API link. Some links provide direct access, while others require a user to create their own API access key first. Please see the attached API Key Instructions document to learn how to sign up for API keys for the requisite feeds.
The ITS Work Zone Sandbox contains an archive of work zone data collected from each feed at least once every 15 minutes. This is not intended as a replacement for the work zone feeds and in many cases does not update as frequently as the feed does.
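Accessing a feed that needs an API key might look roughly like the sketch below. It assumes the key is passed as a query parameter and that the feed body is a GeoJSON FeatureCollection; the parameter name `api_key`, the example URL, and the sample payload are all hypothetical, and each feed's own API Key Instructions take precedence.

```python
import json
import urllib.parse
from typing import Optional


def build_feed_url(base_url: str, api_key: Optional[str] = None,
                   key_param: str = "api_key") -> str:
    """Return the feed URL, appending the API key as a query parameter if given.

    The parameter name is an assumption; some feeds may expect a header instead.
    """
    if api_key is None:
        return base_url  # direct-access feed, no key needed
    parts = urllib.parse.urlsplit(base_url)
    query = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    query.append((key_param, api_key))
    return urllib.parse.urlunsplit(
        parts._replace(query=urllib.parse.urlencode(query))
    )


def count_features(feed_text: str) -> int:
    """Count the features in a GeoJSON FeatureCollection given as text."""
    feed = json.loads(feed_text)
    return len(feed.get("features", []))


# Hypothetical URL and payload, used only to exercise the helpers offline.
url = build_feed_url("https://example.com/wzdx/feed?format=json", "MY_KEY")
sample_feed = '{"type": "FeatureCollection", "features": [{"type": "Feature"}]}'
print(url)
print(count_features(sample_feed))
```

The helpers deliberately stop short of making a network request, so the response body can be fetched with whatever HTTP client a project already uses.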
READ is EPA's authoritative source for information about Agency information resources, including applications/systems, datasets and models. READ is one component of the System of Registries (SoR).
https://woudc.org/en/data/data-use-policy
Connection to dataset metadata in the WOUDC Data Registry Search Index.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A scoping review of the literature was conducted to identify trauma registries that included information on injuries sustained by civilians and local combatants in the MENA conflict nations. The search was not restricted by date. Definitions were pre-determined to prevent ambiguity surrounding the concepts of trauma registries and the MENA conflict nations. After the publications were identified, bibliometric analysis was performed on the large percentage of the articles indexed in the Web of Science database and published in journals indexed in Clarivate's Journal Citation Report (JCR). The search was developed by a professional medical librarian in consultation with the author team; it included both keywords and subject headings representing the selected MENA nations, data, and trauma. The searches were independently peer reviewed by another librarian using a modified PRESS Checklist, and were conducted in MEDLINE via PubMed, Embase via Elsevier, Scopus via Elsevier, Global Health Database via EBSCOhost, and Web of Science via Clarivate [29]. The searches were executed on January 10, 2023 and found 1,033 unique citations. Complete reproducible search strategies for all registries and the grey literature search plan are detailed in the Supplementary Materials. All citations were imported into Covidence, a systematic review screening software. PRISMA reporting guidelines were followed throughout. The study focused on civilian victims within conflict zones. We excluded studies solely focused on military casualties. We included registries that contained demographics, processes of care, or outcomes of injured patients seen by a health care facility. Countries of interest were selected based on geographic location in 21st century conflicts.
Our list of conflict countries was defined by the World Bank's MENA nations list and its 2022 List of Fragile and Conflict-affected Situations [10,11]. The following 6 nations were included in both lists: Lebanon, Gaza/West Bank, Syria, Iraq, Yemen, and Libya. ...
Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.
Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.
Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.
Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR...
The Sentinel-2 mission is a land monitoring constellation of two satellites that provides high-resolution optical imagery and continuity for the current SPOT and Landsat missions. The mission provides global coverage of the Earth's land surface every 5 days, making the data of great use in ongoing studies. L1C data are available globally from June 2015. L2A data are available over the Europe region from November 2016 and globally since January 2017.
https://iknl.nl/nkr/cijfers-op-maat/gegevensaanvraag
The data from the Dutch Cancer Registry (NKR) provide insights into improving care for people with cancer.
The NKR includes information about diagnostics, diagnosis, tumor characteristics and initial treatment, regardless of the treatment location. For an increasing number of cancer types, follow-up data are also available for subsequent treatments. The data are collected by specially trained IKNL data managers in hospitals based on information in the medical file. Since 1989, the database has contained data on patients from all over the Netherlands.
https://project-open-data.cio.gov/unknown-license/
FAA Data Registry
Harness AI-Driven Precision for Global Company Insights
Leverage cutting-edge AI agents to fetch and validate company registry data in real-time, bypassing obsolete databases. Unlike traditional providers, our service dynamically retrieves data directly from government registries worldwide, ensuring up-to-the-minute accuracy and eliminating outdated records.
Key Features
1. AI-Powered Real-Time Access: Deploy autonomous AI agents to collect and structure data from any national registry, even those with dynamic layouts or authentication barriers.
2. Universal Registry Compatibility: Seamlessly extract data from 250+ countries, including hard-to-access regions, with automatic translation and normalization.
3. Document Processing: Parse financial filings, annual reports, and legal documents (PDF, DOCX) using NLP-driven analysis. Extract key attributes like ownership structures, director details, and compliance status.
4. Format Flexibility: Receive data via API, CSV, JSON, or custom formats (e.g., PostgreSQL DB, Google Sheets) with hourly/daily refresh options.
5. 99% Accuracy Guarantee: Multi-layer validation via AI cross-referencing and human audits ensures error-free datasets.
Data Sourcing & Coverage
1. Sources: Direct integration with 1,800+ government registries of your choice on demand, supplemented by AI-enhanced verification of public filings and regulatory submissions.
2. Attributes: Company name, registration number, directors, shareholders, financials, litigation history, and industry-specific certifications (e.g., ISO, NAICS).
3. Historical Data: 10+ years of archived records, updated in real-time.
Use Cases
1. Due Diligence: Verify company legitimacy for mergers, acquisitions, or partnerships.
2. Compliance: Streamline KYC/AML workflows with automated registry checks.
3. Market Research: Track competitor expansions, ownership changes, or industry trends.
4. Risk Management: Monitor regulatory violations or financial instability signals.
5. Credit Reporting: Automate the end-to-end credit report creation process.
Technical Specifications
1. Delivery: API (REST/GraphQL), SFTP, cloud sync (AWS S3, Google Cloud).
2. Integration: Custom connectors for Salesforce, HubSpot, and BI tools (Tableau, Power BI).
3. Latency: Response times for on-demand queries range from under 5 seconds to 60 minutes, depending on the complexity of the query and the response time of the registry.
Why Choose Us?
1. Pioneers in AI Agent Technology: Outperform static datasets with live registry scraping.
2. GDPR/CCPA Compliance: Data sourced ethically from public registries, with audit trails on output.
3. Free Sample: Test 100 records at zero cost.
City of Austin Open Data Terms of Use https://data.austintexas.gov/stories/s/ranj-cccq
This dataset is a monthly upload of the Community Registry (www.AustinTexas.gov/CR), where community organizations such as neighborhood associations may register with the City of Austin to receive notices of land development permit applications within 500 feet of the organization's specified boundaries. This dataset can be used to contact multiple registered organizations at once by filtering/sorting, for example, by Association Type or by Association ZipCode. The organizations' boundaries can be viewed in the City's interactive map at www.AustinTexas.gov/GIS/PropertyProfile/; the Community Registry layer is under the Boundaries/Grids folder.
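The filtering described above could be sketched as follows for a CSV export of the dataset. The column headers and sample rows here are hypothetical stand-ins (the real export may name its columns differently), so treat this as an illustration of the approach rather than a description of the actual file.

```python
import csv
import io

# Hypothetical sample export; real column names and values may differ.
SAMPLE_CSV = """Association Name,Association Type,Association ZipCode
Example Neighborhood Assn,Neighborhood Association,78704
Example Business Group,Business Association,78701
Another Neighborhood Assn,Neighborhood Association,78701
"""


def filter_organizations(rows, criteria):
    """Keep rows whose columns equal every value given in `criteria`."""
    return [
        row for row in rows
        if all(row.get(column) == value for column, value in criteria.items())
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Filter by ZIP code, then sort by name to get a stable contact list.
matches = filter_organizations(rows, {"Association ZipCode": "78701"})
matches.sort(key=lambda row: row["Association Name"])
match_names = [row["Association Name"] for row in matches]
print(match_names)
```

Passing several columns in `criteria`, for example both type and ZIP code, narrows the list further.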
The Evaluation Registry is the main source for USAID evaluation reporting, including the number of evaluations completed by USAID operating units each year, and how evaluations are used. USAID evaluations are the systematic collection and analysis of data about the characteristics and outcomes of strategies, projects, and activities. They are used as evidence to inform decisions, to improve effectiveness of current program activities, and future programming. The data contained in this dataset is derived from the USAID Evaluation Registry.