This is an abstract and brief overview of a test public dataset.
https://choosealicense.com/licenses/cdla-sharing-1.0/
Dataset containing synthetically generated (by GPT-3.5 and GPT-4) short stories that only use a small vocabulary. Described in the following paper: https://arxiv.org/abs/2305.07759. The models referred to in the paper were trained on TinyStories-train.txt (the file tinystories-valid.txt can be used for validation loss). These models can be found on Huggingface, at roneneldan/TinyStories-1M/3M/8M/28M/33M/1Layer-21M. Additional resources: tinystories_all_data.tar.gz - contains a superset of… See the full description on the dataset page: https://huggingface.co/datasets/dark-xet/test-public-dataset.
The COVID-19 Search Trends symptoms dataset shows aggregated, anonymized trends in Google searches for a broad set of health symptoms, signs, and conditions. The dataset provides a daily or weekly time series for each region showing the relative volume of searches for each symptom. This dataset is intended to help researchers to better understand the impact of COVID-19. It shouldn't be used for medical diagnostic, prognostic, or treatment purposes. It also isn't intended to be used for guidance on personal travel plans. To learn more about the dataset, how we generate it and preserve privacy, read the data documentation. To visualize the data, try exploring these interactive charts and map of symptom search trends. As of Dec. 15, 2020, the dataset was expanded to include trends for Australia, Ireland, New Zealand, Singapore, and the United Kingdom. This expanded data is available in new tables that provide data at country and two subregional levels. We will not be updating existing state/county tables going forward. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery.
Public access allowing for public search of the FDA Adverse Events Database
The Google Trends dataset will provide critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It will be made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 rising expires after 30 days, and will be accompanied by a rolling five-year window of historical data in 210 distinct locations in the United States. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery
https://data.go.kr/ugs/selectPortalPolicyView.do
This is AI learning data for the LLM model created based on government documents. It consists of corpus learning data constructed using press releases, speeches, publications, policy reports, and official documents of meeting/event plans, and objective task learning data for question answering, reconstruction, and summarization. Its main features include: ● To support multimodal LLM and improve LLM understanding of documents with complex tables, tables (html) and pictures (save separately and path indicated) are included in the corpus. ● Includes task datasets for Q&A, summarization, and rewriting that can be utilized to fine-tune the LLM to follow instructions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description from the SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery GitHub Repository * The "Note" was added by the Roboflow team.
This is a single-class dataset consisting of tiles of satellite imagery labeled with potential 'targets'. Labelers were instructed to draw boxes around anything they suspected might be a paraglider wing missing in a remote area of Nevada. Volunteers were shown examples of similar objects already in the environment for comparison. The missing wing, as it was found after 3 weeks, is shown below.
Image (anomaly): https://michaeltpublic.s3.amazonaws.com/images/anomaly_small.jpg
The dataset contains the following:
Set | Images | Annotations |
---|---|---|
Train | 1808 | 3048 |
Validate | 490 | 747 |
Test | 254 | 411 |
Total | 2552 | 4206 |
The data is in the COCO format and is directly compatible with Faster R-CNN as implemented in Facebook's Detectron2.
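As a quick sanity check on a COCO-format download, the image and annotation counts in the table above can be recomputed from the annotation JSON. The snippet below is a minimal sketch using only the Python standard library; the inlined sample, its file names, tile sizes, and the category name `target` are hypothetical stand-ins rather than actual SaRNet content. It relies only on the standard COCO top-level keys (`images`, `annotations`, `categories`) and the `[x, y, width, height]` box convention.

```python
import json
from collections import Counter

# Hypothetical miniature COCO-style annotation file, inlined for illustration.
# Real SaRNet files are much larger; names and sizes here are assumptions.
SAMPLE_COCO = """
{
  "images": [
    {"id": 1, "file_name": "tile_0001.png", "width": 500, "height": 500},
    {"id": 2, "file_name": "tile_0002.png", "width": 500, "height": 500}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [12, 30, 40, 25]},
    {"id": 11, "image_id": 1, "category_id": 1, "bbox": [200, 210, 18, 22]},
    {"id": 12, "image_id": 2, "category_id": 1, "bbox": [90, 400, 35, 30]}
  ],
  "categories": [{"id": 1, "name": "target"}]
}
"""


def summarize_coco(coco: dict) -> dict:
    """Count images, annotations, and boxes per image in a COCO dict."""
    boxes_per_image = Counter(ann["image_id"] for ann in coco["annotations"])
    return {
        "images": len(coco["images"]),
        "annotations": len(coco["annotations"]),
        "boxes_per_image": dict(boxes_per_image),
    }


summary = summarize_coco(json.loads(SAMPLE_COCO))
print(summary)
```

To apply this to a real split, replace `json.loads(SAMPLE_COCO)` with `json.load(open(path))`, where `path` points at the annotation file inside the unzipped archive (the exact file name is not given here).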
Download the data here: sarnet.zip
Or follow these steps
# download the dataset
wget https://michaeltpublic.s3.amazonaws.com/sarnet.zip
# extract the files
unzip sarnet.zip
**Note:** with Roboflow, you can download the data here (original, raw images, with annotations): https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue/ (download v1, original_raw-images). Download the dataset in COCO JSON format, or another format of choice, and import it to Roboflow after unzipping the folder to get started on your project.
Get started with a Faster R-CNN model pretrained on SaRNet: SaRNet_Demo.ipynb
Source code for the paper is located here: SaRNet_train_test.ipynb
@misc{thoreau2021sarnet,
title={SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery},
author={Michael Thoreau and Frazer Wilson},
year={2021},
eprint={2107.12469},
archivePrefix={arXiv},
primaryClass={eess.IV}
}
The source data was generously provided by Planet Labs, Airbus Defence and Space, and Maxar Technologies.
The COVID-19 Vaccination Search Insights data shows aggregated, anonymized trends in searches related to COVID-19 vaccination. The dataset provides a weekly time series for each region showing the relative interest of Google searches related to COVID-19 vaccination, across several categories. The data is intended to help public health officials design, target, and evaluate public education campaigns. To explore and download the data, use our interactive dashboard. To learn more about the dataset, how we generate it and preserve privacy, read the data documentation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery.
OpenWeb Ninja's Google Images Data (Google SERP Data) API provides real-time image search capabilities for images sourced from all public sources on the web.
The API enables you to search and access more than 100 billion images from across the web including advanced filtering capabilities as supported by Google Advanced Image Search. The API provides Google Images Data (Google SERP Data) including details such as image URL, title, size information, thumbnail, source information, and more data points. The API supports advanced filtering and options such as file type, image color, usage rights, creation time, and more. In addition, any Advanced Google Search operators can be used with the API.
OpenWeb Ninja's Google Images Data & Google SERP Data API common use cases:
Creative Media Production: Enhance digital content with a vast array of real-time images, ensuring engaging and brand-aligned visuals for blogs, social media, and advertising.
AI Model Enhancement: Train and refine AI models with diverse, annotated images, improving object recognition and image classification accuracy.
Trend Analysis: Identify emerging market trends and consumer preferences through real-time visual data, enabling proactive business decisions.
Innovative Product Design: Inspire product innovation by exploring current design trends and competitor products, ensuring market-relevant offerings.
Advanced Search Optimization: Improve search engines and applications with enriched image datasets, providing users with accurate, relevant, and visually appealing search results.
OpenWeb Ninja's Annotated Imagery Data & Google SERP Data Stats & Capabilities:
100B+ Images: Access an extensive database of over 100 billion images.
Images Data from all Public Sources (Google SERP Data): Benefit from a comprehensive aggregation of image data from various public websites, ensuring a wide range of sources and perspectives.
Extensive Search and Filtering Capabilities: Utilize advanced search operators and filters to refine image searches by file type, color, usage rights, creation time, and more, making it easy to find exactly what you need.
Rich Data Points: Each image comes with more than 10 data points, including URL, title (annotation), size information, thumbnail, and source information, providing a detailed context for each image.
This statistic shows the opinion of Americans in October 2015 on whether or not they would favor a law which would require universal background checks for all gun purchases in the U.S. using a centralized database across all 50 states. In October 2015, 86 percent of the respondents stated that they would favor such a law.
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and also as a baseline for future studies of ag research data.

Purpose
As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:
● establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals
● compare how much data is in institutional vs. domain-specific vs. federal platforms
● determine which repositories are recommended by top journals that require or recommend the publication of supporting data
● ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

Approach
The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data.
To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

Search methods
We first compiled a list of known domain specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects.

We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag specific university repositories and general university repositories that housed a portion of agricultural data. Ag specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied.
General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.

Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.
Evaluation
We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

Results
A summary of the major findings from our data review:
● Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
● There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
● Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

See included README file for descriptions of each individual data file in this dataset.

Resources in this dataset:
● Resource Title: Journals. File Name: Journals.csv
● Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
● Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
● Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
● Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
● Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
● Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
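The evaluation criteria described for the general repositories (share of the collection at 1% and 5%, hit counts above 100 and above 500) can be expressed as a small helper. This is only a sketch of those stated thresholds; the function name, field names, and the example counts are hypothetical, and whether the original study treated the boundaries as inclusive or exclusive beyond what the text says is an assumption.

```python
def flag_search_term(hits: int, total_records: int) -> dict:
    """Apply the share-of-collection and hit-count thresholds to one search term.

    hits          -- number of records returned for the search term
    total_records -- total number of records in the repository
    """
    share_pct = (hits / total_records * 100) if total_records else 0.0
    return {
        "share_pct": share_pct,
        "at_least_1_pct": share_pct >= 1.0,   # "at least 1%" of the collection
        "at_least_5_pct": share_pct >= 5.0,   # "at least 5%" of the collection
        "over_100_hits": hits > 100,          # "greater than 100" results
        "over_500_hits": hits > 500,          # "greater than 500" results
    }


# Hypothetical counts, for illustration only.
large_share = flag_search_term(hits=620, total_records=10_000)
small_share = flag_search_term(hits=120, total_records=50_000)
print(large_share)
print(small_share)
```

A repository would match the "large and significant" description in the results only when both `over_500_hits` and `at_least_5_pct` are true for at least one ag search term.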
The set of NIST Test PIV Cards contains sixteen smart cards that are loaded with a PIV Card Application, as specified in NIST Special Publication 800-73-4. The PIV Card Applications on the smart cards are loaded with test data and keys that are similar to what might appear on actual PIV Cards, with the exception that the certificates on the test PIV Cards were issued from a test public key infrastructure. The currently available set of test PIV cards, version 2, includes examples of new, optional features that were introduced in SP 800-73-4, such as on-card biometric comparison, secure messaging, and the virtual contact interface. The set of test cards includes not only examples that are similar to cards issued today, but also examples of cards with features that are expected to appear in cards that will be issued in the future. For example, while the certificates and data objects on most, if not all, cards issued today are signed using RSA PKCS #1 v1.5, the set of test cards includes examples of certificates and data objects that are signed using each of the algorithms and key sizes listed in Table 3-2 of Special Publication 800-78-4, including RSASSA-PSS and ECDSA. Similarly, the infrastructure supporting the test cards provides examples of CRLs and OCSP responses that are signed using each of these signature algorithms. The set of test cards also includes certificates with elliptic curve subject public keys in addition to RSA subject public keys, as is permitted by Table 3-1 of Special Publication 800-78-4. The set of test cards, collectively, also includes all of the mandatory and optional data objects listed in Section 3 of SP 800-73-4 Part 1, except for Cardholder Iris Images. Several of the cards include a Key History object along with retired key management keys.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Denmark Family Income: Public Transfer: Others: Green Check data was reported at 4,112,051.000 DKK th in 2016. This records a decrease from the previous number of 4,124,453.000 DKK th for 2015. Denmark Family Income: Public Transfer: Others: Green Check data is updated yearly, averaging 0.000 DKK th from Dec 2000 (Median) to 2016, with 17 observations. The data reached an all-time high of 5,401,367.000 DKK th in 2014 and a record low of 0.000 DKK th in 2009. Denmark Family Income: Public Transfer: Others: Green Check data remains active status in CEIC and is reported by Statistics Denmark. The data is categorized under Global Database’s Denmark – Table DK.H009: Income Statistics: Family Income.
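The year-over-year decrease quoted above can be expressed as a percentage with a one-line calculation. The sketch below uses only the 2015 and 2016 figures given in the text (in thousands of DKK); the function name is an illustrative choice.

```python
def percent_change(previous: float, current: float) -> float:
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100


# Figures from the text, in DKK thousands.
value_2015 = 4_124_453.0
value_2016 = 4_112_051.0

change = percent_change(value_2015, value_2016)
print(f"Change from 2015 to 2016: {change:.2f}%")
```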
https://data.gov.tw/license
https://www.archivemarketresearch.com/privacy-policy
Market Overview
The global background check software market is projected to exhibit a robust CAGR during the forecast period of 2025-2033, reaching a value of XXX million by 2033. The growth is driven by increasing regulatory compliance requirements, rising concerns over data security, and the need for comprehensive screening solutions in various industries. The adoption of cloud-based solutions and the integration of advanced technologies, such as AI and blockchain, are further fueling market expansion.

Key Segments and Regional Dynamics
Based on type, the cloud-based segment is expected to dominate the market due to its scalability, cost-effectiveness, and ease of access. By application, the enterprise segment accounts for a significant share as businesses seek to strengthen their security measures and enhance employee vetting processes. The government segment is also growing due to the need for thorough background checks in public sector jobs. Regionally, North America holds the largest market share, followed by Europe and Asia Pacific. The growing awareness of data protection laws and the presence of major technology hubs in these regions are key factors driving regional growth.
In March 2022, the number of check-ins per minute with the OV-chipkaart in the Netherlands' public transit network amounted to 85.6 check-ins, the highest traffic of the analyzed period since the beginning of the COVID-19 pandemic.
The OV-chipkaart is a smart card system used for all public transportation in the Netherlands.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Disinformation in the medical field is a growing problem that carries a significant risk. Therefore, it is crucial to detect and combat it effectively. In this article, we provide three elements to aid in this fight: 1) a new framework that collects health-related articles from verification entities and facilitates their check-worthiness and fact-checking annotation at the sentence level; 2) a corpus generated using this framework, composed of 10335 sentences annotated in these two concepts and grouped into 327 articles, which we call KEANE (faKe nEws At seNtence lEvel); and 3) a new model for verifying fake news that combines specific identifiers of the medical domain with triplets subject-predicate-object, using Transformers and feedforward neural networks at the sentence level. This model predicts the fact-checking of sentences and evaluates the veracity of the entire article. After training this model on our corpus, we achieved remarkable results in the binary classification of sentences (check-worthiness F1: 0.749, fact-checking F1: 0.698) and in the final classification of complete articles (F1: 0.703). We also tested its performance against another public dataset and found that it performed better than most systems evaluated on that dataset. Moreover, the corpus we provide differs from other existing corpora in its duality of sentence-article annotation, which can provide an additional level of justification of the prediction of truth or untruth made by the model.
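For readers less familiar with the F1 figures reported above, the metric is the harmonic mean of precision and recall for a binary classifier. The sketch below computes it from confusion-matrix counts; the counts in the example are hypothetical and are not taken from the KEANE experiments, and the article does not state which averaging variant its reported scores use.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    if true_pos == 0:
        # No correct positive predictions: precision and recall are 0 or undefined.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence-level counts, for illustration only.
example_f1 = f1_score(true_pos=70, false_pos=20, false_neg=30)
print(f"F1 = {example_f1:.3f}")
```

Because F1 ignores true negatives, it is a common choice when the positive class (for example, check-worthy sentences) is a minority of the corpus.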
As per our latest research, the global Temperature Measuring and Face Recognition Security Check Gate market size reached USD 2.36 billion in 2024, reflecting the rapid adoption of advanced security and health screening technologies across various sectors. The market is expected to grow at a robust CAGR of 13.7% from 2025 to 2033, with the market size projected to reach USD 7.46 billion by the end of 2033. This growth is driven by increasing security concerns, the need for efficient access control solutions, and heightened awareness regarding public health safety in the wake of global pandemics.
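The relationship between the base-year value, the CAGR, and the end-of-period projection can be checked with the standard compound-growth formula. The sketch below assumes nine compounding periods between the 2024 base value and the 2033 projection; the report does not state its exact compounding convention, so the result is expected to land close to, but not exactly on, the quoted figure.

```python
def project_value(start: float, rate: float, years: int) -> float:
    """Compound a starting value at a constant annual growth rate."""
    return start * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that takes start to end over the given years."""
    return (end / start) ** (1 / years) - 1


# Figures from the text (USD billions); the 9-year span is an assumption.
base_2024 = 2.36
stated_2033 = 7.46
stated_cagr = 0.137
years = 2033 - 2024

projected_2033 = project_value(base_2024, stated_cagr, years)
back_solved_cagr = implied_cagr(base_2024, stated_2033, years)

print(f"Projected 2033 value at 13.7%: USD {projected_2033:.2f} billion")
print(f"CAGR implied by 2.36 -> 7.46: {back_solved_cagr:.1%}")
```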
The primary growth factor for the Temperature Measuring and Face Recognition Security Check Gate market is the heightened global emphasis on public safety and health monitoring. In the aftermath of the COVID-19 pandemic, organizations and governments worldwide have prioritized the implementation of touchless, automated screening systems to minimize human contact while maintaining stringent security protocols. These gates, equipped with advanced temperature measurement and face recognition capabilities, enable rapid, non-intrusive identification and health screening, making them indispensable in high-traffic environments such as airports, hospitals, and corporate offices. Additionally, the integration of artificial intelligence and IoT technologies has significantly enhanced the accuracy and efficiency of these systems, further propelling market demand.
Another significant driver is the increasing adoption of smart infrastructure and the digital transformation of security systems across both public and private sectors. As urbanization accelerates and smart cities initiatives gain momentum, the deployment of intelligent security check gates has become a critical component of modern infrastructure. These systems not only streamline access control but also provide real-time data analytics, enabling proactive threat detection and efficient incident response. The growing trend of integrating temperature measuring and face recognition technologies into existing security frameworks is also fostering market expansion, as organizations seek to future-proof their security investments and ensure regulatory compliance.
Furthermore, the proliferation of advanced technologies such as artificial intelligence, machine learning, and thermal imaging is reshaping the competitive landscape of the Temperature Measuring and Face Recognition Security Check Gate market. The development of more sophisticated, AI-based solutions has resulted in enhanced facial recognition accuracy, even in challenging conditions such as mask-wearing or low-light environments. This technological evolution is driving the adoption of these gates across diverse applications, from transportation hubs and educational institutions to government buildings and commercial complexes. The continuous innovation in sensor technologies and the integration of multi-modal biometric authentication are expected to open new avenues for market growth over the forecast period.
Regionally, Asia Pacific dominates the market, accounting for the largest share in 2024, followed by North America and Europe. The rapid urbanization, substantial investments in smart city projects, and stringent regulatory frameworks supporting public safety initiatives have fueled the widespread adoption of temperature measuring and face recognition security check gates across major economies such as China, Japan, and India. In North America, the focus on infrastructure modernization and the presence of leading technology providers have accelerated market growth, while Europe’s emphasis on privacy and data security is shaping the evolution of regulatory-compliant solutions. Emerging markets in Latin America and the Middle East & Africa are also witnessing increased adoption, driven by government-led security modernization programs and rising awareness regarding health and safety protocols.
This dataset contains the estimated percentages of eligible England residents (aged 40-74 years) who were invited to and received the National Health Service (NHS) Health Check, broken down by type of liver disease, by England region, county and unitary authority, and by level of multiple deprivation. Comparisons to England and region levels are also available in the dataset.
https://www.promarketreports.com/privacy-policy
The global market for Temperature Measuring and Face Recognition Security Check Gates is experiencing robust growth, driven by increasing concerns over public health and security. The market, estimated at $2.5 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033. This significant expansion is fueled by several key factors. Firstly, the heightened awareness of infectious diseases, amplified by recent pandemics, has significantly increased demand for contactless screening solutions. Secondly, the rising adoption of smart city initiatives and improved security infrastructure across various sectors, including airports, schools, and corporate offices, is further bolstering market growth. Thirdly, advancements in facial recognition technology, coupled with improved temperature sensing accuracy and integration with existing security systems, are making these gates more efficient and cost-effective. Finally, government regulations and mandates promoting public safety are creating a compelling regulatory environment that supports market expansion. Despite the promising growth trajectory, certain challenges remain. High initial investment costs for implementing these advanced security gates can be a barrier to entry for smaller organizations. Furthermore, concerns regarding data privacy and the potential for misuse of facial recognition data need careful consideration and robust regulatory frameworks. However, the overall market outlook remains positive, driven by the overwhelming benefits of improved security and public health protection. The market segmentation by gate type (Three Roller Gate, Swing Gate, Wing Gate, Translation Gate, Turnstile) and application (City Security, Sign-in Attendance, Quarantine and Epidemic Prevention, Other) indicates diverse opportunities for vendors specializing in different technologies and market segments. 
The competitive landscape is fairly dynamic, with several established players and emerging companies vying for market share. Growth is expected to be particularly strong in regions with high population density and developing economies, where the need for enhanced security measures is most pronounced.