To request access to this data, please complete the data request form.* *University of Bristol staff should use this form instead.

The ASK feasibility trial: a randomised controlled feasibility trial and process evaluation of a complex multicomponent intervention to improve AccesS to living-donor Kidney transplantation

This trial was a two-arm, parallel-group, pragmatic, individually randomised, controlled feasibility trial comparing usual care with a multicomponent intervention to increase access to living-donor kidney transplantation. The trial was based at two UK hospitals: a transplanting hospital and a non-transplanting referral hospital. 62 participants were recruited; 60 consented to data sharing, and their trial data are available here. The 2 participants who did not consent to data sharing are not included.

This project contains:
1. The ASK feasibility trial dataset
2. The trial questionnaire
3. An example consent form
4. Trial information sheet

This dataset is part of a series:
ASK feasibility trial documents: https://doi.org/10.5523/bris.1u5ooi0iqmb5c26zwim8l7e8rm
The ASK feasibility trial: CONSORT documents: https://doi.org/10.5523/bris.2iq6jzfkl6e1x2j1qgfbd2kkbb
The ASK feasibility trial: Wellcome Open Research CONSORT checklist: https://doi.org/10.5523/bris.1m3uhbdfdrykh27iij5xck41le
The ASK feasibility trial: qualitative data: https://doi.org/10.5523/bris.1qm9yblprxuj2qh3o0a2yylgg
https://www.caida.org/about/legal/aua/
https://www.gnu.org/licenses/gpl-3.0.html
CAIDA's Spoofer project provides information about the deployed Source Address Validation (SAV) policies of ASes on the Internet. The Spoofer API is public data. The restricted dataset includes information that we do not provide through the public API, including the results of traceroute and tracespoof measurements. The dataset is provided in database format.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains a collection of over 2,000 company documents, categorized into four main types: invoices, inventory reports, purchase orders, and shipping orders. Each document is provided in PDF format, accompanied by a CSV file that includes the text extracted from these documents, their respective labels, and the word count of each document. This dataset is ideal for various natural language processing (NLP) tasks, including text classification, information extraction, and document clustering.
PDF Documents: The dataset includes 2,677 PDF files, each representing a unique company document. These documents are derived from the Northwind dataset, which is commonly used for demonstrating database functionalities.
The document types are: invoices, inventory reports, purchase orders, and shipping orders.
Here are a few example entries from the CSV file:
This dataset can be used for text classification, information extraction, and document clustering.
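As a quick illustration of working with the accompanying CSV, the sketch below reads rows of extracted text, label, and word count. The column names (text, label, word_count) and the sample rows are assumptions for illustration; check the actual CSV header before use.

```python
import csv
import io

# Two illustrative rows; the real file has 2,677 entries derived from Northwind.
SAMPLE_CSV = """text,label,word_count
"Invoice #1001 for Northwind Traders, total due $450.00",invoice,9
"Purchase order PO-88 for 12 crates of chai",purchase_order,9
"""

def load_documents(fh):
    """Read (text, label, word_count) tuples from the dataset CSV."""
    reader = csv.DictReader(fh)
    return [(row["text"], row["label"], int(row["word_count"])) for row in reader]

docs = load_documents(io.StringIO(SAMPLE_CSV))
labels = {label for _, label, _ in docs}
print(sorted(labels))  # document classes present in the sample
```

Replacing `io.StringIO(SAMPLE_CSV)` with an opened file handle applies the same loop to the full dataset.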
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The 802.11 standard includes several management features and corresponding frame types. One of these is the probe request (PR). PRs are sent by mobile devices in the unassociated state to search the nearby area for existing wireless networks. The frame body of a PR consists of variable-length fields called information elements (IEs). IE fields represent the capabilities of a mobile device, such as supported data rates.
The dataset includes PRs collected in a controlled rural environment and in a semi-controlled indoor environment under different measurement scenarios.
It can be used for various use cases, e.g., analysing MAC randomization, determining the number of people at a given location at a given time or across different time periods, and analysing trends in population movement (streets, shopping malls, etc.).
Measurement setup
The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture Wi-Fi signal traffic in monitoring mode. Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.
The following information is collected for each received PR: MAC address, supported data rates, extended supported rates, HT capabilities, extended capabilities, data under the extended tag and vendor-specific tag, interworking, VHT capabilities, RSSI, SSID, and the timestamp when the PR was received.
The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package for data collection, preprocessing and transmission.
Data preprocessing
The gateway collects PRs for each consecutive predefined scan interval (10 seconds). During this time interval, the data are preprocessed before being transmitted to the database.
For each detected PR in the scan interval, IEs fields are saved in the following JSON structure:
PR_IE_data =
{
'DATA_RTS': {'SUPP': DATA_supp , 'EXT': DATA_ext},
'HT_CAP': DATA_htcap,
'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap},
'VHT_CAP': DATA_vhtcap,
'INTERWORKING': DATA_inter,
'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext ...},
'VENDOR_SPEC': {VENDOR_1:{
'ID_1': DATA_1_vendor1,
'ID_2': DATA_2_vendor1
...},
VENDOR_2:{
'ID_1': DATA_1_vendor2,
'ID_2': DATA_2_vendor2
...}
...}
}
Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
Missing IE fields in the captured PR are not included in PR_IE_data.
When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:
{'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },
where PR_data is structured as follows:
{
'TIME': [ DATA_time ],
'RSSI': [ DATA_rssi ],
'DATA': PR_IE_data
}.
This data structure allows storing only the time of arrival (TOA) and RSSI for all PRs originating from the same MAC address and containing the same PR_IE_data. All SSIDs from the same MAC address are also stored.
The data of the newly detected PR is compared with the already stored data of the same MAC in the current scan time interval.
If identical PR's IE data from the same MAC address is already stored, then only data for the keys TIME and RSSI are appended.
If no identical PR's IE data has yet been received from the same MAC address, then PR_data structure of the new PR for that MAC address is appended to PROBE_REQs key.
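The merge logic described above can be sketched as follows. This is a minimal illustration, not the authors' collection script: the function and variable names are ours, while the dictionary keys mirror the structures shown in the text.

```python
def add_probe_request(devices, mac, ssid, time, rssi, pr_ie_data):
    """Merge one captured PR into the per-interval store.

    devices maps MAC -> {'MAC', 'SSIDs', 'PROBE_REQs'} as in the dataset.
    """
    entry = devices.setdefault(mac, {'MAC': mac, 'SSIDs': [], 'PROBE_REQs': []})
    if ssid and ssid not in entry['SSIDs']:
        entry['SSIDs'].append(ssid)          # keep every SSID seen for this MAC
    for pr in entry['PROBE_REQs']:
        if pr['DATA'] == pr_ie_data:         # identical IE data already stored:
            pr['TIME'].append(time)          # append only TIME and RSSI
            pr['RSSI'].append(rssi)
            return
    # no identical IE data yet: append a new PR_data structure
    entry['PROBE_REQs'].append({'TIME': [time], 'RSSI': [rssi], 'DATA': pr_ie_data})

devices = {}
add_probe_request(devices, 'aa:bb', 'eduroam', 1.0, -60, {'HT_CAP': '0x1'})
add_probe_request(devices, 'aa:bb', 'eduroam', 2.0, -58, {'HT_CAP': '0x1'})
add_probe_request(devices, 'aa:bb', '', 3.0, -70, {'HT_CAP': '0x2'})
```

After these three calls, the MAC has two PROBE_REQs entries: the first holds both timestamps and RSSI values for the repeated IE data, and the second holds the PR with differing IE data.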
The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png
At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, e.g., the wireless gateway serial number and the scan start and end timestamps. For an example of a single captured PR, see the ./Single_PR_capture_example.json file.
Environments description
We performed measurements in a controlled rural outdoor environment and in a semi-controlled indoor environment of the Jozef Stefan Institute.
See the Excel spreadsheet Measurement_informations.xlsx for a list of mobile devices tested.
Indoor environment
We used 3 RPis for the acquisition of PRs at the Jozef Stefan Institute. They were placed indoors in the hallways, as shown in ./Figures/RPi_locations_JSI.png. Measurements were performed on a weekend to minimize additional uncontrolled traffic from users' mobile devices. While there is some overlap in WiFi coverage between the devices at locations 2 and 3, the device at location 1 has no overlap with the other two.
Rural environment outdoors
The three RPi's used to collect PRs were placed at three different locations with non-overlapping WiFi coverage, as shown in ./Figures/RPi_locations_rural_env.png. Before starting the measurement campaign, all measured devices were turned off and the environment was checked for active WiFi devices. We did not detect any unknown active devices sending WiFi packets in the RPi's coverage area, so the deployment can be considered fully controlled.
All known WiFi enabled devices that were used to collect and send data to the database used a global MAC address, so they can be easily excluded in the preprocessing phase. MAC addresses of these devices can be found in the ./Measurement_informations.xlsx spreadsheet.
Note: The Huawei P20 device with ID 4.3 was not included in the test in this environment.
Scenarios description
We performed measurements in three different scenarios.
Individual device measurements
For each device, we collected PRs for one minute with the screen on, followed by one minute with the screen off. In the indoor environment, the WiFi interfaces of the devices not being tested were disabled. In the rural environment, the other devices were turned off. Start and end timestamps of the recorded data for each device can be found in the ./Measurement_informations.xlsx spreadsheet under the Indoor environment of Jozef Stefan Institute and Rural environment sheets.
Three groups test
In this measurement scenario, the devices were divided into three groups. The first group contained devices from different manufacturers. The second group contained devices from only one manufacturer (Samsung). Half of the third group consisted of devices from the same manufacturer (Huawei), and the other half of devices from different manufacturers. The distribution of devices among the groups can be found in the ./Measurement_informations.xlsx spreadsheet.
The same data collection procedure was used for all three groups. Data for each group were collected in both environments at the three RPi locations, as shown in ./Figures/RPi_locations_JSI.png and ./Figures/RPi_locations_rural_env.png.
At each location, PRs were collected from each group for 10 minutes with the screens on. Then all three groups switched locations and the process was repeated. Thus, the dataset contains measurements of all three groups of devices at all three RPi locations in both measurement environments. The group movements and the timestamps for the start and end of PR collection at each location can be found in the ./Measurement_informations.xlsx spreadsheet.
One group test
In the last measurement scenario, all devices were grouped together. In the rural environment, we first collected PRs for 10 minutes while the screens were on, and then for another 10 minutes while the screens were off. In the indoor environment, data were collected at the first location with screens on for 10 minutes. Then all devices were moved to the location of the next RPi, and PRs were collected for 5 minutes with the screens on and then for another 5 minutes with the screens off.
Folder structure
The root directory contains two files in JSON format, one for each environment where the measurements took place (Data_indoor_environment.json and Data_rural_environment.json). Both files contain the PRs collected over the entire day that the measurements were taken (12:00 AM to 12:00 PM) to give a sense of the behaviour of unknown devices in each environment. The spreadsheet ./Measurement_informations.xlsx contains three sheets. Devices description contains general information about the tested devices, the RPis, and the assigned group for each device. The sheets Indoor environment of Jozef Stefan Institute and Rural environment contain the corresponding timestamps for the start and end of each measurement scenario. For the scenario where the devices were divided into groups, additional information about the movements between locations is included. The location names are based on the RPi gateway ID and may differ from those in the figures showing the RPi locations.
This dataset was collected as part of the U.S. Department of Transportation (U.S. DOT) Intersection Safety Challenge (hereafter, “the Challenge”) for Stage 1B: System Assessment and Virtual Testing. Multi-sensor data were collected at a controlled test roadway intersection at the Federal Highway Administration (FHWA) Turner-Fairbank Highway Research Center (TFHRC) Smart Intersection facility in McLean, VA, from October 2023 through March 2024. The data include potential conflict-based and non-conflict-based experimental scenarios between vulnerable road users (e.g., pedestrians, bicyclists) and vehicles during both daytime and nighttime conditions. Note that no actual human vulnerable road users were put at risk of being involved in a collision during the data collection efforts. The provided data (hereafter, “the Challenge Dataset”) are unlabeled training data (without ground truth) that were collected to be used for intersection safety system algorithm training, refinement, tuning, and/or validation, but may have additional uses. For a summary of the Stage 1B data collection effort, please see this video: https://youtu.be/csirVHFa2Cc.

The Challenge Dataset includes data at a single, signalized four-way intersection from 20 roadside sensors and traffic control devices, including eight closed-circuit television (CCTV) visual cameras, five thermal cameras, two light detection and ranging (LiDAR) sensors, and four radar sensors. Intrinsic calibration was performed for all visual and thermal cameras. Extrinsic calibration was performed for specific pairs of roadside sensors. Additionally, the traffic signal phase and timing data and vehicle and/or pedestrian calls to the traffic signal controller (if any) are also provided. The total number of unique runs in the Challenge Dataset is 1,104, bringing the total size of the dataset to approximately 1 TB. A sample of 20 unique runs from the Challenge Dataset is provided here for download, inspection, and use.
If, after inspecting this sample, a potential data user would like access to download the full Challenge Dataset, a request can be made via the form here: https://its.dot.gov/data/data-request. For more details about the data collection, supplemental files, organization and dictionary, and sensor calibration, see the attached “U.S. DOT ISC Stage 1B ITS DataHub Metadata_v1.0.pdf” document. For more information on the background of the Intersection Safety Challenge Stage 1B, please visit: https://www.its.dot.gov/research-areas/Intersection-Safety-Challenge/.
eBird is among the world’s largest biodiversity-related science projects, with more than 100 million bird sightings contributed annually by eBirders around the world and an average participation growth rate of approximately 20% year over year. eBird is managed by the Cornell Lab of Ornithology. Some data has been contributed under INTAROS WP4. eBird provides open data access in several formats to logged-in users, ranging from raw data to processed datasets geared toward more rigorous scientific modeling.

eBird Basic Dataset (EBD)
The EBD is the core dataset for accessing all raw eBird observations and associated metadata. It is updated monthly (on the 15th of each month) and is available by direct download through eBird to any logged-in user after completion of a data request form, which allows us to gain some understanding of how the data will be used. Requests are typically approved within 7 days. Data are provided with documentation in spreadsheet format, which can be read by a variety of programs. Although Excel or similar programs work for basic analyses, for larger datasets (>1 million rows) or more sophisticated analyses we recommend programs like R. Several R packages are available for summarizing the data, including one that is managed here at the Cornell Lab specifically for working with the EBD. The data collection may enable a better understanding of bird population dynamics and the status of bird species, including bird conservation management requirements.
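Because the EBD ships as a large tab-delimited text file, it can also be summarized row by row without loading everything into memory. The sketch below follows the EBD conventions of uppercase column names and 'X' marking presence-only records, but the sample rows are invented for illustration; verify the header of the file you actually download.

```python
import csv
import io

# Two illustrative tab-delimited rows in EBD style (invented data).
SAMPLE_EBD = "COMMON NAME\tOBSERVATION COUNT\nSnow Bunting\t3\nSnow Bunting\t5\n"

totals = {}
for row in csv.DictReader(io.StringIO(SAMPLE_EBD), delimiter="\t"):
    count = row["OBSERVATION COUNT"]
    if count.isdigit():                      # 'X' marks presence without a count
        name = row["COMMON NAME"]
        totals[name] = totals.get(name, 0) + int(count)
print(totals)
```

Streaming over the file like this keeps memory use constant, which matters once the download exceeds a million rows.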
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Multi-Agency Ground Plot (MAGPlot) database (DB) is a pan-Canadian forest ground-plot data repository. The database synthesizes forest ground plot data from various agencies, including the National Forest Inventory (NFI) and 12 Canadian jurisdictions: Alberta (AB), British Columbia (BC), Manitoba (MB), New Brunswick (NB), Newfoundland and Labrador (NL), Nova Scotia (NS), Northwest Territories (NT), Ontario (ON), Prince Edward Island (PE), Quebec (QC), Saskatchewan (SK), and Yukon Territory (YT), contributed in their original format. These datasets underwent data cleaning and quality assessment using the rules and standards set by the contributors and the associated documentation, and were standardized, harmonized, and integrated into a single, centralized, analysis-ready database. The primary objective of the MAGPlot project is to collate and harmonize forest ground plot data and to present the data in a findable, accessible, interoperable, and reusable (FAIR) format for pan-Canadian forest research. The current version includes both historical and contemporary forest ground plot data provided by data contributors. The standardized and harmonized dataset includes eight data tables (five site-related and three tree measurement tables) in a relational database schema. Site-related tables contain information on geographical locations, treatments (e.g., stand tending, regeneration, and cutting), and disturbances caused by abiotic factors (e.g., weather, wildfires) or biotic factors (e.g., disease, insects, animals). Tree-related tables, on the other hand, focus on measured tree attributes, including biophysical and growth parameters (e.g., DBH, height, crown class), species, status, stem conditions (e.g., broken or dead tops), and health conditions. While most contributors provided large and small tree plot measurements, only NFI, AB, MB, and SK contributed datasets reported at the regeneration plot level (e.g., stem count, regeneration species).
Future versions are expected to include updated and/or new measurement records as well as additional tables and measured and compiled (e.g., tree volume and biomass) attributes. MAGPlot is hosted through Canada’s National Forest Information System (https://nfi.nfis.org/en/maps).

LATEST SITE TREATMENTS LAYER:
Shows the most recently applied treatment class for each MAGPlot site. These treatment classes are broad categories, with more specific treatment details available in the full dataset.

NOTES:
The MAGPlot release (v1.0 and v1.1) does not include the NL and SK datasets due to pending Data Sharing Agreements, ongoing data processing, or restrictions on third-party sharing. These datasets will be included in future releases. While certain jurisdictions permit open or public data sharing provided that the requestor signs and adheres to the Data Use Agreement, some jurisdictions require a jurisdiction-specific request form to be signed in addition to the Data Use Agreement form. For the MAGPlot Data Dictionary, other metadata, datasets available for open sharing (with approximate locations), data requests (for other datasets or exact coordinates), and available data visualization products, please check the folders in the “Data and Resources” section below. Coordinates in web services have been randomized within 5 km of the true location to preserve site integrity. Access the WMS (Web Map Service) layers from the “Data and Resources” section below. A data request must be submitted via the link below to access historical datasets, datasets restricted by data-use agreements, or exact plot coordinates.
NFI Data Request Form: https://nfi.nfis.org/en/datarequestform

ACKNOWLEDGEMENT:
We acknowledge and recognize the following agencies that have contributed data to the MAGPlot database:
Government of Alberta - Ministry of Agriculture, Forestry, and Rural Economic Development - Forest Stewardship and Trade Branch
Government of British Columbia - Ministry of Forests - Forest Analysis and Inventory Branch
Government of Manitoba - Ministry of Economic, Development, Investment, Trade, and Natural Resources - Forestry and Peatlands Branch
Government of New Brunswick - Ministry of Natural Resources and Energy Development - Forestry Division, Forest Planning and Stewardship Branch
Government of Newfoundland & Labrador - Department of Fisheries, Forestry and Agriculture - Forestry Branch
Government of Nova Scotia - Ministry of Natural Resources and Renewables - Department of Natural Resources and Renewables
Government of Northwest Territories - Department of Environment & Climate Change - Forest Management Division
Government of Ontario - Ministry of Natural Resources and Forestry - Science and Research Branch, Forest Resources Inventory Unit
Government of Prince Edward Island - Department of Environment, Energy, and Climate Action - Forests, Fish, and Wildlife Division
Government of Quebec - Ministry of Natural Resources and Forests - Forestry Sector
Government of Saskatchewan - Ministry of Environment - Forest Service Branch
Government of Yukon - Ministry of Energy, Mines, and Resources - Forest Management Branch
Government of Canada - Natural Resources Canada - Canadian Forest Service - National Forest Inventory Projects Office
In 2014, BWSR received a grant from the LCCMR (Legislative-Citizen Commission on Minnesota Resources) to produce a geospatial database template (i.e., with empty feature classes and tables) designed to contain Minnesota Statute 103E public drainage system data from local drainage authorities (e.g., counties, watershed districts). The template is intended to help drainage authorities modernize and better manage their drainage system records. In addition, the template puts the data into a consistent, standardized form that makes it more readily accessible to users such as hydrologists and water managers.
However, as a result of a stipulation for receiving the grant, BWSR requires that those drainage authorities that use the template make their hydrographic data (ditch/tile centerlines, drainage structures, profile points and watershed boundaries) available annually to the public via the Geospatial Commons. The button below links to the Template Request Form that must be filled out and signed by the proper drainage authority personnel and then submitted to BWSR. Once authorized, the drainage authority will then receive the template along with its metadata and user instructions.
For guidance on how to use the template to modernize your drainage records, please watch the DRM Template User Webinar (http://www.bwsr.state.mn.us/drainage/2016-12-19_DRM_Template_User_Webinar.mp4). Also, for step-by-step instructions, please see the recently updated Drainage Records Modernization Guidelines (http://www.bwsr.state.mn.us/drainage/drainage_records_guidelines.pdf).
For more information on the project itself see the Drainage Records Modernization and GIS Database (DRMGD) Project section on http://www.bwsr.state.mn.us/drainage. Also, to see an interactive web map of example drainage records go to http://arcg.is/2dFK45N
To request access to this data, please complete the data request form.* *University of Bristol staff should use this form instead.

The PHIRST research team has worked in partnership with Hammersmith and Fulham colleagues from public health and children and adult services to create an evaluation study that takes into account the priorities and concerns of all interested parties within the borough. It focuses on the following research questions:
1) Is UFSM feasible in secondary schools?
2) What is the impact of UFSM on student hunger, school attendance and behaviour, and food that is eaten in school?
3) What is the impact of UFSM on family finance and food security?
4) What do students, carers and school staff see as the reasons UFSM leads to these outcomes?
5) What are the things that help or prevent UFSM being delivered effectively in secondary schools?
6) Could UFSM in secondary schools be a cost-effective approach to addressing student hunger?

We i) interviewed students, parents/carers, school staff and catering staff from the two schools receiving UFSM, and senior leaders in eight other secondary schools, ii) ran student surveys in the two UFSM schools and in two comparison schools, and iii) looked at information about student attendance, academic work and behaviour collected by the local authority and by schools before and after UFSM was introduced. We also worked with a group of student co-researchers in both UFSM schools. They advised on the content and format of our interviews and survey and helped us to plan observations of their school lunch times. These students did the observations themselves and shared their findings with the study team.
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The working group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.
Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB of total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months; all other data would be considered inactive, or archival. To calculate per-person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

Resources in this dataset:

Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF; the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).

Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
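The per-person calculation described above (high end of the reported range divided by 1 for an individual response, or by G for a group response) can be sketched as follows. The range labels are illustrative; the actual answer options are listed in Appendix A.

```python
# Illustrative mapping from a survey answer range to its high end, in TB.
RANGE_HIGH_TB = {"<1 TB": 1, "1-10 TB": 10, ">10-100 TB": 100}

def per_person_tb(reported_range, group_size=1):
    """High end of the reported range divided by the number of people covered.

    group_size is 1 for an individual response, or G for a group response.
    """
    return RANGE_HIGH_TB[reported_range] / group_size

print(per_person_tb("1-10 TB"))          # individual response
print(per_person_tb(">10-100 TB", 5))    # group response covering 5 scientists
```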
There is a requirement that public authorities, like Ofsted, must publish updated versions of datasets which are disclosed as a result of Freedom of Information requests.
Some information which is requested is exempt from disclosure to the public under the Freedom of Information Act; it is therefore not appropriate for this information to be made available. Examples include the locations of women’s refuges, some military bases and all children’s homes, and the personal data of providers and staff. Ofsted also considers that the names and addresses of registered childminders are their personal data, which it is not appropriate to make publicly available unless those individuals have given their explicit consent. This information has therefore not been included in the datasets.
Data for both childcare and childminders are included in the Excel file.
MS Excel Spreadsheet, 16.6 MB
This file may not be suitable for users of assistive technology.
There is a requirement that public authorities, like Ofsted, must publish updated versions of datasets that are disclosed as a result of Freedom of Information requests.
Some information which is requested is exempt from disclosure to the public under the Freedom of Information Act; it is therefore not appropriate for this information to be made available. Examples of information which it is not appropriate to make available include the locations of women’s refuges, some military bases and all children’s homes and the personal data of providers and staff. Ofsted also considers that the names and addresses of registered childminders are their personal data, and it is not appropriate to make these publicly available unless those individuals have given their explicit consent to do so. This information has therefore not been included.
This dataset contains information on independent fostering agencies and voluntary adoption agencies in England.
MS Excel Spreadsheet, 200 KB
This file may not be suitable for users of assistive technology.
Request an accessible format.
Date of next update: April 2017
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The Grocery Store Receipts Dataset is a collection of photos captured from various grocery store receipts. This dataset is specifically designed for tasks related to Optical Character Recognition (OCR) and is useful for retail applications.
Each image in the dataset is accompanied by bounding box annotations indicating the precise locations of specific text segments on the receipts. The text segments are categorized into four classes: item, store, date_time and total.
Each image in the images folder is accompanied by an XML annotation in the annotations.xml file, indicating the coordinates of the bounding boxes and the detected text. For each point, the x and y coordinates are provided.
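As a concrete illustration, the per-point annotations could be read with Python's standard library. This is a sketch only: the element and attribute names below ("image", "polygon", "points", "label") assume a CVAT-style annotations.xml and should be checked against the actual file.

```python
import xml.etree.ElementTree as ET

def load_annotations(path="annotations.xml"):
    """Map each image name to a list of (label, points) pairs, where points
    is a list of (x, y) tuples parsed from the annotation's point string."""
    result = {}
    root = ET.parse(path).getroot()
    for image in root.iter("image"):
        shapes = []
        for poly in image.iter("polygon"):
            # Points are assumed to be encoded as "x1,y1;x2,y2;...".
            points = [
                tuple(float(v) for v in pair.split(","))
                for pair in poly.get("points", "").split(";")
                if pair
            ]
            shapes.append((poly.get("label"), points))
        result[image.get("name")] = shapes
    return result
```

With such a mapping in hand, crops for each of the four classes can be cut from the receipt images for OCR training.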
keywords: receipts reading, retail dataset, consumer goods dataset, grocery store dataset, supermarket dataset, deep learning, retail store management, pre-labeled dataset, annotations, text detection, text recognition, optical character recognition, document text recognition, detecting text-lines, object detection, scanned documents, deep-text-recognition, text area detection, text extraction, images dataset, image-to-text, object detection
There is a requirement that public authorities, like Ofsted, must publish updated versions of datasets which are disclosed as a result of Freedom of Information requests.
Some information which is requested is exempt from disclosure to the public under the Freedom of Information Act; it is therefore not appropriate for this information to be made available. Examples of information which it is not appropriate to make available include the locations of women’s refuges, some military bases and all children’s homes, and the personal data of providers and staff. Ofsted also considers that the names and addresses of registered childminders are their personal data, and it is not appropriate to make these publicly available unless those individuals have given their explicit consent. This information has therefore not been included in the datasets.
MS Excel Spreadsheet, 297 KB
This file may not be suitable for users of assistive technology.
Request an accessible format.
This information will not be updated. The majority of information included in the dataset is published in Ofsted further education and skills official statistics. Provider addresses are published by FE choices.
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Making a freedom of information request: https://myaccount.northyorks.gov.uk/foi/request
If the information you are requesting is not already available, and it relates to one of our services, you can make a request for information using our online form. We generally reply to requests by email, but if you would like the information in a specific format, let us know when you make your request. We will respond as soon as possible and within 20 working days.
After submitting a request, it will be passed to the relevant service area, who will be responsible for replying to you. We will always provide as much information as possible, but there are rules that allow us to withhold certain types of information; for example, if providing the information would infringe someone else's privacy or, if the information you have asked for is not environmental, would take longer than 18 hours to acquire. If we are unable to provide any information, we will explain why. See the additional information on the reasons why we might not be able to provide information.
We publish our data so citizens can see how we work and where money is spent. The data is published in an accessible format and can be freely reused in accordance with the open data licence.
Overview
This dataset of medical misinformation was collected and is published by the Kempelen Institute of Intelligent Technologies (KInIT). It consists of approx. 317k news articles and blog posts on medical topics published between January 1, 1998 and February 1, 2022 from a total of 207 reliable and unreliable sources. The dataset contains full-texts of the articles, their original source URLs and other extracted metadata. If a source has a credibility score available (e.g., from Media Bias/Fact Check), it is also included in the form of an annotation. Besides the articles, the dataset contains around 3.5k fact-checks and extracted verified medical claims with their unified veracity ratings published by fact-checking organisations such as Snopes or FullFact. Lastly and most importantly, the dataset contains 573 manually and more than 51k automatically labelled mappings between previously verified claims and the articles; mappings consist of two values: claim presence (i.e., whether a claim is contained in the given article) and article stance (i.e., whether the given article supports or rejects the claim, or provides both sides of the argument).
The dataset is primarily intended to be used as a training and evaluation set for machine learning methods for claim presence detection and article stance classification, but it enables a range of other misinformation related tasks, such as misinformation characterisation or analyses of misinformation spreading.
Its novelty and our main contributions lie in (1) the focus on medical news articles and blog posts as opposed to social media posts or political discussions; (2) providing multiple modalities (besides full-texts of the articles, there are also images and videos), thus enabling research of multimodal approaches; (3) the mapping of the articles to the fact-checked claims (with manual as well as predicted labels); (4) providing source credibility labels for 95% of all articles and other potential sources of weak labels that can be mined from the articles' content and metadata.
The dataset is associated with the research paper "Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims" accepted and presented at ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22).
The accompanying Github repository provides a small static sample of the dataset and the dataset's descriptive analysis in the form of Jupyter notebooks.
In order to obtain access to the full dataset (in CSV format), please request access by following the instructions provided below.
Note: Please check also our MultiClaim Dataset, which provides a more recent, larger and highly multilingual dataset of fact-checked claims, social media posts and relations between them.
References
If you use this dataset in any publication, project, tool or any other form, please cite the following papers:
@inproceedings{SrbaMonantPlatform,
author = {Srba, Ivan and Moro, Robert and Simko, Jakub and Sevcech, Jakub and Chuda, Daniela and Navrat, Pavol and Bielikova, Maria},
booktitle = {Proceedings of Workshop on Reducing Online Misinformation Exposure (ROME 2019)},
pages = {1--7},
title = {Monant: Universal and Extensible Platform for Monitoring, Detection and Mitigation of Antisocial Behavior},
year = {2019}
}
@inproceedings{SrbaMonantMedicalDataset,
author = {Srba, Ivan and Pecher, Branislav and Tomlein, Matus and Moro, Robert and Stefancova, Elena and Simko, Jakub and Bielikova, Maria},
booktitle = {Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22)},
numpages = {11},
title = {Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims},
year = {2022},
doi = {10.1145/3477495.3531726},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3477495.3531726},
}
Dataset creation process
In order to create this dataset (and to continuously obtain new data), we used our research platform Monant. The Monant platform provides so-called data providers to extract news articles/blogs from news/blog sites as well as fact-checking articles from fact-checking sites. General parsers (for RSS feeds, Wordpress sites, the Google Fact Check Tool, etc.) as well as custom crawlers and parsers were implemented (e.g., for the fact-checking site Snopes.com). All data are stored in a unified format in a central data storage.
Ethical considerations
The dataset was collected and is published for research purposes only. We collected only publicly available content of news/blog articles. The dataset contains identities of authors of the articles if they were stated in the original source; we left this information, since the presence of an author's name can be a strong credibility indicator. However, we anonymised the identities of the authors of discussion posts included in the dataset.
The main identified ethical issue related to the presented dataset lies in the risk of mislabelling of an article as supporting a false fact-checked claim and, to a lesser extent, in mislabelling an article as not containing a false claim or not supporting it when it actually does. To minimise these risks, we developed a labelling methodology and require an agreement of at least two independent annotators to assign a claim presence or article stance label to an article. It is also worth noting that we do not label an article as a whole as false or true. Nevertheless, we provide partial article-claim pair veracities based on the combination of claim presence and article stance labels.
As to the veracity labels of the fact-checked claims and the credibility (reliability) labels of the articles' sources, we take these from the fact-checking sites and external listings such as Media Bias/Fact Check as they are and refer to their methodologies for more details on how they were established.
Lastly, the dataset also contains automatically predicted labels of claim presence and article stance produced by our baselines described in the next section. These methods have their limitations and achieve only a certain accuracy, as reported in the paper; this should be taken into account when interpreting the predicted labels.
Reporting mistakes in the dataset
The way to report considerable mistakes in raw collected data or in manual annotations is by creating a new issue in the accompanying Github repository. Alternatively, general enquiries or requests can be sent to info [at] kinit.sk.
Dataset structure
Raw data
First, the dataset contains so-called raw data (i.e., data extracted by the Web monitoring module of the Monant platform and stored in exactly the same form as it appears on the original websites). Raw data consist of articles from news sites and blogs (e.g., naturalnews.com), discussions attached to such articles, and fact-checking articles from fact-checking portals (e.g., snopes.com). In addition, the dataset contains feedback (numbers of likes, shares and comments) provided by users on the social network Facebook, which is regularly extracted for all news/blog articles.
Raw data are contained in these CSV files:
Note: Personal information about discussion posts' authors (name, website, gravatar) is anonymised.
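As a hedged illustration of working with the raw-data CSV files, the snippet below reads one file and indexes its rows for joining with annotation files. The file name ("articles.csv") and column names ("id", "title") are assumptions for illustration, not the dataset's documented schema; check the actual CSV headers once access is granted.

```python
import csv

def load_rows(csv_path):
    """Read one of the raw-data CSV files into a list of dicts keyed by header."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def index_by_id(rows, key="id"):
    """Index rows by an identifier column so annotations can be joined to them."""
    return {row[key]: row for row in rows}
```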
Annotations
Secondly, the dataset contains so-called annotations. Entity annotations describe individual raw-data entities (e.g., an article or a source). Relation annotations describe a relation between two such entities.
Each annotation is described by the following attributes:
At the same time, annotations are associated with a particular object identified by:
The dataset provides specifically these entity
AI Training Data | Annotated Checkout Flows for Retail, Restaurant, and Marketplace Websites
Overview
Unlock the next generation of agentic commerce and automated shopping experiences with this comprehensive dataset of meticulously annotated checkout flows, sourced directly from leading retail, restaurant, and marketplace websites. Designed for developers, researchers, and AI labs building large language models (LLMs) and agentic systems capable of online purchasing, this dataset captures the real-world complexity of digital transactions—from cart initiation to final payment.
Key Features
Breadth of Coverage: Over 10,000 unique checkout journeys across hundreds of top e-commerce, food delivery, and service platforms, including but not limited to Walmart, Target, Kroger, Whole Foods, Uber Eats, Instacart, Shopify-powered sites, and more.
Actionable Annotation: Every flow is broken down into granular, step-by-step actions, complete with timestamped events, UI context, form field details, validation logic, and response feedback. Each step includes:
Page state (URL, DOM snapshot, and metadata)
User actions (clicks, taps, text input, dropdown selection, checkbox/radio interactions)
System responses (AJAX calls, error/success messages, cart/price updates)
Authentication and account linking steps where applicable
Payment entry (card, wallet, alternative methods)
Order review and confirmation
Multi-Vertical, Real-World Data: Flows sourced from a wide variety of verticals and real consumer environments, not just demo stores or test accounts. Includes complex cases such as multi-item carts, promo codes, loyalty integration, and split payments.
Structured for Machine Learning: Delivered in standard formats (JSONL, CSV, or your preferred schema), with every event mapped to action types, page features, and expected outcomes. Optional HAR files and raw network request logs provide an extra layer of technical fidelity for action modeling and RLHF pipelines.
Rich Context for LLMs and Agents: Every annotation includes both human-readable and model-consumable descriptions:
“What the user did” (natural language)
“What the system did in response”
“What a successful action should look like”
Error/edge case coverage (invalid forms, OOS, address/payment errors)
Privacy-Safe & Compliant: All flows are depersonalized and scrubbed of PII. Sensitive fields (like credit card numbers, user addresses, and login credentials) are replaced with realistic but synthetic data, ensuring compliance with privacy regulations.
Each flow tracks the user journey from cart to payment to confirmation, including:
Adding/removing items
Applying coupons or promo codes
Selecting shipping/delivery options
Account creation, login, or guest checkout
Inputting payment details (card, wallet, Buy Now Pay Later)
Handling validation errors or OOS scenarios
Order review and final placement
Confirmation page capture (including order summary details)
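The journey structure above lends itself to simple stream processing. Here is a minimal sketch under an assumed JSONL schema (a top-level "flow_id" and "steps" list, each step carrying an "outcome" object with a "status" field); the dataset's real field names may differ.

```python
import json

def iter_error_steps(jsonl_path):
    """Yield (flow_id, step) pairs for steps whose outcome reports an error,
    e.g. to mine validation or out-of-stock edge cases for RLHF pipelines."""
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            flow = json.loads(line)
            for step in flow.get("steps", []):
                if step.get("outcome", {}).get("status") == "error":
                    yield flow["flow_id"], step
```

The same pattern extends to extracting (state, action, outcome) triples for action-model training.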
Why This Dataset?
Building LLMs, agentic shopping bots, or e-commerce automation tools demands more than just page screenshots or API logs. You need deeply contextualized, action-oriented data that reflects how real users interact with the complex, ever-changing UIs of digital commerce. Our dataset uniquely captures:
The full intent-action-outcome loop
Dynamic UI changes, modals, validation, and error handling
Nuances of cart modification, bundle pricing, delivery constraints, and multi-vendor checkouts
Mobile vs. desktop variations
Diverse merchant tech stacks (custom, Shopify, Magento, BigCommerce, native apps, etc.)
Use Cases
LLM Fine-Tuning: Teach models to reason through step-by-step transaction flows, infer next-best-actions, and generate robust, context-sensitive prompts for real-world ordering.
Agentic Shopping Bots: Train agents to navigate web/mobile checkouts autonomously, handle edge cases, and complete real purchases on behalf of users.
Action Model & RLHF Training: Provide reinforcement learning pipelines with ground truth “what happens if I do X?” data across hundreds of real merchants.
UI/UX Research & Synthetic User Studies: Identify friction points, bottlenecks, and drop-offs in modern checkout design by replaying flows and testing interventions.
Automated QA & Regression Testing: Use realistic flows as test cases for new features or third-party integrations.
What’s Included
10,000+ annotated checkout flows (retail, restaurant, marketplace)
Step-by-step event logs with metadata, DOM, and network context
Natural language explanations for each step and transition
All flows are depersonalized and privacy-compliant
Example scripts for ingesting, parsing, and analyzing the dataset
Flexible licensing for research or commercial use
Sample Categories Covered
Grocery delivery (Instacart, Walmart, Kroger, Target, etc.)
Restaurant takeout/delivery (Ub...
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The WZDx Specification enables infrastructure owners and operators (IOOs) to make harmonized work zone data available for third party use. The intent is to make travel on public roads safer and more efficient through ubiquitous access to data on work zone activity. Specifically, the project aims to get data on work zones into vehicles to help automated driving systems (ADS) and human drivers navigate more safely.
MCDOT leads the effort to aggregate and collect work zone data from the AZTech Regional Partners. The live feed is currently compliant with WZDx specification version 3.0.
The ITS JPO has collections from 23 states, including Arizona, covering various parts of the time period from 10/2019 to 08/2024, depending on when each feed was active. The data is split into two archive files: the raw data contains the collection of .json or .geojson files exactly as they were on the individual state’s WZDx feed at the time of collection, while the processed data is organized by work zone, so that information about a work zone that changed through feed updates is collected in a single file for that work zone. To request access, fill out the form here.
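Since a WZDx feed is GeoJSON, an archived feed file can be summarized with the standard library alone. This is a sketch: the "event_type" property name follows WZDx conventions but should be verified against the version of the specification (here, 3.0) that the downloaded feed implements.

```python
import json

def summarize_work_zones(geojson_path):
    """Count features per event type in a WZDx-style GeoJSON feed file."""
    with open(geojson_path, encoding="utf-8") as f:
        feed = json.load(f)
    counts = {}
    for feature in feed.get("features", []):
        event_type = feature.get("properties", {}).get("event_type", "unknown")
        counts[event_type] = counts.get(event_type, 0) + 1
    return counts
```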
Ground benchmark datasets are issued annually in the standard file formats Text (CSV) and XML in relation to EPSG code 25833. Depending on the file format, ground benchmark datasets are provided in full for the areas of competence of the expert committees and for the State of Brandenburg in a zipped file with a statistical indication and a description of the elements. The CSV file is based on VBORIS2. A key bridge to the old format can be extracted from the data. On request, ground benchmark datasets for municipal areas can be cut out or provided in shape format. Furthermore, the delivery of ground benchmarks in the form of web-based geoservices is possible.
Disbursement Request Form Moto GP 2023