Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com.
Version 17 release notes: Adds data for 2020. Please note that the FBI has retired the UCR, with 2020 as the final year of data, so this will be the last Offenses Known and Clearances by Arrest data they release. Changes .rda files to .rds. Please note that in 2020 the card_actual_pt variable always returns that the month was reported. This causes 2020 to report that all months are reported for all agencies, because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect, since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years, so I don't want to change this).
Version 16 release notes: Changes the release notes description; does not change the data.
Version 15 release notes: Adds data for 2019. Please note that in 2019 the card_actual_pt variable always returns that the month was reported. This causes 2019 to report that all months are reported for all agencies, because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect, since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years, so I don't want to change this).
Version 14 release notes: Adds arson data from the UCR's Arson dataset.
This adds just the arson variables about the number of arson incidents, not the complete set of variables in that dataset (which include damages from arson and whether structures were occupied during the arson). As arson is an index crime, both the total index and the index property columns now include arson offenses. The "all_crimes" variables also now include arson. Adds an arson_number_of_months_missing column indicating how many months were not reported (i.e., missing from the annual data) in the arson data. In most cases this is the same as the normal number_of_months_missing, but not always, so please check if you intend to use arson data. Please note that in 2018 the card_actual_pt variable always returns that the month was reported. This causes 2018 to report that all months are reported for all agencies, because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect, since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years, so I don't want to change this). For some reason, a small number of agencies (primarily federal agencies) had the same ORI number in 2018, and I removed these duplicate agencies.
Version 13 release notes: Adds 2018 data. New Orleans (ORI = LANPD00) data had more unfounded crimes than actual crimes in 2018, so the unfounded columns for 2018 are all NA.
Version 12 release notes: Adds population 1-3 columns: if an agency is in multiple counties, these variables show the population in the county with the most people in that agency (population_1), the second largest county (population_2), and the third largest county (population_3). Also adds county 1-3 columns, which identify which counties the agency is in.
The population column is the sum of the three population columns. Thanks to Mike Maltz for the suggestion! Fixes a bug in the crosswalk data merged to this file that had the incorrect FIPS code for Clinton, Tennessee (ORI = TN00101). Thanks to Brooke Watson for catching this bug! Adds a last_month_reported column, which says which month was reported last. This is actually how the FBI defines number_of_months_reported, so it is a more accurate representation of that. Removes the number_of_months_reported variable, as the name is misleading. You should use the last_month_reported or the number_of_months_missing (see below) variable instead. Adds a number_of_months_missin
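To make the difference between "last month reported" and "months missing" concrete, here is a minimal sketch assuming one agency-year is represented as twelve True/False flags (January to December) saying whether each month was reported. The flag list, the ORIs, and the helper names are hypothetical; only the column names number_of_months_missing and arson_number_of_months_missing come from the notes above. Keep in mind that for 2018 to 2020 the underlying card_actual_pt flag always says "reported", so every agency would show zero missing months.

```python
# Hypothetical reporting flags for one agency-year, January..December.
flags = [True, True, True, False, False, True, True, True, True, False, False, False]

def last_month_reported(flags):
    """1-based index of the last month flagged as reported (0 if none were)."""
    last = 0
    for month, reported in enumerate(flags, start=1):
        if reported:
            last = month
    return last

def number_of_months_missing(flags):
    """Count of months not flagged as reported."""
    return sum(1 for reported in flags if not reported)

def arson_mismatches(rows):
    """Rows where the arson and main-offense months-missing counts disagree."""
    return [
        r for r in rows
        if r["number_of_months_missing"] != r["arson_number_of_months_missing"]
    ]

# Hypothetical agency-year rows (made-up ORIs and values).
rows = [
    {"ori": "XX00001", "number_of_months_missing": 0, "arson_number_of_months_missing": 0},
    {"ori": "XX00002", "number_of_months_missing": 2, "arson_number_of_months_missing": 5},
]

print(last_month_reported(flags), number_of_months_missing(flags))  # 9 5
print([r["ori"] for r in arson_mismatches(rows)])  # ['XX00002']
```

In this example the FBI-style definition would call the year "9 months reported", even though only 7 of those 9 months actually have data, which is why the months-missing count is the safer filter.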
By US Open Data Portal, data.gov [source]
This dataset offers a closer look at the mental health care received by U.S. households in the last four weeks during the Covid-19 pandemic. The sheer scale of this crisis is inspiring people of all ages, backgrounds, and geographies to come together to tackle the problem. The Household Pulse Survey from the U.S. Census Bureau was published in collaboration with other federal agencies to produce accurate and timely estimates of how Covid-19 is affecting employment status, consumer spending, food security, housing stability, education interruption, and physical and mental wellness among American households. To deliver meaningful results about wellbeing at various levels of society during this trying period, broken down by demographic characteristics such as age, gender, race/ethnicity, and educational attainment, each surveyed household was randomly selected and the results were weighted to maintain accuracy. This dataset will help you explore what it's like on the ground right now for everyone affected by Covid-19. Will it inform your decisions or point you toward new opportunities?
This dataset contains information about the mental health care that U.S. households have received in the last 4 weeks, during the Covid-19 pandemic. This data is valuable when wanting to track and measure mental health needs across the country and draw comparisons between regions based on support available.
To use this dataset, it is important to understand each of its columns in order to draw meaningful insights from the data. The 'Indicator' column identifies which type of indicator (percentage or absolute number) is being measured, while 'Group' and 'Subgroup' give more specific details about who was surveyed for each indicator in this dataset.
The 'Phase' and 'Time Period' columns record when each indicator was measured, either during a given survey phase or over a particular time span. 'Value' holds the estimate itself, while 'LowCI' and 'HighCI' hold the lower and upper bounds of its confidence interval. The 'Suppression Flag' column identifies cases where a value has been suppressed because it falls below a reporting threshold, so suppressed rows can be filtered out up front instead of being sorted through manually each time the dataset is analyzed. Finally, the 'Time Period Start Date' and 'Time Period End Date' columns give the exact dates covered by each measurement, which is useful for time-series analyses over longer periods.
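A minimal sketch of the filtering step described above, using Python's csv module on a few made-up rows. It assumes the column headers match the names quoted above and that any non-empty 'Suppression Flag' means the estimate was suppressed; both are assumptions to check against the actual file, which also has more columns than shown here.

```python
import csv
import io

# Made-up sample rows; the real file has additional columns.
SAMPLE = """Indicator,Group,Subgroup,Value,LowCI,HighCI,Suppression Flag
Example indicator,By Age,18 - 29 years,10.5,9.1,11.9,
Example indicator,By Age,80 years and above,,,,1
"""

def usable_estimates(text):
    """Return (subgroup, value, ci_width) for rows that are not suppressed and have a value."""
    results = []
    for row in csv.DictReader(io.StringIO(text)):
        flag = (row.get("Suppression Flag") or "").strip()
        value = (row.get("Value") or "").strip()
        if flag or not value:
            continue  # skip suppressed or empty estimates
        width = float(row["HighCI"]) - float(row["LowCI"])
        results.append((row["Subgroup"], float(value), width))
    return results

for subgroup, value, width in usable_estimates(SAMPLE):
    print(f"{subgroup}: {value:.1f} (CI width {width:.1f})")
```

Reading columns by name with DictReader means extra columns in the real file are simply ignored.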
Overall, when using this dataset it's important to keep in mind exactly which indicator type you're looking at (percentage points or absolute numbers), as well as its associated group and subgroup characteristics, so that you interpret trends and any correlations drawn from the results accurately.
- Analyzing the effects of the Covid-19 pandemic on mental health care among different subgroups such as racial and ethnic minorities, gender and age categories.
- Identifying geographical disparities in mental health services by comparing state level data for the same time period.
- Comparing changes in mental health care indicators over time to understand how the pandemic has affected people's access to care, within a quarter or over longer periods.
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit - Provide a link to the license, and indicate if changes were made. - ShareAlike - You must distribute your contributions under the same license as the original. ...
By Amber Thomas [source]
This dataset provides an estimate of broadband usage in the United States, focusing on how many people have access to broadband and how many actually use it at broadband speeds. Using data Microsoft collects from its services, including package size and total download time, it estimates the throughput speed of devices connecting to the internet across zip codes and counties.
According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This dataset aims to address the gap between estimated availability and actual use by providing more accurate usage numbers, downscaled to the county and zip code levels. Who gets counted as having access matters enormously: it determines who is included in public funding opportunities dedicated to closing the digital divide. The implications can be huge: millions of people across the country could remain invisible if these numbers aren't reported accurately or used properly in decision-making.
The dataset aggregates information for locations with fewer than 20 devices to improve the accuracy of the broadband usage estimates, so that others can use it to develop solutions that improve internet access or to accurately identify areas where no real or reliable connectivity exists, in communities large and small throughout the US mainland. Please review the license terms before using these data, so that you comply with the stipulations of Microsoft's Open Use of Data Agreement v1.0 in both professional and educational work.
How to Use the US Broadband Usage Dataset
This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.
The dataset contains six columns:
- County – The name of the county for which usage statistics are provided.
- Zip Code (5-Digit) – The 5-digit zip code from which usage data was collected, within that county or metropolitan area/micro area/division within states, as reported by the US Census Bureau in 2018[2].
- Population (Households) – Estimated number of households, defined according to [3], based on the US Census Bureau American Community Survey 5-Year Estimates[4].
- Average Throughput (Mbps) – Average download speed in Mbps, derived from data collected from anonymous devices connected through Microsoft services such as Windows Update, Office 365, and Xbox Live Core Services[5].
- Percent Fast (> 25 Mbps) – Percentage of machines with throughput greater than 25 Mbps, calculated using [6].
- Percent Slow (< 3 Mbps) – Percentage of machines with throughput less than 3 Mbps, calculated using [7].
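The two percentage columns can be read as simple shares of devices above or below a threshold. The sketch below is a plain reading of that definition with made-up per-device throughput values; it is not the method cited as [6] and [7], which is not reproduced here, and it treats the thresholds as strict (exactly 25 Mbps is not "fast").

```python
def percent_fast_slow(throughputs_mbps, fast=25.0, slow=3.0):
    """Percentage of devices strictly above `fast` and strictly below `slow` Mbps."""
    if not throughputs_mbps:
        raise ValueError("need at least one device measurement")
    n = len(throughputs_mbps)
    fast_share = 100.0 * sum(1 for t in throughputs_mbps if t > fast) / n
    slow_share = 100.0 * sum(1 for t in throughputs_mbps if t < slow) / n
    return fast_share, slow_share

# Hypothetical throughput measurements for eight devices in one zip code.
samples = [1.5, 2.9, 10.0, 24.0, 25.0, 30.0, 50.0, 100.0]
print(percent_fast_slow(samples))  # (37.5, 25.0)
```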
- Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.
- Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.
- Establishing public-private partnerships. Local governments and telecom providers need better data about gaps in service coverage and usage levels to make decisions about investing in new infrastructure buildouts and better connectivity options for rural communities.
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: broadband_data_2020October.csv
How much time do people spend on social media? As of 2025, the average daily social media usage of internet users worldwide amounted to 141 minutes per day, down from 143 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, where online users spend an average of 3 hours and 49 minutes on social media each day. In comparison, the daily time spent on social media in the U.S. was just 2 hours and 16 minutes.
Global social media usage
Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.
Global impact of social media
Social media has a wide-reaching and significant impact not only on online activities but also on offline behavior and life in general. In a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics, and heightened everyday distractions.
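The figures above mix "minutes per day" and "hours and minutes". A small sketch that puts them on one scale, using only the numbers quoted in the text; it is plain arithmetic, not a reproduction of how the survey computed its averages.

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'H hours and M minutes' figure to total minutes."""
    return hours * 60 + minutes

GLOBAL_AVERAGE_2025 = 141  # minutes per day, from the text above
brazil = to_minutes(3, 49)
us = to_minutes(2, 16)

print(f"Brazil: {brazil} min/day ({brazil - GLOBAL_AVERAGE_2025:+d} vs. global average)")
print(f"U.S.: {us} min/day ({us - GLOBAL_AVERAGE_2025:+d} vs. global average)")
print("Global average as (hours, minutes):", divmod(GLOBAL_AVERAGE_2025, 60))
```

On this scale Brazil is 88 minutes above the global average and the U.S. is 5 minutes below it; the global average itself is 2 hours and 21 minutes.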
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
This publication provides the most timely statistics available relating to NHS funded secondary mental health, learning disabilities and autism services in England. This information will be of use to people needing access to information quickly for operational decision making and other purposes. These statistics are derived from submissions made using version 2.0 of the Mental Health Services Dataset (MHSDS). NHS Digital review the quality and completeness of the submissions used to create these statistics on an ongoing basis. More information about this work can be found in the Accuracy and reliability section of this report. Fully detailed information on the quality and completeness of particular statistics in this release is not available due to the timescales involved in reviewing submissions and engaging with data providers. The information that has been obtained at the time of publication is made available in the Provider Feedback sections of the Data Quality Reports which accompany this release. Information gathered after publication is released in future editions of this publication series. More detailed information on the quality and completeness of these statistics, and a summary of how these statistics may be interpreted, is made available later in our Mental Health Bulletin: Annual Report publication series. All elements of this publication, other editions of this publication series, and related annual publication series can be found in the Related Links below. Included for the first time in this release is an Access and Waiting Times CSV file based on final data for the period 1 - 31 October 2017. This file includes the number of children and young people receiving at least two contacts (including indirect contacts), where their first contact occurs before their 18th birthday and their second contact occurs during the reporting period.
This file has been produced in order to support greater clarity and consistency in reporting local access to mental health services for children and young people. This measure itself is not an assured waiting times indicator for these services. In the Final July, Provisional August 2017 edition of this publication, NHS Digital included a new measure of referrals starting in the reporting period, aged 0-18, with any one or more SNOMED Codes and a valid PERS score from MH Assessment Scale Current View (MHS68). The purpose of this measure is to understand the current levels of recording of these outcome measurements for children and young people, in order to support the future reporting of mental health services outcomes for this group. Initial feedback from users has highlighted improvements that can be made to the methodology used in this measure. Specifically, the requirement that the assessment must take place in the same month that the referral is received by the service provider may exclude a number of assessments taking place at the time of the first contact between the clinician and the service user. For this reason, NHS Digital propose to change this measure in future editions of this publication series to include any Current View assessments that take place as part of any referral for this group open during the reporting period. If you have any feedback on these proposed changes, please send it to enquiries@nhsdigital.nhs.uk with 'MHSDS Monthly - Current View' in the subject. Correction: The statistics relating to Children and Young People Receiving a Second Contact With Services were corrected on 9 February 2018. These statistics are meant to count people as accessing services in each financial year where they have received two care contacts. The statistics released originally incorrectly excluded people who accessed services in a previous financial year. NHS Digital apologises for any inconvenience caused.
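The second-contact rule described above can be sketched for a single reporting period as follows. This is one reading of the published wording, not NHS Digital's implementation: it assumes "second contact" means the person's chronologically second contact (direct or indirect), ignores referral linkage and the financial-year logic covered by the correction, and treats 29 February birthdays as falling on 1 March in non-leap years. The people and dates are hypothetical.

```python
from datetime import date

def eighteenth_birthday(dob: date) -> date:
    """Date of the 18th birthday; 29 February births fall back to 1 March (an assumption)."""
    try:
        return dob.replace(year=dob.year + 18)
    except ValueError:
        return date(dob.year + 18, 3, 1)

def counts_for_period(dob, contact_dates, period_start, period_end):
    """True if the first contact precedes the 18th birthday and the second falls in the period."""
    contacts = sorted(contact_dates)
    if len(contacts) < 2:
        return False
    first, second = contacts[0], contacts[1]
    return first < eighteenth_birthday(dob) and period_start <= second <= period_end

# Hypothetical people: date of birth plus the dates of all their contacts.
people = {
    "A": (date(2005, 3, 1), [date(2017, 9, 12), date(2017, 10, 3)]),
    "B": (date(1999, 6, 1), [date(2017, 10, 2), date(2017, 10, 20)]),
    "C": (date(2004, 1, 15), [date(2017, 10, 5)]),
}
period_start, period_end = date(2017, 10, 1), date(2017, 10, 31)
count = sum(
    counts_for_period(dob, contacts, period_start, period_end)
    for dob, contacts in people.values()
)
print(count)  # 1 (only person A qualifies)
```

Person B is excluded because the first contact came after the 18th birthday, and person C because there is only one contact.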
A correction has been made to this publication on 10 September 2018. This amendment relates to statistics in the monthly CSV data file; the specific measures affected are listed in the "Corrected Measures" CSV. All listed measures have now been corrected. NHS Digital apologises for any inconvenience caused.
This statistical release makes available the most recent Mental Health and Learning Disabilities Dataset (MHLDDS) final monthly data (November 2015), together with provisional information for December 2015. This publication presents a wide range of information about care delivered to users of NHS funded secondary mental health and learning disability services in England.
The scope of the Mental Health Minimum Dataset (MHMDS) was extended to cover Learning Disability services from September 2014. Many people who have a learning disability use mental health services, and people in learning disability services may have a mental health problem. This means that activity included in the new MHLDDS dataset cannot be distinctly divided into mental health or learning disability spells of care: a single spell of care may include inputs from either or both types of service.
The Currencies and Payment file that forms part of this release is specifically limited to services in scope for currencies and payment in mental health services and remains unchanged.
This information will be of particular interest to organisations involved in delivering secondary mental health and learning disability care to adults and older people, as it presents timely information to support discussions between providers and commissioners of services. The MHLDS Monthly Report also includes reporting by local authority for the first time.
For patients, researchers, agencies, and the wider public, it aims to provide up-to-date information about the numbers of people using services, spending time in hospital, and subject to the Mental Health Act (MHA). Some of these measures are currently experimental.
The Currency and Payment (CaP) measures can be found in a separate machine-readable data file and may also be accessed via an on-line interactive visualisation tool that supports benchmarking. This can be accessed through the related links at the bottom of the page.
During summer 2015 we undertook a consultation on Adult Mental Health Statistics, seeking users' views on the existing reports and on what might usefully be added to our reports when the new version of the dataset (MHSDS) is implemented in 2016. A report on this consultation can be found below.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The actual boundaries were most recently adjusted in: January 2025.
Boundary information source: State Scientific Production Enterprise “Kartographia”.
Contributor: OCHA Field Information Services Section (FISS).
COD-AB quality level: cod-enhanced (geometry and attributes verified and standardized) and live services available.
Feature server: https://codgis.itos.uga.edu/arcgis/rest/services/COD_External/UKR_pcode/FeatureServer
Most recent COD-AB review conclusion: The COD-AB does not require any update.
OCHA country status: Operational (country office).
An edge-matched (COD-EM) dataset is available.
Most recent COD-AB review date: January 2025.
Deepest administrative level: ADM4.
COD-PS HDX URL: https://data.humdata.org/dataset/cod-ps-ukr.
Deepest administrative level with complete coverage: ADM3.
COD-EM HDX link: https://data.humdata.org/dataset/cod-em-ukr.
Administrative level required by humanitarian community: ADM2.
Note: The Ukrainian government has not fully implemented its new administrative structure reforms in the Autonomous Republic of Crimea [UA01] and Sevastopol [UA85].
Consequently, 31 ADM4 features within the Sevastopol ADM3 polygon retain P-codes that do not correctly conform to [UA85]. This will be rectified in a COD-AB update once the government has regained control of the Crimean Peninsula.
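Features affected by this note can be spotted with a prefix check, assuming the usual hierarchical P-code convention in which a child unit's P-code begins with its parent's P-code. The field name "pcode", the feature names, and the P-code values below are hypothetical; the real COD-AB layers use their own field names (check the downloaded attribute tables).

```python
def nonconforming_pcodes(features, parent_pcode):
    """Return features whose P-code does not start with the parent unit's P-code."""
    return [f for f in features if not f["pcode"].startswith(parent_pcode)]

# Hypothetical ADM4 features located inside the Sevastopol ADM3 polygon.
features = [
    {"name": "Feature A", "pcode": "UA85000000001"},
    {"name": "Feature B", "pcode": "UA01000000002"},
]

for f in nonconforming_pcodes(features, "UA85"):
    print(f["name"], f["pcode"])  # Feature B UA01000000002
```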
Note: Some administrative feature names were adjusted in a September/October 2024 administrative feature renaming exercise. See reference table at https://data.humdata.org/dataset/cod-ab-ukr.
Note: The Ukraine COD-AB was most recently adjusted in early 2024 - affecting administrative level 3 (hromadas) only.
Note: The population statistics common operational dataset (COD-PS-UKR) is only available via special request. See Ukraine - Subnational Population Statistics. Users should use the 'Request Data' button.
Note: Limitations of the Ukrainian-to-Russian transliteration method mean that more than one Ukrainian name may be represented by a single Russian feature name.
Note: Feature name attribute fields in this dataset incorrectly use 'UA' to represent the Ukrainian language. The correct language code is 'UK'. This will be resolved in 2026.
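Until the fix lands, a downstream workaround is to rename the affected attribute keys after loading. This sketch assumes the Ukrainian-name fields end in the suffix "_UA" (for example "ADM1_UA"); that field naming and the record below are hypothetical, so confirm the actual field names before applying it. Only keys are renamed, so P-code values that begin with "UA" are left untouched.

```python
def fix_language_suffix(properties, wrong="_UA", right="_UK"):
    """Return a copy of an attribute dict with keys ending in `wrong` renamed to end in `right`."""
    fixed = {}
    for key, value in properties.items():
        if key.endswith(wrong):
            key = key[: -len(wrong)] + right
        fixed[key] = value
    return fixed

# Hypothetical attribute record for one feature.
record = {"ADM1_EN": "Example", "ADM1_UA": "Приклад", "ADM1_PCODE": "UA99"}
print(fix_language_suffix(record))
```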
During the second quarter of 2025, data breaches exposed more than ** million records worldwide. Since the first quarter of 2020, the highest number of data records were exposed in the third quarter of ****, more than *** billion data sets. Data breaches remain among the biggest concerns of company leaders worldwide. The most common causes of sensitive information loss were operating system vulnerabilities on endpoint devices.
Which industries see the most data breaches?
Certain conditions make some industry sectors more prone to data breaches than others. According to the latest observations, public administration experienced the highest number of data breaches between 2021 and 2022, with *** reported data breach incidents with confirmed data loss. Financial institutions came second, with *** data breach cases, followed by healthcare providers.
Data breach cost
Data breach incidents have various consequences, the most common being financial losses and business disruptions. As of 2023, the average data breach cost across businesses worldwide was **** million U.S. dollars, while a leaked data record cost about *** U.S. dollars. The United States saw the highest average breach cost globally, at **** million U.S. dollars.
List of the data tables as part of the Immigration system statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.
If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.
The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk
Please tell us what format you need. It will help us if you say what assistive technology you use.
Immigration system statistics, year ending June 2025
Immigration system statistics quarterly release
Immigration system statistics user guide
Publishing detailed data tables in migration statistics
Policy and legislative changes affecting migration to the UK: timeline
Immigration statistics data archives
Passenger arrivals summary tables, year ending June 2025 (ODS, 31.3 KB): https://assets.publishing.service.gov.uk/media/689efececc5ef8b4c5fc448c/passenger-arrivals-summary-jun-2025-tables.ods
‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.
Electronic travel authorisation detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 57.1 KB): https://assets.publishing.service.gov.uk/media/689efd8307f2cc15c93572d8/electronic-travel-authorisation-datasets-jun-2025.xlsx
ETA_D01: Applications for electronic travel authorisations, by nationality
ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality
Entry clearance visas summary tables, year ending June 2025 (ODS, 56.1 KB): https://assets.publishing.service.gov.uk/media/68b08043b430435c669c17a2/visas-summary-jun-2025-tables.ods
Entry clearance visa applications and outcomes detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 29.6 MB): https://assets.publishing.service.gov.uk/media/689efda51fedc616bb133a38/entry-clearance-visa-outcomes-datasets-jun-2025.xlsx
Vis_D01: Entry clearance visa applications, by nationality and visa type
Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome
Additional data relating to in country and overseas Visa applications can be fo
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: In Brazil, secondary data for epidemiology are widely available. However, they are insufficiently prepared for use in research, even when the data are structured, since they were often designed for other purposes. To date, few publications focus on the process of preparing secondary data. The present findings can help guide future research projects based on secondary data.
Objective: To describe the steps in the process of ensuring the adequacy of a secondary data set for a specific use and to identify the challenges of this process.
Methods: The present study is qualitative and reports methodological issues in secondary data use. The study material comprised 6,059,454 live birth and 73,735 infant death records from 2004 to 2013 for children whose mothers resided in the State of São Paulo, Brazil. The procedures to ensure data adequacy, and the challenges encountered, were organized into 6 steps: (1) problem understanding, (2) resource planning, (3) data understanding, (4) data preparation, (5) data validation, and (6) data distribution. For each step, the procedures, the challenges encountered, the actions taken to cope with them, and partial results were described. To identify the most labor-intensive tasks, each step was assessed by adding its number of procedures, challenges, and coping actions; the highest values were taken to indicate the most critical steps.
Results: In total, 22 procedures and 23 actions were needed to deal with the 27 challenges encountered in the process of ensuring the adequacy of the study material for the intended use. The final product was an organized database for a historical cohort study, suitable for the intended use. Data understanding and data preparation were identified as the most critical steps, accounting for about 70% of the challenges observed in data use.
Conclusion: Significant challenges were encountered in the process of ensuring the adequacy of secondary health data for research use, mainly in the data understanding and data preparation steps. Using the described steps to approach structured secondary data, and knowing the potential challenges along the process, may contribute to planning health research.
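The scoring rule in the Methods (procedures + challenges + coping actions per step, highest total = most critical) and the "share of challenges" figure in the Results can be sketched as below. The step labels and per-step counts are invented for illustration only; the abstract reports only the totals, not the per-step figures.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    procedures: int
    challenges: int
    coping_actions: int

    @property
    def score(self) -> int:
        # Labor-intensity score: procedures + challenges + coping actions.
        return self.procedures + self.challenges + self.coping_actions

def rank_steps(steps):
    """Sort steps from most to least labor-intensive."""
    return sorted(steps, key=lambda s: s.score, reverse=True)

def challenge_share(steps, names):
    """Fraction of all challenges that occurred in the named steps."""
    total = sum(s.challenges for s in steps)
    return sum(s.challenges for s in steps if s.name in names) / total

# Invented counts for illustration; they are not the study's per-step figures.
steps = [
    Step("step A", 2, 1, 1),
    Step("step B", 4, 6, 5),
    Step("step C", 3, 2, 2),
]
for s in rank_steps(steps):
    print(s.name, s.score)
print(f"{challenge_share(steps, {'step B'}):.0%} of challenges in step B")
```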
This study investigated the cognitive processing of true and false political information. Specifically, it examined the impact of source credibility on the assessment of veracity when information comes from a polarizing source (Experiment 1), and the effectiveness of explanations when they come from one's own political party or an opposition party (Experiment 2). These experiments were conducted prior to the 2016 Presidential election. Participants rated their belief in factual and incorrect statements that President Trump made on the campaign trail; facts were subsequently affirmed and misinformation retracted. Participants then re-rated their belief immediately or after a delay. Experiment 1 found that (i) if information was attributed to Trump, Republican supporters of Trump believed it more than if it was presented without attribution, whereas the opposite was true for Democrats, and (ii) although Trump supporters reduced their belief in misinformation items following a correction, they did not change their voting preferences. Experiment 2 revealed that the explanation's source had relatively little impact, and belief updating was more influenced by the perceived credibility of the individual initially purporting the information. These findings suggest that people use political figures as a heuristic to guide evaluation of what is true or false, yet do not necessarily insist on veracity as a prerequisite for supporting political candidates.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The original dataset, contributed by Arevalo et al. in their work Gated Multimodal Units for Information Fusion, can be downloaded from their git repository, where you can also use the web-scraping scripts they used to create it. From there you can download the hdf5 file and metadata.
The main problem is that this dataset contains data that in many cases is not necessary, for example the image latent features, the word n-grams, and the IMDb IDs. Furthermore, the poster captions are already tokenized, so to see the real text you must apply the ix_to_word dictionary from the metadata, which adds an extra step if you are trying different word tokenizers. The hdf5 file is 15.6GB, and with the 65MB metadata npy file on top, it makes a rather big dataset to work with if you only need the minimal information.
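Recovering the text from the tokenized captions amounts to a dictionary lookup per token. A minimal sketch, assuming ix_to_word maps integer token indices to words, that index 0 is padding, and that unknown indices should become "<unk>"; the vocabulary shown is hypothetical, and in the real metadata the keys may be stored as strings rather than integers.

```python
def decode_caption(token_ids, ix_to_word, pad_id=0, unknown="<unk>"):
    """Map token indices back to words, skipping padding."""
    return " ".join(ix_to_word.get(i, unknown) for i in token_ids if i != pad_id)

ix_to_word = {1: "a", 2: "detective", 3: "returns", 4: "home"}  # hypothetical vocabulary
print(decode_caption([1, 2, 3, 4, 0, 0], ix_to_word))  # a detective returns home
```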
Simplified MM-IMDb only has two files:
- data.npy (18.1MB). Stores the image index, the one-hot encoding of the genre, and the caption/description of the poster.
- images.npz (3.2GB). Stores all dataset images as numpy arrays.
With this dataset you can start training your multimodal models for multi-class classification, modality alignment, masked language modelling, caption-based image retrieval, visual question answering, and more.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Zomerganzen - Summering geese management and population counts in Flanders, Belgium is a sampling event dataset published by the Research Institute for Nature and Forest (INBO). The dataset contains over 3,700 sampling events, carried out since 2009, mostly in June and July. The data are compiled from different projects related to summering geese, but most data were collected through fieldwork within the framework of the EU co-funded Interreg projects INVEXO (http://www.invexo.eu) and RINSE (www.rinse-europe.eu). Since 2015, data collection has been funded by INBO. The dataset includes close to 5,000 presence occurrences, as well as over 15,000 absence occurrences. The sampling protocol for the majority of the occurrences is simultaneous counts, in which the number of individuals of each goose species is determined in a fixed set of areas. Counts are performed within the same weekend to avoid double counting. Simultaneous counts have been organised yearly since 2008 and take place on the first weekend after July 15, the best period for monitoring the summering goose population. These counts are performed by professional INBO employees as well as experienced birdwatchers from Natuurpunt, using a standardized field protocol. Data are recorded in a citizen science portal (http://waarnemingen.be/waarnemingen_projecten.php?project=231). However, the dataset also comprises opportunistic field observations from the same portal outside this period. Furthermore, data are derived from management actions, such as fertility reduction (egg shaking and pricking), the use of Larsen traps (for Egyptian goose), and moult captures. In these cases, the individuals in the dataset were actually removed from the environment. The aim of the data collection is management follow-up and evaluation. Consequently, caution is advised when using these data for trend analysis, distribution range calculation, niche modeling or other analyses.
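Because the records mix standardized counts with opportunistic observations and management removals, an analysis of counts would typically filter on the sampling protocol first. A minimal sketch, assuming records use the Darwin Core terms occurrenceStatus and samplingProtocol; the protocol string "simultaneous count" and the sample records are hypothetical, so check the actual values in the published data before relying on them.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical sample records; real samplingProtocol values may differ
records: List[Record] = [
    {"scientificName": "Branta canadensis", "occurrenceStatus": "present",
     "samplingProtocol": "simultaneous count"},
    {"scientificName": "Branta canadensis", "occurrenceStatus": "absent",
     "samplingProtocol": "simultaneous count"},
    {"scientificName": "Alopochen aegyptiaca", "occurrenceStatus": "present",
     "samplingProtocol": "Larsen trap"},
]


def simultaneous_count_records(
    rows: List[Record], protocol: str = "simultaneous count"
) -> Tuple[List[Record], List[Record]]:
    """Return (presences, absences) from the standardized counts only,
    leaving out opportunistic and management-action records."""
    counts = [r for r in rows if r.get("samplingProtocol") == protocol]
    presences = [r for r in counts if r.get("occurrenceStatus") == "present"]
    absences = [r for r in counts if r.get("occurrenceStatus") == "absent"]
    return presences, absences


presences, absences = simultaneous_count_records(records)
print(len(presences), len(absences))
```

Keeping the absences matters here: they make up most of the occurrences and are what distinguish a surveyed-but-empty area from an unsurveyed one.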
Issues with the dataset can be reported at https://github.com/LifeWatchINBO/data-publication/tree/master/datasets/zomerganzen-events
We strongly believe an open attitude is essential for tackling the invasive alien species (IAS) problem (Groom et al. 2015). To allow anyone to use this dataset, we have released the data to the public domain under a Creative Commons Zero waiver (http://creativecommons.org/publicdomain/zero/1.0/). We would appreciate it, however, if you read and follow these norms for data use (http://www.inbo.be/en/norms-for-data-use) and provide a link to the original dataset (https://doi.org/10.15468/a5ubtp) whenever possible. If you use these data for a scientific paper, please cite the dataset following the applicable citation norms and/or consider us for co-authorship. We are always interested to know how you have used or visualized the data, and are happy to provide more information, so please contact us via the contact information provided in the metadata, opendata@inbo.be or https://twitter.com/LifeWatchINBO.
Many existing complex space systems have significant historical maintenance and problem databases that are stored as unstructured text. The problem that we address in this paper is the discovery of recurring anomalies and relationships between problem reports that may indicate larger systemic problems. We illustrate our techniques on data from discrepancy reports regarding software anomalies in the Space Shuttle. These free-text reports are written by a number of different people, so the emphasis and wording vary considerably. With Mehran Sahami from Stanford University, I'm putting together a book on text mining called "Text Mining: Theory and Applications" to be published by Taylor and Francis.
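One generic way to surface related problem reports is to compare them by TF-IDF-weighted term overlap, so that reports sharing rare terms score as similar even when their wording differs elsewhere. The sketch below is a plain-Python baseline of that idea, not the technique described in this paper; the example reports and the similarity threshold are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of letters and digits only
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of reports that contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Terms present in every report get weight log(1) = 0
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar_pairs(docs: List[str], threshold: float = 0.1) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) for report pairs at or above the threshold, best first."""
    vecs = tfidf_vectors(docs)
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            score = cosine(vecs[i], vecs[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return sorted(pairs, key=lambda p: -p[2])


reports = [
    "Telemetry buffer overflow during ascent causes display freeze",
    "Display freeze observed after telemetry buffer overflow",
    "Incorrect units in landing checklist printout",
]
for i, j, score in most_similar_pairs(reports):
    print(f"reports {i} and {j}: similarity {score:.2f}")
```

Pairwise comparison is quadratic in the number of reports, which is fine for a sketch but would need indexing or clustering for a large report archive.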
Revision
Finalised data on government support for buses was not available when these statistics were originally published (27 November 2024). The Ministry of Housing, Communities and Local Government (MHCLG) have since published that data so the following have been revised to include it:
Revision
The following figures relating to local bus passenger journeys per head have been revised:
Table BUS01f provides figures on passenger journeys per head of population at Local Transport Authority (LTA) level. Population data for 21 counties were duplicated in error, which halved the per-head figures for those counties in this table. This issue does not affect any other figures in the published tables, including the regional and national breakdowns.
The affected LTAs were: Cambridgeshire, Derbyshire, Devon, East Sussex, Essex, Gloucestershire, Hampshire, Hertfordshire, Kent, Lancashire, Leicestershire, Lincolnshire, Norfolk, Nottinghamshire, Oxfordshire, Staffordshire, Suffolk, Surrey, Warwickshire, West Sussex, and Worcestershire.
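The halving follows directly from how a per-head rate is computed: if a county's population is counted twice, the denominator doubles. A toy calculation with made-up figures (not values from BUS01f), assuming the rate is simply journeys divided by population:

```python
def journeys_per_head(journeys: float, population: float) -> float:
    """Passenger journeys divided by resident population."""
    return journeys / population


# Hypothetical LTA figures, for illustration only
journeys = 30_000_000
population = 1_500_000

correct = journeys_per_head(journeys, population)
# Counting the same population twice doubles the denominator
erroneous = journeys_per_head(journeys, population + population)
print(correct, erroneous)  # 20.0 10.0
```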
A minor typo in the units was also corrected in the BUS02_mi spreadsheet.
A full list of tables can be found in the table index.
BUS0415: Local bus fares index by metropolitan area status and country, quarterly: Great Britain (ODS, 35.4 KB). Download: https://assets.publishing.service.gov.uk/media/6852b8d399b009dcdcb73612/bus0415.ods
This spreadsheet includes breakdowns by country, region, metropolitan area status, urban-rural classification and Local Authority. It also includes data per head of population, and concessionary journeys.
BUS01: Local bus passenger journeys (ODS, 145 KB). Download: https://assets.publishing.service.gov.uk/media/67603526239b9237f0915411/bus01.ods
Limited historic data is available
These spreadsheets include breakdowns by country, region, metropolitan area status, urban-rural classification and Local Authority, as well as by service type. Vehicle distance travelled is a measure of levels of service provision.
BUS02_mi: Vehicle distance travelled (miles) (ODS, 117 KB). Download: https://assets.publishing.service.gov.uk/media/6760353198302e574b91540c/bus02_mi.ods
For further detailed information about methodology, users should consult the Labour Force Survey User Guide, included with the APS documentation. For variable and value labelling and coding frames that are not included either in the data or in the current APS documentation, users are advised to consult the latest versions of the LFS User Guides, which are available from the ONS Labour Force Survey - User Guidance webpages.
Occupation data for 2021 and 2022
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. None of ONS' headline statistics, other than those directly sourced from occupational data, are affected and you can continue to rely on their accuracy. The affected datasets have now been updated. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022
APS Well-Being Datasets
From 2012 to 2015, the ONS published separate APS datasets aimed at providing initial estimates of subjective well-being, based on the Integrated Household Survey. In 2015 these were discontinued. A separate set of well-being variables and a corresponding weighting variable have been added to the April-March APS person datasets from A11M12 onwards. Further information on the transition can be found in the Personal well-being in the UK: 2015 to 2016 article on the ONS website.
APS disability variables
Over time, there have been some updates to disability variables in the APS. An article explaining the quality assurance investigations on these variables that have been conducted so far is available on the ONS Methodology webpage.
The Secure Access data have more restrictive access conditions than those made available under the standard EUL. Prospective users will need to gain ONS Accredited Researcher status, complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables. Users are strongly advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements.
https://www.usa.gov/government-works
Reporting of Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.
This archived public use dataset has 11 data elements reflecting United States COVID-19 community levels for all available counties.
The COVID-19 community levels were developed using a combination of three metrics — new COVID-19 admissions per 100,000 population in the past 7 days, the percent of staffed inpatient beds occupied by COVID-19 patients, and total new COVID-19 cases per 100,000 population in the past 7 days. The COVID-19 community level was determined by the higher of the new admissions and inpatient beds metrics, based on the current level of new cases per 100,000 population in the past 7 days. New COVID-19 admissions and the percent of staffed inpatient beds occupied represent the current potential for strain on the health system. Data on new cases acts as an early warning indicator of potential increases in health system strain in the event of a COVID-19 surge.
Using these data, the COVID-19 community level was classified as low, medium, or high.
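The rule described above (take the higher of the admissions and bed-occupancy indicators, with cut-offs that depend on whether new cases are below 200 per 100,000) can be sketched as follows. The numeric thresholds are not stated in this dataset description; the values below are those CDC published for the framework to the best of our knowledge, and they should be checked against CDC's own documentation before use. The function names are hypothetical.

```python
from typing import List

LEVELS: List[str] = ["low", "medium", "high"]


def _indicator_level(value: float, medium_at: float, high_at: float) -> int:
    """0 = low, 1 = medium, 2 = high for a single indicator."""
    if value >= high_at:
        return 2
    if value >= medium_at:
        return 1
    return 0


def community_level(cases_per_100k: float, admissions_per_100k: float,
                    inpatient_bed_pct: float) -> str:
    """Return the higher of the admissions and bed-occupancy levels.

    Thresholds are assumed from CDC's published CCL framework; verify before use.
    """
    if cases_per_100k < 200:
        admissions = _indicator_level(admissions_per_100k, 10.0, 20.0)
        beds = _indicator_level(inpatient_bed_pct, 10.0, 15.0)
    else:
        # With 200+ new cases per 100k there is no "low" level, and "high"
        # starts at 10.0 for both indicators
        admissions = _indicator_level(admissions_per_100k, 0.0, 10.0)
        beds = _indicator_level(inpatient_bed_pct, 0.0, 10.0)
    return LEVELS[max(admissions, beds)]


print(community_level(150, 12.0, 4.0))  # medium
print(community_level(250, 5.0, 4.0))   # medium
```

The case rate never sets the level on its own here; it only shifts the cut-offs, which matches its described role as an early warning indicator.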
COVID-19 Community Levels were used to help communities and individuals make decisions based on their local context and their unique needs. Community vaccination coverage and other local information, like early alerts from surveillance, such as through wastewater or the number of emergency department visits for COVID-19, when available, can also inform decision making for health officials and individuals.
For the most accurate and up-to-date data for any county or state, visit the relevant health department website. COVID Data Tracker may display data that differ from state and local websites. This can be due to differences in how data were collected, how metrics were calculated, or the timing of web updates.
Archived Data Notes:
This dataset was renamed from "United States COVID-19 Community Levels by County as Originally Posted" to "United States COVID-19 Community Levels by County" on March 31, 2022.
March 31, 2022: Column name for county population was changed to “county_population”. No change was made to the data points previously released.
March 31, 2022: New column, “health_service_area_population”, was added to the dataset to denote the total population in the designated Health Service Area based on 2019 Census estimate.
March 31, 2022: FIPS codes for territories American Samoa, Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands were re-formatted to 5-digit numeric for records released on 3/3/2022 to be consistent with other records in the dataset.
March 31, 2022: Changes were made to the text fields in variables “county”, “state”, and “health_service_area” so the formats are consistent across releases.
March 31, 2022: The “%” sign was removed from the text field in column “covid_inpatient_bed_utilization”. No change was made to the data. As indicated in the column description, values in this column represent the percentage of staffed inpatient beds occupied by COVID-19 patients (7-day average).
March 31, 2022: Data values for columns “county_population”, “health_service_area_number”, and “health_service_area” were backfilled for records released on 2/24/2022. These columns were added beginning the week of 3/3/2022, so the values were previously missing for records released the week before.
April 7, 2022: Updates made to data released on 3/24/2022 for Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands to correct a data mapping error.
April 21, 2022: COVID-19 Community Level (CCL) data released for counties in Nebraska for the week of April 21, 2022 have 3 counties identified in the high category and 37 in the medium category. CDC has been working with state officials to verify the data submitted, as other data systems are not providing alerts for substantial increases in disease transmission or severity in the state.
May 26, 2022: COVID-19 Community Level (CCL) data released for McCracken County, KY for the week of May 5, 2022 have been updated to correct a data processing error. McCracken County, KY should have appeared in the low community level category during the week of May 5, 2022. This correction is reflected in this update.
May 26, 2022: COVID-19 Community Level (CCL) data released for several Florida counties for the week of May 19th, 2022, have been corrected for a data processing error. Of note, Broward, Miami-Dade, and Palm Beach counties should have appeared in the high CCL category, and Osceola County should have appeared in the medium CCL category. These corrections are reflected in this update.
May 26, 2022: COVID-19 Community Level (CCL) data released for Orange County, New York for the week of May 26, 2022 displayed an erroneous case rate of zero and a CCL category of low due to a data source error. This county should have appeared in the medium CCL category.
June 2, 2022: COVID-19 Community Level (CCL) data released for Tolland County, CT for the week of May 26, 2022 have been updated to correct a data processing error. Tolland County, CT should have appeared in the medium community level category during the week of May 26, 2022. This correction is reflected in this update.
June 9, 2022: COVID-19 Community Level (CCL) data released for Tolland County, CT for the week of May 26, 2022 have been updated to correct a misspelling. The medium community level category for Tolland County, CT on the week of May 26, 2022 was misspelled as “meduim” in the data set. This correction is reflected in this update.
June 9, 2022: COVID-19 Community Level (CCL) data released for Mississippi counties for the week of June 9, 2022 should be interpreted with caution due to a reporting cadence change over the Memorial Day holiday that resulted in artificially inflated case rates in the state.
July 7, 2022: COVID-19 Community Level (CCL) data released for Rock County, Minnesota for the week of July 7, 2022 displayed an artificially low case rate and CCL category due to a data source error. This county should have appeared in the high CCL category.
July 14, 2022: COVID-19 Community Level (CCL) data released for Massachusetts counties for the week of July 14, 2022 should be interpreted with caution due to a reporting cadence change that resulted in lower than expected case rates and CCL categories in the state.
July 28, 2022: COVID-19 Community Level (CCL) data released for all Montana counties for the week of July 21, 2022 had case rates of 0 due to a reporting issue. The case rates have been corrected in this update.
July 28, 2022: COVID-19 Community Level (CCL) data released for Alaska for all weeks prior to July 21, 2022 included non-resident cases. The case rates for the time series have been corrected in this update.
July 28, 2022: A laboratory in Nevada reported a backlog of historic COVID-19 cases. As a result, the 7-day case count and rate will be inflated in Clark County, NV for the week of July 28, 2022.
August 4, 2022: COVID-19 Community Level (CCL) data was updated on August 2, 2022 in error during performance testing. Data for the week of July 28, 2022 was changed during this update due to additional case and hospital data as a result of late reporting between July 28, 2022 and August 2, 2022. Since the purpose of this data set is to provide point-in-time views of COVID-19 Community Levels on Thursdays, any changes made to the data set during the August 2, 2022 update have been reverted in this update.
August 4, 2022: In the COVID-19 Community Level (CCL) data for the week of July 28, 2022, case data were missing for 8 counties in Utah (Beaver County, Daggett County, Duchesne County, Garfield County, Iron County, Kane County, Uintah County, and Washington County) due to data collection issues. CDC and its partners have resolved the issue and the correction is reflected in this update.
August 4, 2022: Due to a reporting cadence change, case rates for all Alabama counties will be lower than expected. As a result, the CCL levels published on August 4, 2022 should be interpreted with caution.
August 11, 2022: COVID-19 Community Level (CCL) data for the week of August 4, 2022 for South Carolina have been updated to correct a data collection error that resulted in incorrect case data. CDC and its partners have resolved the issue and the correction is reflected in this update.
August 18, 2022: COVID-19 Community Level (CCL) data for the week of August 11, 2022 for Connecticut have been updated to correct a data ingestion error that inflated the CT case rates. CDC, in collaboration with CT, has resolved the issue and the correction is reflected in this update.
August 25, 2022: A laboratory in Tennessee reported a backlog of historic COVID-19 cases. As a result, the 7-day case count and rate may be inflated in many counties and the CCLs published on August 25, 2022 should be interpreted with caution.
August 25, 2022: Due to a data source error, the 7-day case rate for St. Louis County, Missouri, is reported as zero in the COVID-19 Community Level data released on August 25, 2022. Therefore, the COVID-19 Community Level for this county should be interpreted with caution.
September 1, 2022: Due to a reporting issue, case rates for all Nebraska counties will include 6 days of data instead of 7 days in the COVID-19 Community Level (CCL) data released on September 1, 2022. Therefore, the CCLs for all Nebraska counties should be interpreted with caution.
September 8, 2022: Due to a data processing error, the case rate for Philadelphia County, Pennsylvania,
https://github.com/nytimes/covid-19-data/blob/master/LICENSE
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
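Because the files hold cumulative counts, daily new cases have to be derived by differencing consecutive days for each state or county. A minimal sketch for a single series, assuming the counts are already sorted by date; the function name is hypothetical.

```python
from typing import List


def daily_new(cumulative: List[int]) -> List[int]:
    """Differences between consecutive cumulative totals.

    The first day's value is taken as its own total. Negative values are
    kept, since they usually flag a downward revision in the source data.
    """
    previous = [0] + cumulative[:-1]
    return [curr - prev for prev, curr in zip(previous, cumulative)]


print(daily_new([1, 1, 3, 7, 6]))  # [1, 0, 2, 4, -1]
```

Keeping negative differences visible, rather than clipping them to zero, makes it easier to spot where a jurisdiction revised its earlier totals.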
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please note this dataset is the most recent version of the Administrative Boundaries (AB). For previous versions of the AB please go to this url: https://data.gov.au/data/dataset/previous-versions-of-the-geoscape-administrative-boundaries
Geoscape Administrative Boundaries is Australia’s most comprehensive national collection of boundaries, including government, statistical and electoral boundaries. It is built and maintained by Geoscape Australia using authoritative government data. Further information about contributors to Administrative Boundaries is available here.
This dataset comprises seven Geoscape products:
Updated versions of Administrative Boundaries are published on a quarterly basis.
Users have the option to download datasets with feature coordinates referencing either GDA94 or GDA2020 datums.
Notable changes in the May 2025 release
Almost half of the Victorian Wards dataset has changed and now reflects the boundaries from the 2024 subdivision review (https://www.vec.vic.gov.au/electoral-boundaries/council-reviews/subdivision-reviews).
One new locality, ‘Kenwick Island’, has been added to the local government area ‘Mackay Regional’ in Queensland.
There have been spatial changes (area) greater than 1 km² to the localities ‘Nicholson’, ‘Lawn Hill’ and ‘Coral Sea’ in Queensland and ‘Caiguna’, ‘Israelite Bay’ and ‘Balladonia’ in Western Australia.
An update to the NT Commonwealth Electoral Boundaries has been applied to reflect the redistribution of the boundaries gazetted on 4 March 2025.
Geoscape has become aware that the DATE_CREATED and DATE_RETIRED attributes in the commonwealth_electoral_polygon MapInfo TAB tables were incorrectly ordered and did not match the product data model. These attributes have been re-ordered to match the data model for the May 2025 release.
IMPORTANT NOTE: correction of issues with the 22 November 2022 release
Further information on Administrative Boundaries, including FAQs on the data, is available here or through Geoscape Australia’s network of partners. They provide a range of commercial products based on Administrative Boundaries, including software solutions, consultancy and support.
Note: On 1 October 2020, PSMA Australia Limited began trading as Geoscape Australia.
The Australian Government has negotiated the release of Administrative Boundaries to the whole economy under an open CC BY 4.0 licence.
Users must only use the data in ways that are consistent with the Australian Privacy Principles issued under the Privacy Act 1988 (Cth).
Users must also note the following attribution requirements:
Preferred attribution for the Licensed Material:
Administrative Boundaries © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International license (CC BY 4.0).
Preferred attribution for Adapted Material:
Incorporates or developed using Administrative Boundaries © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International licence (CC BY 4.0).
Administrative Boundaries is a large dataset (around 1.5GB unpacked), made up of seven themes, each containing multiple layers.
Users are advised to read the technical documentation including the product change notices and the individual product descriptions before downloading and using the product.
Please note this dataset is the most recent version of the Administrative Boundaries (AB). For previous versions of the AB please go to this url: https://data.gov.au/dataset/ds-dga-b4ad5702-ea2b-4f04-833c-d0229bfd689e/details?q=previous
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Feature selection is an important solution for dealing with high-dimensional data in the fields of machine learning and data mining. In this paper, we present an improved mountain gazelle optimizer (IMGO) based on the newly proposed mountain gazelle optimizer (MGO) and design a binary version of IMGO (BIMGO) to solve the feature selection problem for medical data. First, the gazelle population is initialized using the iterative chaotic map with infinite collapses (ICMIC), which increases the diversity of the population. Second, a nonlinear control factor is introduced to balance the exploration and exploitation components of the algorithm. Individuals in the population are perturbed using a spiral perturbation mechanism to enhance the local search capability of the algorithm. Finally, a neighborhood search strategy is applied to the optimal individuals to enhance the exploitation and convergence capabilities of the algorithm. The ability of IMGO to solve continuous problems is demonstrated on 23 benchmark datasets. Then, BIMGO is evaluated on 16 medical datasets of different dimensions and compared with 8 well-known metaheuristic algorithms. The experimental results indicate that BIMGO outperforms the competing algorithms in terms of fitness value, number of selected features and sensitivity. In addition, statistical tests on the experimental results show that BIMGO is significantly better at selecting the most effective features in medical datasets.
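The abstract does not give IMGO's initialization details. As a rough sketch of ICMIC-based initialization in general, the map x_{k+1} = sin(α / x_k) produces values in [-1, 1] that can be rescaled to each variable's bounds. The choice of α = 2.0, the random starting values, running one chaotic chain per individual, and the function names are all assumptions for illustration, not the paper's settings.

```python
import math
import random
from typing import List, Optional


def icmic_sequence(n: int, alpha: float = 2.0, x0: float = 0.7) -> List[float]:
    """Iterate the ICMIC map x_{k+1} = sin(alpha / x_k) n times.

    Every output lies in [-1, 1]; x0 must be non-zero.
    """
    xs: List[float] = []
    x = x0
    for _ in range(n):
        x = math.sin(alpha / x)
        if x == 0.0:  # the map is undefined at 0, so nudge away from it
            x = 1e-6
        xs.append(x)
    return xs


def icmic_population(pop_size: int, dim: int, lower: List[float], upper: List[float],
                     alpha: float = 2.0, seed: Optional[int] = None) -> List[List[float]]:
    """Build pop_size candidate solutions inside [lower, upper] per dimension."""
    rng = random.Random(seed)
    population: List[List[float]] = []
    for _ in range(pop_size):
        # Assumption: each individual gets its own chaotic chain from a random start
        seq = icmic_sequence(dim, alpha, rng.uniform(0.1, 1.0))
        # Rescale each chaotic value from [-1, 1] to the variable's bounds
        individual = [lo + (x + 1.0) / 2.0 * (hi - lo)
                      for x, lo, hi in zip(seq, lower, upper)]
        population.append(individual)
    return population


pop = icmic_population(pop_size=4, dim=3, lower=[0.0] * 3, upper=[10.0] * 3, seed=42)
for individual in pop:
    print([round(v, 3) for v in individual])
```

For the binary variant, a continuous position like this would still need a transfer function to turn it into a feature mask; the abstract does not say which one BIMGO uses.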
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com
Version 17 release notes: Adds data for 2020. Please note that the FBI has retired UCR data after the 2020 data, so this will be the last Offenses Known and Clearances by Arrest data they release. Changes .rda files to .rds. Please note that in 2020 the card_actual_pt variable always indicates that the month was reported. As a result, 2020 shows all months as reported for all agencies, because I use the card_actual_pt variable to measure how many months were reported. This value is almost certainly incorrect, since it is extremely unlikely that all agencies suddenly reported every month. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years, so I don't want to change this).
Version 16 release notes: Changes release notes description, does not change data.
Version 15 release notes: Adds data for 2019. Please note that in 2019 the card_actual_pt variable always indicates that the month was reported. As a result, 2019 shows all months as reported for all agencies, because I use the card_actual_pt variable to measure how many months were reported. This value is almost certainly incorrect, since it is extremely unlikely that all agencies suddenly reported every month. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years, so I don't want to change this).
Version 14 release notes: Adds arson data from the UCR's Arson dataset.
This adds just the arson variables about the number of arson incidents, not the complete set of variables in that dataset (which include damages from arson and whether structures were occupied during the arson). As arson is an index crime, both the total index and the index property columns now include arson offenses. The "all_crimes" variables also now include arson. Adds an arson_number_of_months_missing column indicating how many months were not reported (i.e. missing from the annual data) in the arson data. In most cases, this is the same as the normal number_of_months_missing, but not always, so please check if you intend to use arson data. Please note that in 2018 the card_actual_pt variable always indicates that the month was reported. As a result, 2018 shows all months as reported for all agencies, because I use the card_actual_pt variable to measure how many months were reported. This value is almost certainly incorrect, since it is extremely unlikely that all agencies suddenly reported every month. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years, so I don't want to change this). For some reason, a small number of agencies (primarily federal agencies) had the same ORI number in 2018, and I removed these duplicate agencies.
Version 13 release notes: Adds 2018 data. New Orleans (ORI = LANPD00) data had more unfounded crimes than actual crimes in 2018, so unfounded columns for 2018 are all NA.
Version 12 release notes: Adds population 1-3 columns: if an agency is in multiple counties, these variables show the population in the county with the most people in that agency (population_1), the second largest county (population_2), and the third largest county (population_3). Also adds county 1-3 columns, which identify which counties the agency is in.
The population column is the sum of the three population columns. Thanks to Mike Maltz for the suggestion! Fixes a bug in the crosswalk data merged into this file that had the incorrect FIPS code for Clinton, Tennessee (ORI = TN00101). Thanks to Brooke Watson for catching this bug! Adds a last_month_reported column, which says which month was reported last. This is actually how the FBI defines number_of_months_reported, so it is a more accurate representation of that. Removes the number_of_months_reported variable, as the name is misleading. You should use the last_month_reported or the number_of_months_missing (see below) variable instead. Adds a number_of_months_missin