78 datasets found

Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data:...
openicpsr.org
Updated Jun 5, 2017
Cite
Jacob Kaplan (2017). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data: Offenses Known and Clearances by Arrest (Return A), 1960-2020 [Dataset]. http://doi.org/10.3886/E100707V17
Explore at:
Unique identifier
https://doi.org/10.3886/E100707V17
Dataset updated
Jun 5, 2017
Dataset provided by
Princeton University
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
1960 - 2020
Area covered
United States
Description
For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com

Version 17 release notes:
Adds data for 2020. Please note that the FBI has retired UCR data ending with the 2020 data, so this will be the last Offenses Known and Clearances by Arrest data they release.
Changes .rda files to .rds.
Please note that in 2020 the card_actual_pt variable always returns that the month was reported. This causes 2020 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).

Version 16 release notes:
Changes release notes description, does not change data.

Version 15 release notes:
Adds data for 2019.
Please note that in 2019 the card_actual_pt variable always returns that the month was reported. This causes 2019 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).

Version 14 release notes:
Adds arson data from the UCR's Arson dataset. This adds just the arson variables about the number of arson incidents, not the complete set of variables in that dataset (which include damages from arson and whether structures were occupied or not during the arson).
As arson is an index crime, both the total index and the index property columns now include arson offenses. The "all_crimes" variables also now include arson.
Adds an arson_number_of_months_missing column indicating how many months were not reporting (i.e. missing from the annual data) in the arson data. In most cases, this is the same as the normal number_of_months_missing but not always, so please check if you intend to use arson data.
Please note that in 2018 the card_actual_pt variable always returns that the month was reported. This causes 2018 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).
For some reason, a small number of agencies (primarily federal agencies) had the same ORI number in 2018 and I removed these duplicate agencies.

Version 13 release notes:
Adds 2018 data.
New Orleans (ORI = LANPD00) data had more unfounded crimes than actual crimes in 2018 so unfounded columns for 2018 are all NA.

Version 12 release notes:
Adds population 1-3 columns - if an agency is in multiple counties, these variables show the population in the county with the most people in that agency in it (population_1), second largest county (population_2), and third largest county (population_3). Also adds county 1-3 columns which identify which counties the agency is in. The population column is the sum of the three population columns. Thanks to Mike Maltz for the suggestion!
Fixes bug in the crosswalk data that is merged to this file that had the incorrect FIPS code for Clinton, Tennessee (ORI = TN00101). Thanks to Brooke Watson for catching this bug!
Adds a last_month_reported column which says which month was reported last. This is actually how the FBI defines number_of_months_reported so is a more accurate representation of that.
Removes the number_of_months_reported variable as the name is misleading. You should use the last_month_reported or the number_of_months_missing (see below) variable instead.
Adds a number_of_months_missin
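As a rough illustration of the release notes above, the sketch below shows one way to select agency-years that reported all twelve months, to treat the arson data separately, and to flag the years the notes call unreliable. The records, the `ori` and `year` keys, and all values are invented for illustration; only the `number_of_months_missing`, `arson_number_of_months_missing`, and `population_1` to `population_3` names come from the notes, and the real files may be organized differently.

```python
# Years in which the release notes say card_actual_pt always returns
# "reported", so the months-missing counts are almost certainly too low.
SUSPECT_YEARS = {2018, 2019, 2020}

# Hypothetical agency-year records; every value here is invented.
records = [
    {"ori": "XX00001", "year": 2017, "number_of_months_missing": 0,
     "arson_number_of_months_missing": 2,
     "population_1": 50000, "population_2": 2000, "population_3": 0},
    {"ori": "XX00002", "year": 2017, "number_of_months_missing": 3,
     "arson_number_of_months_missing": 3,
     "population_1": 8000, "population_2": 0, "population_3": 0},
    {"ori": "XX00003", "year": 2020, "number_of_months_missing": 0,
     "arson_number_of_months_missing": 0,
     "population_1": 12000, "population_2": 500, "population_3": 100},
]


def reported_full_year(rec, arson=False):
    """True if no months are missing; arson data has its own counter."""
    key = "arson_number_of_months_missing" if arson else "number_of_months_missing"
    return rec[key] == 0


def total_population(rec):
    """Sum of the three per-county population columns."""
    return sum(rec[f"population_{i}"] for i in (1, 2, 3))


complete = [r["ori"] for r in records if reported_full_year(r)]
complete_arson = [r["ori"] for r in records if reported_full_year(r, arson=True)]
needs_caution = [r["ori"] for r in records if r["year"] in SUSPECT_YEARS]
print(complete, complete_arson, needs_caution)
```

Keeping the arson check separate reflects the note that the two months-missing counts usually agree but not always.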
U.S. Household Mental Health & Covid-19
kaggle.com
Updated Jan 21, 2023
Cite
The Devastator (2023). U.S. Household Mental Health & Covid-19 [Dataset]. https://www.kaggle.com/datasets/thedevastator/u-s-household-mental-health-covid-19/discussion
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 21, 2023
Dataset provided by
Kaggle
Authors
The Devastator
Description
U.S. Household Mental Health & Covid-19

Assessing the Impact of the Pandemic

By US Open Data Portal, data.gov [source]

About this dataset

This dataset offers a closer look into the mental health care received by U.S. households in the last four weeks during the Covid-19 pandemic. The sheer scale of this crisis is inspiring people of all ages, backgrounds, and geographies to come together to tackle the problem. The Household Pulse Survey from the U.S. Census Bureau was published with federal agency collaboration in order to draw up accurate and timely estimates about how Covid-19 is impacting employment status, consumer spending, food security, housing stability, education interruption, and physical and mental wellness amongst American households. To deliver meaningful results about wellbeing at various levels of society during this trying period, including by demographic characteristics such as age, gender, race/ethnicity, and educational attainment, each consulted household was randomly selected according to certain weighted criteria to maintain accuracy throughout the findings. This dataset will help you explore what it's like on the ground right now for everyone affected by Covid-19. Will it inform your decisions or point you towards new opportunities?

How to use the dataset

This dataset contains information about the mental health care that U.S. households have received in the last 4 weeks, during the Covid-19 pandemic. This data is valuable when wanting to track and measure mental health needs across the country and draw comparisons between regions based on support available.

To use this dataset, it is important to understand each of its columns or variables in order to draw meaningful insights from the data. The ‘Indicator’ column indicates which type of indicator (percentage or absolute number) is being measured by this survey, while ‘Group’ and 'Subgroup' provide more specific details about who was surveyed for each indicator included in this dataset.

The columns 'Phase' and 'Time Period' provide information regarding when each of these indicators was measured, whether during a certain phase or over a particular timespan. Columns such as 'Value', 'LowCI' and 'HighCI' give the estimate for each measurement and, as the column names suggest, the lower and upper bounds of its confidence interval. Similarly, the 'Suppression Flag' column identifies cases where a value has been suppressed because it falls below a certain benchmark, so suppressed values can be filtered out in one step rather than sorted through manually each time the dataset is used. Finally, the 'Time Period Start Date' and 'Time Period End Date' columns give the exact dates that each measurement period covers, which is useful for time-series analyses over longer periods.

Overall, when using this dataset, keep in mind exactly which indicator type you're looking at (percentage points or absolute numbers) as well as its associated group/subgroup characteristics, so that you can accurately interpret trends and any correlations drawn from these results.
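The column descriptions above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset's official loader: the column names are taken from the description, but the sample rows, the indicator and subgroup labels, and all numbers are invented. It also assumes that a non-empty 'Suppression Flag' marks a suppressed row and that 'LowCI' and 'HighCI' are the lower and upper bounds around 'Value'.

```python
import csv
import io

# Hypothetical sample in the described column layout; all values are invented.
SAMPLE = """Indicator,Group,Subgroup,Phase,Time Period,Value,LowCI,HighCI,Suppression Flag
Received counseling or therapy,By Age,18 - 29 years,2,13,10.0,8.0,12.0,
Received counseling or therapy,By Age,30 - 39 years,2,13,,,,1
Received counseling or therapy,By Age,40 - 49 years,2,13,9.0,7.5,10.5,
"""


def usable_rows(text):
    """Return rows that are not suppressed, with numeric fields parsed."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Assumption: any non-empty Suppression Flag means "suppressed".
        if row["Suppression Flag"].strip():
            continue
        for key in ("Value", "LowCI", "HighCI"):
            row[key] = float(row[key])
        rows.append(row)
    return rows


rows = usable_rows(SAMPLE)
for r in rows:
    width = r["HighCI"] - r["LowCI"]
    print(f'{r["Subgroup"]}: {r["Value"]} (interval width {width})')
```

Filtering on the flag before parsing avoids trying to convert the empty cells that suppressed rows carry in this sample.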

Research Ideas

Analyzing the effects of the Covid-19 pandemic on mental health care among different subgroups such as racial and ethnic minorities, gender and age categories.

Identifying geographical disparities in mental health services by comparing state level data for the same time period.

Comparing changes in mental health care indicators over time to understand how the pandemic has impacted people's access to care within a quarter or over longer periods

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: Dataset copyright by authors
You are free to:
- Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt - remix, transform, and build upon the material for any purpose, even commercially.
You must:
- Give appropriate credit - Provide a link to the license, and indicate if changes were made.
- ShareAlike - You must distribute your contributions under the same license as the original.
...
US Broadband Usage Across Counties
kaggle.com
Updated Jan 6, 2023
Cite
The Devastator (2023). US Broadband Usage Across Counties [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-broadband-usage-across-counties-and-zip-codes
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 6, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
Area covered
United States
Description
US Broadband Usage Across Counties

Utilizing Microsoft's Data to Estimate Access

By Amber Thomas [source]

About this dataset

This dataset provides an estimation of broadband usage in the United States, focusing on how many people have access to broadband and how many are actually using it at broadband speeds. Through data collected by Microsoft from our services, including package size and total time of download, we can estimate the throughput speed of devices connecting to the internet across zip codes and counties.

According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This data set aims to address the contrast between estimated availability and actual use by providing more accurate usage numbers downscaled to county and zip code levels. Who gets counted as having access is vastly important: it determines who gets included in public funding opportunities dedicated solely toward closing this digital divide. The implications can be huge: millions around this country could remain invisible if these numbers aren't accurately reported or used properly in decision-making processes.

This dataset includes aggregated information about locations with fewer than 20 devices, for increased accuracy when estimating broadband usage in the United States. Others can use it to develop solutions that improve internet access, or to accurately label problem areas where no real or reliable connectivity exists, in communities large and small throughout the US mainland. Please review the license terms before using these data so that you adhere to the stipulations set forth under Microsoft's Open Use Of Data Agreement v1.0, for professional and educational uses alike.

How to use the dataset

This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.

The dataset contains six columns:
- County – The name of the county for which usage statistics are provided.
- Zip Code (5-Digit) – The 5-digit zip code from which usage data was collected within that county or metropolitan area/micro area/divisions within states as reported by the US Census Bureau in 2018[2].
- Population (Households) – Estimated number of households defined according to [3] based on data from the US Census Bureau American Community Survey's 5 Year Estimates[4].
- Average Throughput (Mbps) – Average Mbps download speed derived from a combination of data collected from anonymous devices connected through Microsoft services such as Windows Update, Office 365, Xbox Live Core Services, etc.[5]
- Percent Fast (> 25 Mbps) – Percentage of machines with throughput greater than 25 Mbps calculated using [6].
- Percent Slow (< 3 Mbps) – Percentage of machines with throughput less than 3 Mbps calculated using [7].
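To make the column list above concrete, here is a small sketch that rolls zip-code rows up to a household-weighted average throughput per county and labels the result using the 25 Mbps and 3 Mbps thresholds named in the column descriptions. The header names are copied from the description and may not match the actual CSV file exactly; the county names, zip codes, and all numbers are invented, and weighting by households is an illustrative choice rather than anything the source prescribes.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the described six-column layout; all values are invented.
SAMPLE = """County,Zip Code (5-Digit),Population (Households),Average Throughput (Mbps),Percent Fast (> 25 Mbps),Percent Slow (< 3 Mbps)
Example County A,00001,1000,40.0,0.60,0.10
Example County A,00002,3000,20.0,0.30,0.20
Example County B,00003,500,2.0,0.05,0.70
"""


def county_throughput(text):
    """Household-weighted mean of Average Throughput (Mbps) per county."""
    weighted = defaultdict(float)
    households = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text)):
        n = float(row["Population (Households)"])
        weighted[row["County"]] += n * float(row["Average Throughput (Mbps)"])
        households[row["County"]] += n
    return {county: weighted[county] / households[county] for county in weighted}


def label(mbps):
    """Classify a speed with the thresholds from the column names."""
    if mbps > 25:
        return "fast"
    if mbps < 3:
        return "slow"
    return "in between"


by_county = county_throughput(SAMPLE)
for county, mbps in by_county.items():
    print(county, round(mbps, 1), label(mbps))
```

Zip codes are kept as strings by `csv.DictReader`, which preserves leading zeros.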

Research Ideas

Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.

Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.

Establishing public-private partnerships between local governments and telecom providers, which need better data about gaps in service coverage and usage levels in order to make decisions about investments in new infrastructure buildouts that improve connectivity options for rural communities.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

See the dataset description for more information.

Columns

File: broadband_data_2020October.csv

Average daily time spent on social media worldwide 2012-2025
statista.com
Updated Jun 19, 2025
Cite
Statista (2025). Average daily time spent on social media worldwide 2012-2025 [Dataset]. https://www.statista.com/statistics/433871/daily-social-media-usage-worldwide/
Explore at:
Dataset updated
Jun 19, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
How much time do people spend on social media? As of 2025, the average daily social media usage of internet users worldwide amounted to 141 minutes per day, down from 143 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of 3 hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just 2 hours and 16 minutes.

Global social media usage

Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.

Global impact of social media

Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics, and heightened everyday distractions.
Mental Health Services Monthly Statistics
digital.nhs.uk
csv, pdf, xls, xlsx
Updated Jan 19, 2018
Cite
(2018). Mental Health Services Monthly Statistics [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-services-monthly-statistics
Explore at:
Available download formats: csv(1.8 MB), pdf(686.2 kB), csv(813.9 kB), xlsx(225.3 kB), csv(546.8 kB), csv(350.5 kB), csv(1.6 MB), csv(124.9 kB), xls(3.6 MB), csv(799.7 kB), csv(5.4 kB), csv(810.7 kB), csv(1.9 MB), xls(398.3 kB), xlsx(75.9 kB), xls(3.7 MB), pdf(225.2 kB)
Dataset updated
Jan 19, 2018
License
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Time period covered
Aug 1, 2017 - Nov 30, 2017
Area covered
England
Description
This publication provides the most timely statistics available relating to NHS funded secondary mental health, learning disabilities and autism services in England. This information will be of use to people needing access to information quickly for operational decision making and other purposes. These statistics are derived from submissions made using version 2.0 of the Mental Health Services Dataset (MHSDS). NHS Digital review the quality and completeness of the submissions used to create these statistics on an ongoing basis. More information about this work can be found in the Accuracy and reliability section of this report. Fully detailed information on the quality and completeness of particular statistics in this release is not available due to the timescales involved in reviewing submissions and engaging with data providers. The information that has been obtained at the time of publication is made available in the Provider Feedback sections of the Data Quality Reports which accompany this release. Information gathered after publication is released in future editions of this publication series. More detailed information on the quality and completeness of these statistics and a summary of how these statistics may be interpreted is made available later in our Mental Health Bulletin: Annual Report publication series. All elements of this publication, other editions of this publication series, and related annual publication series' can be found in the Related Links below. Included for the first time in this release is an Access and Waiting Times CSV file based on final data for the period 1 - 31 October 2017. This file includes the number of children and young people receiving at least two contacts (including indirect contacts) and where their first contact occurs before their 18th birthday and their second contact occurs during the reporting period. 
This file has been produced in order to support greater clarity and consistency in reporting local access to mental health services for children and young people. This measure itself is not an assured waiting times indicator for these services. In the Final July, Provisional August 2017 edition of this publication NHS Digital included a new measure of referrals starting in the reporting period, aged 0-18, with any one or more SNOMED Codes and valid PERS score from MH Assessment Scale Current View (MHS68). The purpose of this measure is to understand the current levels of recording of these outcomes measurements for children and young people in order to support the future reporting of mental health services outcomes for this group. Initial feedback from users has highlighted improvements that can be made to the methodology used in this measure. Specifically, the limitation that the assessment must take place in the same month that the referral is received by the service provider may exclude a number of assessments taking place at the time of the first contact between the clinician and the service user. For this reason NHS Digital propose to change this measure in future editions of this publication series to include any Current View assessments that take place as part of any referral for this group open during the reporting period. If you have any feedback on these proposed changes please send these to enquiries@nhsdigital.nhs.uk with 'MHSDS Monthly - Current View' in the subject.
Correction: The statistics relating to Children and Young People Receiving a Second Contact With Services were corrected on 9 February 2018. These statistics are meant to count people as accessing services in each financial year where they have received two care contacts. The statistics released originally incorrectly excluded people who accessed services in a previous financial year. NHS Digital apologises for any inconvenience caused.
A correction has been made to this publication on 10 September 2018. This amendment relates to statistics in the monthly CSV data file; the specific measures affected are listed in the “Corrected Measures” CSV. All listed measures have now been corrected. NHS Digital apologises for any inconvenience caused.
Mental health and learning disabilities statistics monthly report: final...
gov.uk
Updated Feb 23, 2016
Cite
Health and Social Care Information Centre (2016). Mental health and learning disabilities statistics monthly report: final November 2015 and provisional December 2015 [Dataset]. https://www.gov.uk/government/statistics/mental-health-and-learning-disabilities-statistics-monthly-report-final-november-2015-and-provisional-december-2015
Explore at:
Dataset updated
Feb 23, 2016
Dataset provided by
GOV.UK
Authors
Health and Social Care Information Centre
Description
This statistical release makes available the most recent Mental Health and Learning Disabilities Dataset (MHLDDS) final monthly data (November 2015), together with provisional information for December 2015. This publication presents a wide range of information about care delivered to users of NHS funded secondary mental health and learning disability services in England.

The scope of the Mental Health Minimum Dataset (MHMDS) was extended to cover Learning Disability services from September 2014. Many people who have a learning disability use mental health services and people in learning disability services may have a mental health problem. This means that activity included in the new MHLDDS dataset cannot be distinctly divided into mental health or learning disability spells of care – a single spell of care may include inputs from either or both types of service.

The Currencies and Payment file that forms part of this release is specifically limited to services in scope for currencies and payment in mental health services and remains unchanged.

This information will be of particular interest to organisations involved in delivering secondary mental health and learning disability care to adults and older people, as it presents timely information to support discussions between providers and commissioners of services. The MHLDDS Monthly Report also includes reporting by local authority for the first time.

For patients, researchers, agencies, and the wider public it aims to provide up to date information about the numbers of people using services, spending time in hospital and subject to the Mental Health Act (MHA). Some of these measures are currently experimental analysis.

The Currency and Payment (CaP) measures can be found in a separate machine-readable data file and may also be accessed via an on-line interactive visualisation tool that supports benchmarking. This can be accessed through the related links at the bottom of the page.

During summer 2015 we undertook a consultation on Adult Mental Health Statistics, seeking users' views on the existing reports and what might usefully be added to our reports when the new version of the dataset (MHSDS) is implemented in 2016. A report on this consultation can be found below.
Ukraine - Subnational Administrative Boundaries - enhanced metadata...
data.humdata.org
shp, xlsx
Updated Apr 15, 2025
Cite
OCHA Field Information Services Section (FISS) (2025). Ukraine - Subnational Administrative Boundaries - enhanced metadata demonstration [Dataset]. https://data.humdata.org/dataset/ukraine-subnational-administrative-boundaries-enhanced-metadata-demonstration
Explore at:
Available download formats: shp(71173699), shp(103361587), xlsx(4457660)
Dataset updated
Apr 15, 2025
Dataset provided by
OCHA Field Information Services Section (FISS)
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Area covered
Ukraine
Description
The actual boundaries were most recently adjusted in: January 2025.
Boundary information source: State Scientific Production Enterprise “Kartographia”.
Contributor: OCHA Field Information Services Section (FISS).
COD-AB quality level: cod-enhanced (geometry and attributes verified and standardized) and live services available.
https://codgis.itos.uga.edu/arcgis/rest/services/COD_External/UKR_pcode/FeatureServer feature server.
Most recent COD-AB review conclusion: The COD-AB does not require any update.
OCHA country status: Operational (country office).
An edge-matched (COD-EM) dataset is available
Most recent COD-AB review date: January 2025.
Deepest administrative level: ADM4.
COD-PS HDX URL: https://data.humdata.org/dataset/cod-ps-ukr.
Deepest administrative level with complete coverage: ADM3.
COD-EM HDX link: https://data.humdata.org/dataset/cod-em-ukr.
Administrative level required by humanitarian community: ADM2.
Note: The Ukrainian government has not fully implemented its new administrative structure reforms in the Autonomous Republic of Crimea [UA01] and Sevastopol [UA85].
Consequently, 31 ADM4 features within the Sevastopol ADM3 polygon retain P-codes that do not correctly conform to [UA85]. This will be rectified in a COD-AB update once the government has regained control of the Crimean Peninsula.
Note: Some administrative feature names were adjusted in a September/October 2024 administrative feature renaming exercise. See reference table at https://data.humdata.org/dataset/cod-ab-ukr.
Note: The Ukraine COD-AB was most recently adjusted in early 2024 - affecting administrative level 3 (hromadas) only.
Note: The population statistics common operational dataset (COD-PS-UKR) is only available via special request. See Ukraine - Subnational Population Statistics. Users should use the 'Request Data' button.
Note: Limitations of the Ukrainian-to-Russian transliteration method mean that more than one Ukrainian name may be represented by a single Russian feature name.
Note: Feature name attribute fields in this dataset incorrectly use 'UA' to represent the Ukrainian language. The correct language code is 'UK'. This will be resolved in 2026.
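Two of the notes above lend themselves to a quick check in code: P-codes that do not conform to their parent area (the [UA85] case) and attribute fields that use 'UA' where the language code should be 'UK'. The sketch below is only an illustration under stated assumptions. It assumes that "conforming" means a feature's P-code starts with its parent's P-code, and the field names (`ADM4_PCODE`, `ADM4_UA`, `ADM4_EN`), the P-code values, and the feature names are all hypothetical rather than taken from the dataset.

```python
def pcode_conforms(pcode, parent_pcode):
    """Assumed rule: a P-code conforms if it starts with its parent's P-code."""
    return pcode.startswith(parent_pcode)


def fix_language_suffix(attrs, wrong="UA", right="UK"):
    """Return a copy with field names ending in _UA renamed to end in _UK."""
    fixed = {}
    for name, value in attrs.items():
        if name.endswith("_" + wrong):
            name = name[: -len(wrong)] + right
        fixed[name] = value
    return fixed


# Hypothetical ADM4 features that sit inside the Sevastopol [UA85] polygon.
features = [
    {"ADM4_PCODE": "UA85000001", "ADM4_UA": "name-1", "ADM4_EN": "Name 1"},
    {"ADM4_PCODE": "UA01000002", "ADM4_UA": "name-2", "ADM4_EN": "Name 2"},
]

nonconforming = [f["ADM4_PCODE"] for f in features
                 if not pcode_conforms(f["ADM4_PCODE"], "UA85")]
renamed = [fix_language_suffix(f) for f in features]
print(nonconforming)
print(sorted(renamed[0]))
```

Returning a new dictionary rather than renaming in place leaves the original attributes untouched for comparison.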
Global number of breached user accounts Q1 2020-Q2 2025
statista.com
Updated Aug 29, 2025
Cite
Statista (2025). Global number of breached user accounts Q1 2020-Q2 2025 [Dataset]. https://www.statista.com/statistics/1307426/number-of-data-breaches-worldwide/
Explore at:
Dataset updated
Aug 29, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
During the second quarter of 2025, data breaches exposed more than ** million records worldwide. Since the first quarter of 2020, the highest number of data records were exposed in the third quarter of ****, more than *** billion data sets. Data breaches remain among the biggest concerns of company leaders worldwide. The most common causes of sensitive information loss were operating system vulnerabilities on endpoint devices.

Which industries see the most data breaches?

Meanwhile, certain conditions make some industry sectors more prone to data breaches than others. According to the latest observations, the public administration experienced the highest number of data breaches between 2021 and 2022. The industry saw *** reported data breach incidents with confirmed data loss. The second were financial institutions, with *** data breach cases, followed by healthcare providers.

Data breach cost

Data breach incidents have various consequences, the most common impact being financial losses and business disruptions. As of 2023, the average data breach cost across businesses worldwide was **** million U.S. dollars. Meanwhile, a leaked data record cost about *** U.S. dollars. The United States saw the highest average breach cost globally, at **** million U.S. dollars.
Immigration system statistics data tables
gov.uk
Updated Aug 21, 2025
Cite
Home Office (2025). Immigration system statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/immigration-system-statistics-data-tables
Explore at:
Dataset updated
Aug 21, 2025
Dataset provided by
GOV.UK
Authors
Home Office
Description
List of the data tables as part of the Immigration system statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.

If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.

Accessible file formats

The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk
Please tell us what format you need. It will help us if you say what assistive technology you use.

Related content

Immigration system statistics, year ending June 2025
Immigration system statistics quarterly release
Immigration system statistics user guide
Publishing detailed data tables in migration statistics
Policy and legislative changes affecting migration to the UK: timeline
Immigration statistics data archives

Passenger arrivals

Passenger arrivals summary tables, year ending June 2025 (ODS, 31.3 KB): https://assets.publishing.service.gov.uk/media/689efececc5ef8b4c5fc448c/passenger-arrivals-summary-jun-2025-tables.ods

‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.

Electronic travel authorisation

Electronic travel authorisation detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 57.1 KB): https://assets.publishing.service.gov.uk/media/689efd8307f2cc15c93572d8/electronic-travel-authorisation-datasets-jun-2025.xlsx
ETA_D01: Applications for electronic travel authorisations, by nationality
ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality

Entry clearance visas granted outside the UK

Entry clearance visas summary tables, year ending June 2025 (ODS, 56.1 KB): https://assets.publishing.service.gov.uk/media/68b08043b430435c669c17a2/visas-summary-jun-2025-tables.ods

Entry clearance visa applications and outcomes detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 29.6 MB): https://assets.publishing.service.gov.uk/media/689efda51fedc616bb133a38/entry-clearance-visa-outcomes-datasets-jun-2025.xlsx
Vis_D01: Entry clearance visa applications, by nationality and visa type
Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome

Additional data relating to in country and overseas Visa applications can be fo
Table_2_Operational Challenges in the Use of Structured Secondary Data for...
frontiersin.figshare.com
datasetcatalog.nlm.nih.gov
docx
Updated Jun 1, 2023
Cite
Kelsy N. Areco; Tulio Konstantyner; Paulo Bandiera-Paiva; Rita C. X. Balda; Daniela T. Costa-Nobre; Adriana Sanudo; Carlos Roberto V. Kiffer; Mandira D. Kawakami; Milton H. Miyoshi; Ana Sílvia Scavacini Marinonio; Rosa M. V. Freitas; Liliam C. C. Morais; Monica L. P. Teixeira; Bernadette Waldvogel; Maria Fernanda B. Almeida; Ruth Guinsburg (2023). Table_2_Operational Challenges in the Use of Structured Secondary Data for Health Research.DOCX [Dataset]. http://doi.org/10.3389/fpubh.2021.642163.s002
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fpubh.2021.642163.s002
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Kelsy N. Areco; Tulio Konstantyner; Paulo Bandiera-Paiva; Rita C. X. Balda; Daniela T. Costa-Nobre; Adriana Sanudo; Carlos Roberto V. Kiffer; Mandira D. Kawakami; Milton H. Miyoshi; Ana Sílvia Scavacini Marinonio; Rosa M. V. Freitas; Liliam C. C. Morais; Monica L. P. Teixeira; Bernadette Waldvogel; Maria Fernanda B. Almeida; Ruth Guinsburg
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: In Brazil, secondary data for epidemiology are largely available. However, they are insufficiently prepared for use in research, even when it comes to structured data since they were often designed for other purposes. To date, few publications focus on the process of preparing secondary data. The present findings can help in orienting future research projects that are based on secondary data.
Objective: Describe the steps in the process of ensuring the adequacy of a secondary data set for a specific use and to identify the challenges of this process.
Methods: The present study is qualitative and reports methodological issues about secondary data use. The study material was comprised of 6,059,454 live births and 73,735 infant death records from 2004 to 2013 of children whose mothers resided in the State of São Paulo - Brazil. The challenges and description of the procedures to ensure data adequacy were undertaken in 6 steps: (1) problem understanding, (2) resource planning, (3) data understanding, (4) data preparation, (5) data validation and (6) data distribution. For each step, procedures, and challenges encountered, and the actions to cope with them and partial results were described. To identify the most labor-intensive tasks in this process, the steps were assessed by adding the number of procedures, challenges, and coping actions. The highest values were assumed to indicate the most critical steps.
Results: In total, 22 procedures and 23 actions were needed to deal with the 27 challenges encountered along the process of ensuring the adequacy of the study material for the intended use. The final product was an organized database for a historical cohort study suitable for the intended use. Data understanding and data preparation were identified as the most critical steps, accounting for about 70% of the challenges observed for data using.
Conclusion: Significant challenges were encountered in the process of ensuring the adequacy of secondary health data for research use, mainly in the data understanding and data preparation steps. The use of the described steps to approach structured secondary data and the knowledge of the potential challenges along the process may contribute to planning health research.
Data from: Processing political misinformation: comprehending the Trump...
data.bris.ac.uk
Updated Apr 22, 2017
Cite
(2017). Data from: Processing political misinformation: comprehending the Trump phenomenon - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/8001384ef9ab38dd90710ba227c8f7e3
Dataset updated
Apr 22, 2017
Description
This study investigated the cognitive processing of true and false political information. Specifically, it examined the impact of source credibility on the assessment of veracity when information comes from a polarizing source (Experiment 1), and effectiveness of explanations when they come from one's own political party or an opposition party (Experiment 2). These experiments were conducted prior to the 2016 Presidential election. Participants rated their belief in factual and incorrect statements that President Trump made on the campaign trail; facts were subsequently affirmed and misinformation retracted. Participants then re-rated their belief immediately or after a delay. Experiment 1 found that (i) if information was attributed to Trump, Republican supporters of Trump believed it more than if it was presented without attribution, whereas the opposite was true for Democrats and (ii) although Trump supporters reduced their belief in misinformation items following a correction, they did not change their voting preferences. Experiment 2 revealed that the explanation's source had relatively little impact, and belief updating was more influenced by perceived credibility of the individual initially purporting the information. These findings suggest that people use political figures as a heuristic to guide evaluation of what is true or false, yet do not necessarily insist on veracity as a prerequisite for supporting political candidates.
Simplified MM-IMDb
kaggle.com
Updated Dec 5, 2024
Cite
Javier Ureña (2024). Simplified MM-IMDb [Dataset]. https://www.kaggle.com/datasets/javierurea/simplified-mm-imdb
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 5, 2024
Dataset provided by
Kaggle
Authors
Javier Ureña
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
The original dataset, contributed by Arevalo et al. in their work Gated Multimodal Units for Information Fusion, can be downloaded from their git repository, where you can also make use of the web-scraping scripts they used to create it. From there you can download the hdf5 file and metadata.

The main problem is that this dataset contains data that in many cases is not necessary, for example the image latent features, the word n-grams, imdb ids... Furthermore, the poster captions are already tokenized, so if you want to see the real text you must apply the ix_to_word dictionary from the metadata, which adds an extra step if you are trying different word tokenizers. The hdf5 file ends up being 15.6GB which, together with the 65MB metadata npy file, makes for a rather big dataset to work with if you only want the minimal information.

Simplified MM-IMDb only has two files:
- data.npy (18.1MB). Stores image index, one-hot encoding of the genre, and the caption/description of the poster.
- images.npz (3.2GB). Stores all dataset images as numpy arrays.

With this dataset you can start training your multimodal models for multi-class classification, modality alignment, Masked-Language-Modelling, caption-based image retrieval, visual question answering, and many more.
Data from: Zomerganzen - Summering geese management and population counts in...
gbif.org
demo.gbif.org
Updated Aug 20, 2025
Cite
Sander Devisscher; Tim Adriaens; Dimitri Brosens; Frank Huysentruyt; Gerald Driessens; Peter Desmet (2025). Zomerganzen - Summering geese management and population counts in Flanders, Belgium [Dataset]. http://doi.org/10.15468/a5ubtp
Unique identifier
https://doi.org/10.15468/a5ubtp
Dataset updated
Aug 20, 2025
Dataset provided by
Global Biodiversity Information Facility: https://www.gbif.org/
Research Institute for Nature and Forest (INBO)
Authors
Sander Devisscher; Tim Adriaens; Dimitri Brosens; Frank Huysentruyt; Gerald Driessens; Peter Desmet
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered

Description
Zomerganzen - Summering geese management and population counts in Flanders, Belgium is a sampling event dataset published by the Research Institute for Nature and Forest (INBO). The dataset contains over 3,700 sampling events, carried out since 2009, mostly in the months June and July. The data are compiled from different summering geese related projects, but most data were collected through fieldwork within the framework of the EU co-funded Interreg projects INVEXO (http://www.invexo.eu) and RINSE (www.rinse-europe.eu). Since 2015, data collection is funded by INBO. The dataset includes close to 5,000 presence occurrences, as well as over 15,000 absence occurrences. The sampling protocol for the majority of the occurrences is simultaneous counts. Here, the number of individuals of different geese species in a fixed set of areas is determined. Counts are performed within the same weekend to avoid double counting. Simultaneous counts were organised yearly since 2008 and take place the first weekend after July 15, the best period for monitoring the summering population of geese. These counts are performed by professional INBO employees as well as experienced birdwatchers from Natuurpunt using a standardized field protocol. Data are recorded in a citizen science portal (http://waarnemingen.be/waarnemingen_projecten.php?project=231). However, the dataset also comprises opportunistic field observations from the same portal outside this period. Furthermore, data are derived from management actions, such as fertility reduction (egg shaking and pricking), the use of Larsen traps (for Egyptian goose), and the execution of moult captures. Here, the individuals in the dataset were actually removed from the environment. The aim of the data collection is management follow-up and evaluation. Consequently, caution is advised when using these data for trend analysis, distribution range calculation, niche modeling or other.
Issues with the dataset can be reported at https://github.com/LifeWatchINBO/data-publication/tree/master/datasets/zomerganzen-events

We strongly believe an open attitude is essential for tackling the IAS problem (Groom et al. 2015). To allow anyone to use this dataset, we have released the data to the public domain under a Creative Commons Zero waiver (http://creativecommons.org/publicdomain/zero/1.0/). We would appreciate it however if you read and follow these norms for data use (http://www.inbo.be/en/norms-for-data-use) and provide a link to the original dataset (https://doi.org/10.15468/a5ubtp) whenever possible. If you use these data for a scientific paper, please cite the dataset following the applicable citation norms and/or consider us for co-authorship. We are always interested to know how you have used or visualized the data, or to provide more information, so please contact us via the contact information provided in the metadata, opendata@inbo.be or https://twitter.com/LifeWatchINBO.
Anomaly Detection with Text Mining
catalog.data.gov
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
Updated Apr 11, 2025
Cite
Dashlink (2025). Anomaly Detection with Text Mining [Dataset]. https://catalog.data.gov/dataset/anomaly-detection-with-text-mining
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
Many existing complex space systems have a significant amount of historical maintenance and problem data bases that are stored in unstructured text forms. The problem that we address in this paper is the discovery of recurring anomalies and relationships between problem reports that may indicate larger systemic problems. We will illustrate our techniques on data from discrepancy reports regarding software anomalies in the Space Shuttle. These free text reports are written by a number of different people, thus the emphasis and wording vary considerably. With Mehran Sahami from Stanford University, I'm putting together a book on text mining called "Text Mining: Theory and Applications" to be published by Taylor and Francis.
Bus statistics data tables
gov.uk
Updated Jun 19, 2025
Cite
Department for Transport (2025). Bus statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/bus-statistics-data-tables
Dataset updated
Jun 19, 2025
Dataset provided by
GOV.UK: http://gov.uk/
Authors
Department for Transport
Description
Revision

Finalised data on government support for buses was not available when these statistics were originally published (27 November 2024). The Ministry of Housing, Communities and Local Government (MHCLG) have since published that data so the following have been revised to include it:

tables BUS04i and BUS04ii covering costs, fares and revenue

tables BUS05i and BUS05ii covering government support

table BUS08 covering concessionary travel

Revision

The following figures relating to local bus passenger journeys per head have been revised:

table BUS01f

chart 5

map 1

Table BUS01f provides figures on passenger journeys per head of population at Local Transport Authority (LTA) level. Population data for 21 counties were duplicated in error, resulting in the halving of figures in this table. This issue does not affect any other figures in the published tables, including the regional and national breakdowns.

The affected LTAs were: Cambridgeshire, Derbyshire, Devon, East Sussex, Essex, Gloucestershire, Hampshire, Hertfordshire, Kent, Lancashire, Leicestershire, Lincolnshire, Norfolk, Nottinghamshire, Oxfordshire, Staffordshire, Suffolk, Surrey, Warwickshire, West Sussex, and Worcestershire.
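
The halving described above follows directly from how a per-head figure is computed: journeys divided by population. A minimal sketch of the arithmetic, using made-up numbers (the journey and population values below are hypothetical and are not taken from table BUS01f):

```python
def journeys_per_head(passenger_journeys: float, population: float) -> float:
    """Passenger journeys divided by resident population."""
    return passenger_journeys / population


# Hypothetical values for one Local Transport Authority (not real data).
journeys = 30_000_000
population = 600_000

# Correct figure: 30,000,000 / 600,000
correct = journeys_per_head(journeys, population)

# If the population is counted twice by mistake, the denominator doubles
# and the per-head figure is exactly half of the correct one.
erroneous = journeys_per_head(journeys, population * 2)

print(correct, erroneous)
```

Because only the denominator was affected, the error is confined to per-head figures, which is consistent with the statement that no other published figures were affected.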

A minor typo in the units was also corrected in the BUS02_mi spreadsheet.

A full list of tables can be found in the table index.

Quarterly bus fares statistics

BUS0415: Local bus fares index by metropolitan area status and country, quarterly: Great Britain (ODS, 35.4 KB): https://assets.publishing.service.gov.uk/media/6852b8d399b009dcdcb73612/bus0415.ods

Local bus passenger journeys (BUS01)

This spreadsheet includes breakdowns by country, region, metropolitan area status, urban-rural classification and Local Authority. It also includes data per head of population, and concessionary journeys.

BUS01: Local bus passenger journeys (ODS, 145 KB): https://assets.publishing.service.gov.uk/media/67603526239b9237f0915411/bus01.ods

Limited historic data is available

Local bus vehicle distance travelled (BUS02)

These spreadsheets include breakdowns by country, region, metropolitan area status, urban-rural classification and Local Authority, as well as by service type. Vehicle distance travelled is a measure of levels of service provision.

BUS02_mi: Vehicle distance travelled (miles) (ODS, 117 KB): https://assets.publishing.service.gov.uk/media/6760353198302e574b91540c/bus02_mi.ods
Annual Population Survey Three-Year Pooled Dataset, January 2021 - December...
beta.ukdataservice.ac.uk
Updated 2024
Cite
Office For National Statistics (2024). Annual Population Survey Three-Year Pooled Dataset, January 2021 - December 2023 [Dataset]. http://doi.org/10.5255/ukda-sn-9291-1
Unique identifier
https://doi.org/10.5255/ukda-sn-9291-1
Dataset updated
2024
Dataset provided by
DataCite: https://www.datacite.org/
UK Data Service: https://ukdataservice.ac.uk/
Authors
Office For National Statistics
Description
The Annual Population Survey (APS) is a major survey series, which aims to provide data that can produce reliable estimates at the local authority level. Key topics covered in the survey include education, employment, health and ethnicity. The APS comprises key variables from the Labour Force Survey (LFS), all its associated LFS boosts and the APS boost. The APS aims to provide enhanced annual data for England, covering a target sample of at least 510 economically active persons for each Unitary Authority (UA)/Local Authority District (LAD) and at least 450 in each Greater London Borough. In combination with local LFS boost samples, the survey provides estimates for a range of indicators down to Local Education Authority (LEA) level across the United Kingdom.
For further detailed information about methodology, users should consult the Labour Force Survey User Guide, included with the APS documentation. For variable and value labelling and coding frames that are not included either in the data or in the current APS documentation, users are advised to consult the latest versions of the LFS User Guides, which are available from the ONS Labour Force Survey - User Guidance webpages.
Occupation data for 2021 and 2022
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. None of ONS' headline statistics, other than those directly sourced from occupational data, are affected and you can continue to rely on their accuracy. The affected datasets have now been updated. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022
APS Well-Being Datasets
From 2012-2015, the ONS published separate APS datasets aimed at providing initial estimates of subjective well-being, based on the Integrated Household Survey. In 2015 these were discontinued. A separate set of well-being variables and a corresponding weighting variable have been added to the April-March APS person datasets from A11M12 onwards. Further information on the transition can be found in the Personal well-being in the UK: 2015 to 2016 article on the ONS website.

APS disability variables
Over time, there have been some updates to disability variables in the APS. An article explaining the quality assurance investigations on these variables that have been conducted so far is available on the ONS Methodology webpage.
End User Licence and Secure Access APS data
Users should note that there are two versions of each APS dataset. One is available under the standard End User Licence (EUL) agreement, and the other is a Secure Access version. The EUL version includes Government Office Region geography, banded age, 3-digit SOC and industry sector for main, second and last job. The Secure Access version contains more detailed variables relating to:

age: single year of age, year and month of birth, age completed full-time education and age obtained highest qualification, age of oldest dependent child and age of youngest dependent child

family unit and household: including a number of variables concerning the number of dependent children in the family according to their ages, relationship to head of household and relationship to head of family

nationality and country of origin

geography: including county, unitary/local authority, place of work, Nomenclature of Territorial Units for Statistics 2 (NUTS2) and NUTS3 regions, and whether lives and works in same local authority district

health: including main health problem, and current and past health problems

education and apprenticeship: including numbers and subjects of various qualifications and variables concerning apprenticeships

industry: including industry, industry class and industry group for main, second and last job, and industry made redundant from

occupation: including 4-digit Standard Occupational Classification (SOC) for main, second and last job and job made redundant from

system variables: including week number when interview took place and number of households at address

The Secure Access data have more restrictive access conditions than those made available under the standard EUL. Prospective users will need to gain ONS Accredited Researcher status, complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables. Users are strongly advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements.
United States COVID-19 Community Levels by County
data.cdc.gov
healthdata.gov
csv, xlsx, xml
Updated Nov 2, 2023
Cite
CDC COVID-19 Response (2023). United States COVID-19 Community Levels by County [Dataset]. https://data.cdc.gov/Public-Health-Surveillance/United-States-COVID-19-Community-Levels-by-County/3nnm-4jni
Explore at:
Available download formats: csv, xlsx, xml
Dataset updated
Nov 2, 2023
Dataset provided by
Centers for Disease Control and Prevention: http://www.cdc.gov/
Authors
CDC COVID-19 Response
License
https://www.usa.gov/government-works
Area covered
United States
Description
Reporting of Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.

This archived public use dataset has 11 data elements reflecting United States COVID-19 community levels for all available counties.

The COVID-19 community levels were developed using a combination of three metrics — new COVID-19 admissions per 100,000 population in the past 7 days, the percent of staffed inpatient beds occupied by COVID-19 patients, and total new COVID-19 cases per 100,000 population in the past 7 days. The COVID-19 community level was determined by the higher of the new admissions and inpatient beds metrics, based on the current level of new cases per 100,000 population in the past 7 days. New COVID-19 admissions and the percent of staffed inpatient beds occupied represent the current potential for strain on the health system. Data on new cases acts as an early warning indicator of potential increases in health system strain in the event of a COVID-19 surge.

Using these data, the COVID-19 community level was classified as low, medium, or high.
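
The classification rule described above can be sketched in a few lines. The description does not state the cut-off values, so the thresholds below (200 cases per 100,000; 10 and 20 admissions per 100,000; 10% and 15% bed occupancy) are assumptions drawn from CDC's separately published community-level table and should be checked against it before use. The function and parameter names are hypothetical.

```python
LEVELS = ("low", "medium", "high")


def _rank(value: float, medium_cutoff: float, high_cutoff: float) -> int:
    """Return 0 (low), 1 (medium) or 2 (high) for one metric."""
    if value >= high_cutoff:
        return 2
    if value >= medium_cutoff:
        return 1
    return 0


def community_level(cases_per_100k: float,
                    admissions_per_100k: float,
                    inpatient_bed_pct: float) -> str:
    """Classify a county as 'low', 'medium' or 'high'.

    All numeric thresholds are assumptions; they are not given in the
    dataset description.
    """
    if cases_per_100k < 200:
        admissions_rank = _rank(admissions_per_100k, 10.0, 20.0)
        beds_rank = _rank(inpatient_bed_pct, 10.0, 15.0)
    else:
        # With a high case rate the lowest possible level is medium.
        admissions_rank = 2 if admissions_per_100k >= 10.0 else 1
        beds_rank = 2 if inpatient_bed_pct >= 10.0 else 1
    # The community level is the higher of the two hospital metrics.
    return LEVELS[max(admissions_rank, beds_rank)]


print(community_level(50, 5, 3))
print(community_level(250, 12, 3))
```

The case rate only selects which set of cut-offs applies; the level itself always comes from the higher of the admissions and bed-occupancy metrics.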

COVID-19 Community Levels were used to help communities and individuals make decisions based on their local context and their unique needs. Community vaccination coverage and other local information, like early alerts from surveillance, such as through wastewater or the number of emergency department visits for COVID-19, when available, can also inform decision making for health officials and individuals.

For the most accurate and up-to-date data for any county or state, visit the relevant health department website. COVID Data Tracker may display data that differ from state and local websites. This can be due to differences in how data were collected, how metrics were calculated, or the timing of web updates.

Archived Data Notes:

This dataset was renamed from "United States COVID-19 Community Levels by County as Originally Posted" to "United States COVID-19 Community Levels by County" on March 31, 2022.

March 31, 2022: Column name for county population was changed to “county_population”. No change was made to the data points previously released.

March 31, 2022: New column, “health_service_area_population”, was added to the dataset to denote the total population in the designated Health Service Area based on 2019 Census estimate.

March 31, 2022: FIPS codes for territories American Samoa, Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands were re-formatted to 5-digit numeric for records released on 3/3/2022 to be consistent with other records in the dataset.

March 31, 2022: Changes were made to the text fields in variables “county”, “state”, and “health_service_area” so the formats are consistent across releases.

March 31, 2022: The “%” sign was removed from the text field in column “covid_inpatient_bed_utilization”. No change was made to the data. As indicated in the column description, values in this column represent the percentage of staffed inpatient beds occupied by COVID-19 patients (7-day average).

March 31, 2022: Data values for columns, “county_population”, “health_service_area_number”, and “health_service_area” were backfilled for records released on 2/24/2022. These columns were added since the week of 3/3/2022, thus the values were previously missing for records released the week prior.

April 7, 2022: Updates made to data released on 3/24/2022 for Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands to correct a data mapping error.

April 21, 2022: COVID-19 Community Level (CCL) data released for counties in Nebraska for the week of April 21, 2022 have 3 counties identified in the high category and 37 in the medium category. CDC has been working with state officials to verify the data submitted, as other data systems are not providing alerts for substantial increases in disease transmission or severity in the state.

May 26, 2022: COVID-19 Community Level (CCL) data released for McCracken County, KY for the week of May 5, 2022 have been updated to correct a data processing error. McCracken County, KY should have appeared in the low community level category during the week of May 5, 2022. This correction is reflected in this update.

May 26, 2022: COVID-19 Community Level (CCL) data released for several Florida counties for the week of May 19th, 2022, have been corrected for a data processing error. Of note, Broward, Miami-Dade, Palm Beach Counties should have appeared in the high CCL category, and Osceola County should have appeared in the medium CCL category. These corrections are reflected in this update.

May 26, 2022: COVID-19 Community Level (CCL) data released for Orange County, New York for the week of May 26, 2022 displayed an erroneous case rate of zero and a CCL category of low due to a data source error. This county should have appeared in the medium CCL category.

June 2, 2022: COVID-19 Community Level (CCL) data released for Tolland County, CT for the week of May 26, 2022 have been updated to correct a data processing error. Tolland County, CT should have appeared in the medium community level category during the week of May 26, 2022. This correction is reflected in this update.

June 9, 2022: COVID-19 Community Level (CCL) data released for Tolland County, CT for the week of May 26, 2022 have been updated to correct a misspelling. The medium community level category for Tolland County, CT on the week of May 26, 2022 was misspelled as “meduim” in the data set. This correction is reflected in this update.

June 9, 2022: COVID-19 Community Level (CCL) data released for Mississippi counties for the week of June 9, 2022 should be interpreted with caution due to a reporting cadence change over the Memorial Day holiday that resulted in artificially inflated case rates in the state.

July 7, 2022: COVID-19 Community Level (CCL) data released for Rock County, Minnesota for the week of July 7, 2022 displayed an artificially low case rate and CCL category due to a data source error. This county should have appeared in the high CCL category.

July 14, 2022: COVID-19 Community Level (CCL) data released for Massachusetts counties for the week of July 14, 2022 should be interpreted with caution due to a reporting cadence change that resulted in lower than expected case rates and CCL categories in the state.

July 28, 2022: COVID-19 Community Level (CCL) data released for all Montana counties for the week of July 21, 2022 had case rates of 0 due to a reporting issue. The case rates have been corrected in this update.

July 28, 2022: COVID-19 Community Level (CCL) data released for Alaska for all weeks prior to July 21, 2022 included non-resident cases. The case rates for the time series have been corrected in this update.

July 28, 2022: A laboratory in Nevada reported a backlog of historic COVID-19 cases. As a result, the 7-day case count and rate will be inflated in Clark County, NV for the week of July 28, 2022.

August 4, 2022: COVID-19 Community Level (CCL) data was updated on August 2, 2022 in error during performance testing. Data for the week of July 28, 2022 was changed during this update due to additional case and hospital data as a result of late reporting between July 28, 2022 and August 2, 2022. Since the purpose of this data set is to provide point-in-time views of COVID-19 Community Levels on Thursdays, any changes made to the data set during the August 2, 2022 update have been reverted in this update.

August 4, 2022: COVID-19 Community Level (CCL) case data for the week of July 28, 2022 were missing for 8 counties in Utah (Beaver County, Daggett County, Duchesne County, Garfield County, Iron County, Kane County, Uintah County, and Washington County) due to data collection issues. CDC and its partners have resolved the issue and the correction is reflected in this update.

August 4, 2022: Due to a reporting cadence change, case rates for all Alabama counties will be lower than expected. As a result, the CCL levels published on August 4, 2022 should be interpreted with caution.

August 11, 2022: COVID-19 Community Level (CCL) data for the week of August 4, 2022 for South Carolina have been updated to correct a data collection error that resulted in incorrect case data. CDC and its partners have resolved the issue and the correction is reflected in this update.

August 18, 2022: COVID-19 Community Level (CCL) data for the week of August 11, 2022 for Connecticut have been updated to correct a data ingestion error that inflated the CT case rates. CDC, in collaboration with CT, has resolved the issue and the correction is reflected in this update.

August 25, 2022: A laboratory in Tennessee reported a backlog of historic COVID-19 cases. As a result, the 7-day case count and rate may be inflated in many counties and the CCLs published on August 25, 2022 should be interpreted with caution.

August 25, 2022: Due to a data source error, the 7-day case rate for St. Louis County, Missouri, is reported as zero in the COVID-19 Community Level data released on August 25, 2022. Therefore, the COVID-19 Community Level for this county should be interpreted with caution.

September 1, 2022: Due to a reporting issue, case rates for all Nebraska counties will include 6 days of data instead of 7 days in the COVID-19 Community Level (CCL) data released on September 1, 2022. Therefore, the CCLs for all Nebraska counties should be interpreted with caution.

September 8, 2022: Due to a data processing error, the case rate for Philadelphia County, Pennsylvania,
Coronavirus (Covid-19) Data in the United States
github.com
openicpsr.org
+2 more
csv
New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://github.com/nytimes/covid-19-data
Explore at:
Available download formats: csv
Dataset provided by
New York Times
License
https://github.com/nytimes/covid-19-data/blob/master/LICENSE
Description
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
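Because the files hold cumulative counts, daily new cases have to be derived by differencing consecutive dates within each county. The sketch below illustrates that step on a few made-up rows. The column names (date, county, state, cases) and all numbers are assumptions for illustration and should be checked against the actual files in the repository.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an assumed layout; these are not real figures.
SAMPLE = """date,county,state,cases
2020-03-01,Example,Washington,2
2020-03-02,Example,Washington,5
2020-03-03,Example,Washington,5
2020-03-04,Example,Washington,9
"""


def daily_new_cases(rows):
    """Turn cumulative counts into per-day increases for each (state, county)."""
    series = defaultdict(list)
    for row in rows:
        key = (row["state"], row["county"])
        series[key].append((row["date"], int(row["cases"])))

    result = {}
    for key, points in series.items():
        points.sort()  # ISO dates sort chronologically as plain strings
        previous = 0   # so the first row yields its full cumulative count
        daily = []
        for date, cumulative in points:
            # Clamp at zero (an assumption): a later correction can make
            # the cumulative count drop from one day to the next.
            daily.append((date, max(cumulative - previous, 0)))
            previous = cumulative
        result[key] = daily
    return result


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
new_cases = daily_new_cases(rows)
```

Clamping negative differences to zero is only one way to handle downward revisions; depending on the analysis, it may be better to keep them visible.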
Geoscape Administrative Boundaries
data.gov.au
researchdata.edu.au
zip
Updated Jul 15, 2025
+ more versions
Department of Industry, Science and Resources (DISR) (2025). Geoscape Administrative Boundaries [Dataset]. https://data.gov.au/data/dataset/geoscape-administrative-boundaries
Explore at:
Available download formats: zip(1897457552), zip(1051292340), zip(1844909540), zip(1069165202)
Dataset updated
Jul 15, 2025
Dataset authored and provided by
Department of Industry, Science and Resources (DISR)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Please note this dataset is the most recent version of the Administrative Boundaries (AB). For previous versions of the AB please go to this url: https://data.gov.au/data/dataset/previous-versions-of-the-geoscape-administrative-boundaries

Geoscape Administrative Boundaries is Australia’s most comprehensive national collection of boundaries, including government, statistical and electoral boundaries. It is built and maintained by Geoscape Australia using authoritative government data. Further information about contributors to Administrative Boundaries is available here.

This dataset comprises seven Geoscape products:

Localities

Local Government Areas (LGAs)

Wards

Australian Bureau of Statistics (ABS) Boundaries

Electoral Boundaries

State Boundaries and

Town Points

Updated versions of Administrative Boundaries are published on a quarterly basis.

Users have the option to download datasets with feature coordinates referencing either GDA94 or GDA2020 datums.

Notable changes in the May 2025 release

Almost half of the Victorian Wards dataset has changed and now reflects the boundaries from the 2024 subdivision review: https://www.vec.vic.gov.au/electoral-boundaries/council-reviews/subdivision-reviews

There have been spatial changes (area) greater than 1 km2 to 66 wards in Victoria.

One new locality ‘Kenwick Island’ has been added to the local government area ‘Mackay Regional’ in Queensland.

There have been spatial changes (area) greater than 1 km2 to the local government areas 'Burke Shire' and 'Mount Isa City' in Queensland.

There have been spatial changes (area) greater than 1 km2 to the localities ‘Nicholson’, ‘Lawn Hill’ and ‘Coral Sea’ in Queensland and ‘Calguna’, ‘Israelite Bay’ and ‘Balladonia’ in Western Australia.

An update to the NT Commonwealth Electoral Boundaries has been applied to reflect the redistribution of the boundaries gazetted on 4 March 2025.

Geoscape has become aware that the DATE_CREATED and DATE_RETIRED attributes in the commonwealth_electoral_polygon MapInfo TAB tables were incorrectly ordered and did not match the product data model. These attributes have been re-ordered to match the data model for the May 2025 release.

IMPORTANT NOTE: correction of issues with the 22 November 2022 release

On 28 November 2022, the Administrative Boundaries dataset originally released on 22 November 2022 was amended and re-uploaded after Geoscape identified some issues with the original data for 'Electoral Boundaries'.

As a result of the error, some shapefiles were published in 3D rather than 2D, which may affect some users when importing data into GIS applications.

The error affected the Electoral Boundaries dataset, specifically the Commonwealth boundary data for Victoria and Western Australia, including 'All States'.

Only the ESRI Shapefile formats were affected (both GDA94 and GDA2020). The MapInfo TAB format was not affected.

Because the datasets are zipped into a single file, once Geoscape fixed the error, all of the Administrative Boundaries shapefiles had to be re-uploaded, rather than only the affected files.

If you downloaded either of the two Administrative Boundary ESRI Shapefiles between 22 November and 28 November 2022 and plan to use the Electoral Boundary component, you are advised to download the revised version dated 28 November 2022. Apologies for any inconvenience.
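One way to tell whether a downloaded shapefile is 2D or 3D is to read the shape type stored in the .shp header. The sketch below assumes the standard ESRI Shapefile layout (a 100-byte header with a big-endian file code of 9994 at byte 0 and the shape type as a little-endian integer at byte 32). The function names and the synthetic demo header are illustrative and are not part of the Geoscape product.

```python
import struct

# Shape type codes that carry Z values in the ESRI Shapefile format.
Z_SHAPE_TYPES = {11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ", 18: "MultiPointZ"}


def shp_has_z(header: bytes) -> bool:
    """Return True if a .shp header declares a Z (3D) shape type."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    (file_code,) = struct.unpack(">i", header[0:4])      # big-endian magic number
    if file_code != 9994:
        raise ValueError("not a .shp file")
    (shape_type,) = struct.unpack("<i", header[32:36])   # little-endian shape type
    return shape_type in Z_SHAPE_TYPES


def file_has_z(path: str) -> bool:
    """Read the header from disk; `path` is any .shp file you have extracted."""
    with open(path, "rb") as f:
        return shp_has_z(f.read(100))


def demo_header(shape_type: int) -> bytes:
    """Build a synthetic header, for demonstration only."""
    header = bytearray(100)
    header[0:4] = struct.pack(">i", 9994)
    header[28:32] = struct.pack("<i", 1000)        # format version
    header[32:36] = struct.pack("<i", shape_type)
    return bytes(header)


polygon_is_3d = shp_has_z(demo_header(5))     # 5 = Polygon (2D)
polygon_z_is_3d = shp_has_z(demo_header(15))  # 15 = PolygonZ (3D)
```

Reading only the header keeps the check cheap for large files, but it reports the declared shape type, not whether the Z values are meaningful.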

Further information on Administrative Boundaries, including FAQs on the data, is available here or through Geoscape Australia’s network of partners. They provide a range of commercial products based on Administrative Boundaries, including software solutions, consultancy and support.

Note: On 1 October 2020, PSMA Australia Limited began trading as Geoscape Australia.

The Australian Government has negotiated the release of Administrative Boundaries to the whole economy under an open CCBY 4.0 licence.

Users must only use the data in ways that are consistent with the Australian Privacy Principles issued under the Privacy Act 1988 (Cth).

Users must also note the following attribution requirements:

Preferred attribution for the Licensed Material:

Administrative Boundaries © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International license (CC BY 4.0).

Preferred attribution for Adapted Material:

Incorporates or developed using Administrative Boundaries © Geoscape Australia licensed by the Commonwealth of Australia under Creative Commons Attribution 4.0 International licence (CC BY 4.0).

What to Expect When You Download Administrative Boundaries

Administrative Boundaries is a large dataset (around 1.5GB unpacked), made up of seven themes each containing multiple layers.

Users are advised to read the technical documentation including the product change notices and the individual product descriptions before downloading and using the product.

Please note this dataset is the most recent version of the Administrative Boundaries (AB). For previous versions of the AB please go to this url: https://data.gov.au/dataset/ds-dga-b4ad5702-ea2b-4f04-833c-d0229bfd689e/details?q=previous

License Information
Data from: S1 Dataset -
plos.figshare.com
zip
Updated Jul 16, 2024
+ more versions
Ying Li; Yanyu Geng; Huankun Sheng (2024). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0307288.s001
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.1371/journal.pone.0307288.s001
Dataset updated
Jul 16, 2024
Dataset provided by
PLOS ONE
Authors
Ying Li; Yanyu Geng; Huankun Sheng
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Feature selection is an important solution for dealing with high-dimensional data in the fields of machine learning and data mining. In this paper, we present an improved mountain gazelle optimizer (IMGO) based on the newly proposed mountain gazelle optimizer (MGO) and design a binary version of IMGO (BIMGO) to solve the feature selection problem for medical data. First, the gazelle population is initialized using iterative chaotic map with infinite collapses (ICMIC) mapping, which increases the diversity of the population. Second, a nonlinear control factor is introduced to balance the exploration and exploitation components of the algorithm. Individuals in the population are perturbed using a spiral perturbation mechanism to enhance the local search capability of the algorithm. Finally, a neighborhood search strategy is used for the optimal individuals to enhance the exploitation and convergence capabilities of the algorithm. The superior ability of the IMGO algorithm to solve continuous problems is demonstrated on 23 benchmark datasets. Then, BIMGO is evaluated on 16 medical datasets of different dimensions and compared with 8 well-known metaheuristic algorithms. The experimental results indicate that BIMGO outperforms the competing algorithms in terms of the fitness value, number of selected features and sensitivity. In addition, the statistical results of the experiments demonstrate the significantly superior ability of BIMGO to select the most effective features in medical datasets.
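As a rough illustration of chaotic initialization, the sketch below seeds a population with the ICMIC map in its commonly cited form, x_{k+1} = sin(a / x_k). The control parameter a = 2.0, the starting value, the rescaling to [lower, upper], and the function names are all assumptions made for illustration. The abstract does not give the authors' settings, so this should not be read as their implementation.

```python
import math


def icmic_sequence(length, a=2.0, x0=0.7):
    """Generate `length` values in [-1, 1] with the ICMIC map x <- sin(a / x)."""
    values = []
    x = x0
    for _ in range(length):
        if x == 0.0:
            x = 1e-12  # arbitrary guard against division by zero
        x = math.sin(a / x)
        values.append(x)
    return values


def init_population(n_individuals, n_dims, lower, upper, a=2.0, x0=0.7):
    """Map chaotic values from [-1, 1] onto [lower, upper] for each coordinate."""
    chaos = icmic_sequence(n_individuals * n_dims, a=a, x0=x0)
    population = []
    for i in range(n_individuals):
        row = chaos[i * n_dims:(i + 1) * n_dims]
        population.append([lower + (c + 1.0) / 2.0 * (upper - lower) for c in row])
    return population


population = init_population(n_individuals=5, n_dims=3, lower=-10.0, upper=10.0)
```

The sequence is deterministic for a given starting value, so varying x0 (or seeding it randomly) is what produces different initial populations between runs.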

Jacob Kaplan (2017). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data: Offenses Known and Clearances by Arrest (Return A), 1960-2020 [Dataset]. http://doi.org/10.3886/E100707V17

Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data: Offenses Known and Clearances by Arrest (Return A), 1960-2020

Explore at:

32 scholarly articles cite this dataset (View in Google Scholar)

Unique identifier

https://doi.org/10.3886/E100707V17

Dataset updated

Jun 5, 2017

Dataset provided by

Princeton University

Authors

Jacob Kaplan

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Time period covered

1960 - 2020

Area covered

United States

Description

For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com

Version 17 release notes:
Adds data for 2020.
Please note that the FBI has retired UCR data ending with the 2020 data, so this will be the last Offenses Known and Clearances by Arrest data they release.
Changes .rda files to .rds.
Please note that in 2020 the card_actual_pt variable always returns that the month was reported. This causes 2020 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).

Version 16 release notes:
Changes release notes description, does not change data.

Version 15 release notes:
Adds data for 2019.
Please note that in 2019 the card_actual_pt variable always returns that the month was reported. This causes 2019 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).

Version 14 release notes:
Adds arson data from the UCR's Arson dataset. This adds just the arson variables about the number of arson incidents, not the complete set of variables in that dataset (which include damages from arson and whether structures were occupied or not during the arson).
As arson is an index crime, both the total index and the index property columns now include arson offenses. The "all_crimes" variables also now include arson.
Adds an arson_number_of_months_missing column indicating how many months were not reporting (i.e. missing from the annual data) in the arson data. In most cases, this is the same as the normal number_of_months_missing but not always, so please check if you intend to use arson data.
Please note that in 2018 the card_actual_pt variable always returns that the month was reported. This causes 2018 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).
For some reason, a small number of agencies (primarily federal agencies) had the same ORI number in 2018 and I removed these duplicate agencies.

Version 13 release notes:
Adds 2018 data.
New Orleans (ORI = LANPD00) data had more unfounded crimes than actual crimes in 2018, so unfounded columns for 2018 are all NA.

Version 12 release notes:
Adds population 1-3 columns: if an agency is in multiple counties, these variables show the population in the county with the most people in that agency in it (population_1), second largest county (population_2), and third largest county (population_3). Also adds county 1-3 columns which identify which counties the agency is in. The population column is the sum of the three population columns. Thanks to Mike Maltz for the suggestion!
Fixes bug in the crosswalk data that is merged to this file that had the incorrect FIPS code for Clinton, Tennessee (ORI = TN00101). Thanks to Brooke Watson for catching this bug!
Adds a last_month_reported column which says which month was reported last. This is actually how the FBI defines number_of_months_reported so is a more accurate representation of that.
Removes the number_of_months_reported variable as the name is misleading. You should use the last_month_reported or the number_of_months_missing (see below) variable instead.
Adds a number_of_months_missin
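The column relationships described in these notes can be checked before analysis. The sketch below uses a few made-up agency rows (the ORIs and values are hypothetical) to confirm that population equals the sum of population_1 through population_3 and to keep only rows with no missing months. Treating number_of_months_missing == 0 as "fully reported" is an assumption about how the column would be used, not something the notes prescribe.

```python
# Hypothetical rows using column names mentioned in the release notes.
agencies = [
    {"ori": "XX00001", "population_1": 50000, "population_2": 1200,
     "population_3": 0, "population": 51200, "number_of_months_missing": 0},
    {"ori": "XX00002", "population_1": 8000, "population_2": 0,
     "population_3": 0, "population": 8000, "number_of_months_missing": 3},
    {"ori": "XX00003", "population_1": 300, "population_2": 150,
     "population_3": 50, "population": 500, "number_of_months_missing": 0},
]


def population_matches_parts(row):
    """Check that population is the sum of the three county populations."""
    parts = row["population_1"] + row["population_2"] + row["population_3"]
    return row["population"] == parts


def fully_reporting(rows):
    """Keep rows where no months are flagged as missing."""
    return [row for row in rows if row["number_of_months_missing"] == 0]


inconsistent = [row["ori"] for row in agencies if not population_matches_parts(row)]
complete = [row["ori"] for row in fully_reporting(agencies)]
```

As the notes caution, for 2018 through 2020 the underlying card_actual_pt variable reports every month as reported, so a filter like this would not exclude any agency in those years.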
