100+ datasets found
  1. English Wikipedia People Dataset

    • kaggle.com
    zip
    Updated Jul 31, 2025
    Cite
    Wikimedia (2025). English Wikipedia People Dataset [Dataset]. https://www.kaggle.com/datasets/wikimedia-foundation/english-wikipedia-people-dataset
    Explore at:
    Available download formats: zip (4293465577 bytes)
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Wikimedia Foundation: http://www.wikimedia.org/
    Authors
    Wikimedia
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Summary

    This dataset contains biographical information derived from articles on English Wikipedia as it stood in early June 2024. It was created as part of the Structured Contents initiative at Wikimedia Enterprise and is intended for evaluation and research use.

    The beta sample dataset is a subset of the Structured Contents Snapshot focusing on people with infoboxes in EN wikipedia; outputted as json files (compressed in tar.gz).

    We warmly welcome any feedback you have. Please share your thoughts, suggestions, and any issues you encounter on the discussion page for this dataset here on Kaggle.

    Data Structure

    • File name: wme_people_infobox.tar.gz
    • Size of compressed file: 4.12 GB
    • Size of uncompressed file: 21.28 GB

    Noteworthy Included Fields:
    • name - title of the article.
    • identifier - ID of the article.
    • image - main image representing the article's subject.
    • description - one-sentence description of the article for quick reference.
    • abstract - lead section, summarizing what the article is about.
    • infoboxes - parsed information from the side panel (infobox) on the Wikipedia article.
    • sections - parsed sections of the article, including links. Note: excludes other media/images, lists, tables and references or similar non-prose sections.

    The Wikimedia Enterprise Data Dictionary explains all of the fields in this dataset.
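
    Because the uncompressed data is around 21 GB, it may be more practical to stream the archive than to unpack it first. The sketch below is one possible way to do that with the Python standard library. It assumes that each file inside the tar.gz holds one JSON object per line; the description above only says "json files", so check the layout before relying on this. The function name `iter_people` and the default field selection are illustrative.

```python
import io
import json
import tarfile
from typing import Iterator


def iter_people(archive, fields=("name", "identifier", "description")) -> Iterator[dict]:
    """Yield one trimmed dict per article from a tar.gz archive of JSON files.

    Assumption: every member file stores one JSON object per line.
    `archive` is either a filesystem path or a binary file object.
    """
    if hasattr(archive, "read"):
        tar = tarfile.open(fileobj=archive, mode="r:gz")
    else:
        tar = tarfile.open(archive, mode="r:gz")
    with tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            handle = tar.extractfile(member)
            for raw_line in handle:
                raw_line = raw_line.strip()
                if not raw_line:
                    continue
                article = json.loads(raw_line)
                # Keep only the requested top-level fields; missing ones become None.
                yield {key: article.get(key) for key in fields}
```

    A call such as `for person in iter_people("wme_people_infobox.tar.gz"): ...` would then process one article at a time without holding the whole dataset in memory.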

    Stats

    Infoboxes
    • Compressed: 2GB
    • Uncompressed: 11GB

    Infoboxes + sections + short description
    • Size of compressed file: 4.12 GB
    • Size of uncompressed file: 21.28 GB

    Article analysis and filtering breakdown:
    • total # of articles analyzed: 6,940,949
    • # people found with QID: 1,778,226
    • # people found with Category: 158,996
    • people found with Biography Project: 76,150
    • Total # of people articles found: 2,013,372
    • Total # people articles with infoboxes: 1,559,985

    End stats
    • Total number of people articles in this dataset: 1,559,985
    • that have a short description: 1,416,701
    • that have an infobox: 1,559,985
    • that have article sections: 1,559,921

    This dataset includes 235,146 people articles that exist on Wikipedia but aren't yet tagged on Wikidata as instance of:human.
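
    The figures in the breakdown can be cross-checked with a few lines of arithmetic. The snippet below only restates numbers quoted above; the variable names are illustrative. Note that 235,146 equals the sum of the Category and Biography Project counts, which suggests (the source does not say so explicitly) that these are the articles found without a Wikidata QID.

```python
# Figures copied from the filtering breakdown above.
found_by_qid = 1_778_226
found_by_category = 158_996
found_by_biography_project = 76_150
total_people_found = 2_013_372
people_with_infobox = 1_559_985

# The three discovery routes add up to the reported total.
assert found_by_qid + found_by_category + found_by_biography_project == total_people_found

# People found through Category or Biography Project only.
found_without_qid = found_by_category + found_by_biography_project
print(found_without_qid)  # 235146

# Share of the people articles found that carry an infobox.
infobox_share = people_with_infobox / total_people_found
print(f"{infobox_share:.1%}")  # 77.5%
```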

    Maintenance and Support

    This dataset was originally extracted from the Wikimedia Enterprise APIs on June 5, 2024. The information in this dataset may therefore be out of date. This dataset isn't being actively updated or maintained, and has been shared for community use and feedback. If you'd like to retrieve up-to-date Wikipedia articles or data from other Wikiprojects, get started with Wikimedia Enterprise's APIs.

    Initial Data Collection and Normalization

    The dataset is built from the Wikimedia Enterprise HTML “snapshots”: https://enterprise.wikimedia.com/docs/snapshot/ and focuses on the Wikipedia article namespace (namespace 0 (main)).

    Who are the source language producers?

    Wikipedia is a human-generated corpus of free knowledge, written, edited, and curated by a global community of editors since 2001. It is the largest and most accessed educational resource in history, accessed over 20 billion times by half a billion people each month. Wikipedia represents almost 25 years of work by its community: the creation, curation, and maintenance of millions of articles on distinct topics. This dataset includes the biographical contents of the English-language edition of Wikipedia (https://en.wikipedia.org/), written by the community.

    Attribution

    Terms and conditions

    Wikimedia Enterprise provides this dataset under the assumption that downstream users will adhere to the relevant free culture licenses when the data is reused. In situations where attribution is required, reusers should identify the Wikimedia project from which the content was retrieved as the source of the content. Any attribution should adhere to Wikimedia’s trademark policy (available at https://foundation.wikimedia.org/wiki/Trademark_policy) and visual identity guidelines (ava...

  2. Traffic Crashes - People

    • datasets.ai
    • data.cityofchicago.org
    • +2more
    23, 40, 55, 8
    Updated Nov 10, 2020
    + more versions
    Cite
    City of Chicago (2020). Traffic Crashes - People [Dataset]. https://datasets.ai/datasets/traffic-crashes-people
    Explore at:
    Available download formats: 55, 23, 8, 40
    Dataset updated
    Nov 10, 2020
    Dataset authored and provided by
    City of Chicago
    Description

    This data contains information about people involved in a crash and if any injuries were sustained. This dataset should be used in combination with the traffic Crash and Vehicle datasets. Each record corresponds to an occupant in a vehicle listed in the Crash dataset. Some people involved in a crash may not have been an occupant in a motor vehicle, but may have been a pedestrian, bicyclist, or using another non-motor vehicle mode of transportation. Injuries are reported by the responding police officer. Fatalities that occur after the initial reports are typically updated in these records up to 30 days after the date of the crash. Person data can be linked with the Crash and Vehicle datasets using the “CRASH_RECORD_ID” field. A vehicle can have multiple occupants, so there is a one-to-many relationship between the Vehicle and Person datasets. However, a pedestrian is a “unit” by itself and has a one-to-one relationship between the Vehicle and Person tables.
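
    The linkage through CRASH_RECORD_ID can be sketched as a simple one-to-many join. The example below uses plain dictionaries in place of the real exports. Only the CRASH_RECORD_ID field comes from the description above; the function names and the PERSON_TYPE column with its values are assumptions made for illustration.

```python
from collections import defaultdict


def group_people_by_crash(people_rows):
    """Index Person records by CRASH_RECORD_ID."""
    by_crash = defaultdict(list)
    for person in people_rows:
        by_crash[person["CRASH_RECORD_ID"]].append(person)
    return by_crash


def attach_people(crash_rows, people_rows):
    """Return each crash row with the list of people involved (one-to-many)."""
    by_crash = group_people_by_crash(people_rows)
    return [
        {**crash, "people": by_crash.get(crash["CRASH_RECORD_ID"], [])}
        for crash in crash_rows
    ]


# Made-up sample rows; real records carry many more columns.
crashes = [{"CRASH_RECORD_ID": "a1"}, {"CRASH_RECORD_ID": "b2"}]
people = [
    {"CRASH_RECORD_ID": "a1", "PERSON_TYPE": "DRIVER"},      # hypothetical column
    {"CRASH_RECORD_ID": "a1", "PERSON_TYPE": "PASSENGER"},
]
joined = attach_people(crashes, people)
print(len(joined[0]["people"]))  # 2
```

    Crashes with no matching Person rows keep an empty list, which makes gaps in the linkage easy to spot.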

    The Chicago Police Department reports crashes on IL Traffic Crash Reporting form SR1050. The crash data published on the Chicago data portal mostly follows the data elements in the SR1050 form. The current version of the SR1050 instructions manual with detailed information on each data element is available here.

    Change 11/21/2023: We have removed the RD_NO (Chicago Police Department report number) for privacy reasons.

  3. Novel Covid-19 Dataset

    • kaggle.com
    Updated Sep 18, 2025
    + more versions
    Cite
    GHOST5612 (2025). Novel Covid-19 Dataset [Dataset]. https://www.kaggle.com/datasets/ghost5612/novel-covid-19-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 18, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    GHOST5612
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Context:

    From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

    So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.

    Johns Hopkins University has made an excellent dashboard using the affected cases data. Data is extracted from the google sheets associated and made available here.

    Edited:

    Now data is available as csv files in the Johns Hopkins Github repository. Please refer to the github repository for the Terms of Use details. Uploading it here for using it in Kaggle kernels and getting insights from the broader DS community.

    Content

    2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

    This dataset has daily level information on the number of affected cases, deaths and recovery from 2019 novel coronavirus. Please note that this is a time series data and so the number of cases on any given day is the cumulative number.
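
    Since the reported values are cumulative, per-day counts have to be derived by differencing consecutive days. The following is a minimal sketch of that step; the function name and the sample numbers are made up for illustration.

```python
def daily_from_cumulative(cumulative):
    """Convert a cumulative series (ordered by date) into per-day increments."""
    daily = []
    previous = 0
    for value in cumulative:
        daily.append(value - previous)
        previous = value
    return daily


print(daily_from_cumulative([5, 8, 8, 15]))  # [5, 3, 0, 7]
```

    The first increment equals the first cumulative value, so it covers everything reported up to the first day in the series. Negative increments may appear if a source revises its totals downward.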

    The data is available from 22 Jan, 2020.


    Dataset Description

    This dataset contains time-series and case-level records of the COVID-19 pandemic. The primary file is covid_19_data.csv, with supporting files for earlier records and individual-level line list data.

    Files and Columns

    1. covid_19_data.csv (Main File)

    This is the primary dataset and contains aggregated COVID-19 statistics by location and date.

    • Sno – Serial number of the record
    • ObservationDate – Date of the observation (MM/DD/YYYY)
    • Province/State – Province or state of the observation (may be missing for some entries)
    • Country/Region – Country of the observation
    • Last Update – Timestamp (UTC) when the record was last updated (not standardized, requires cleaning before use)
    • Confirmed – Cumulative number of confirmed cases on that date
    • Deaths – Cumulative number of deaths on that date
    • Recovered – Cumulative number of recoveries on that date
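
    The columns listed above can be read with the standard csv module. The sketch below parses ObservationDate in the stated MM/DD/YYYY form and sums Confirmed over provinces to get one cumulative figure per country and date. The sample rows are invented, and the handling of counts written as "12.0" is an assumption about how the file stores numbers.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def confirmed_by_country(csv_file):
    """Sum cumulative Confirmed counts over provinces per (date, country)."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        day = datetime.strptime(row["ObservationDate"], "%m/%d/%Y").date()
        # Assumption: counts may be written as floats such as "12.0".
        totals[(day, row["Country/Region"])] += int(float(row["Confirmed"]))
    return dict(totals)


# Invented sample in the documented column layout.
sample = io.StringIO(
    "SNo,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered\n"
    "1,01/22/2020,Province A,Country X,1/22/2020 17:00,10.0,0.0,0.0\n"
    "2,01/22/2020,Province B,Country X,1/22/2020 17:00,5.0,0.0,0.0\n"
    "3,01/22/2020,,Country Y,1/22/2020 17:00,2.0,0.0,0.0\n"
)
totals = confirmed_by_country(sample)
```

    Rows with a missing Province/State are still counted, because the aggregation key uses only the date and the country.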

    2. 2019_ncov_data.csv (Legacy File)

    This file contains earlier COVID-19 records. It is no longer updated and is provided only for historical reference. For current analysis, please use covid_19_data.csv.

    3. COVID_open_line_list_data.csv

    This file provides individual-level case information, obtained from an open data source. It includes patient demographics, travel history, and case outcomes.

    4. COVID19_line_list_data.csv

    Another individual-level case dataset, also obtained from public sources, with detailed patient-level information useful for micro-level epidemiological analysis.

    ✅ Use covid_19_data.csv for up-to-date aggregated global trends.

    ✅ Use the line list datasets for detailed, individual-level case analysis.

    Country level datasets:

    If you are interested in knowing country level data, please refer to the following Kaggle datasets:

    India - https://www.kaggle.com/sudalairajkumar/covid19-in-india

    South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset

    Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy

    Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil

    USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa

    Switzerland - https://www.kaggle.com/daenuprobst/covid19-cases-switzerland

    Indonesia - https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases

    Acknowledgements :

    Johns Hopkins University for making the data available for educational and academic research purposes

    MoBS lab - https://www.mobs-lab.org/2019ncov.html

    World Health Organization (WHO): https://www.who.int/

    DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.

    BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/

    National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml

    China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm

    Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html

    Macau Government: https://www.ssm.gov.mo/portal/

    Taiwan CDC: https://sites.google....

  4. Learning Disability Services Monthly Statistics, AT: November 2024, MHSDS:...

    • digital.nhs.uk
    Updated Nov 15, 2024
    + more versions
    Cite
    (2024). Learning Disability Services Monthly Statistics, AT: November 2024, MHSDS: October 2024 [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/learning-disability-services-statistics
    Explore at:
    Dataset updated
    Nov 15, 2024
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Time period covered
    Nov 1, 2024 - Nov 30, 2024
    Description

    Latest monthly statistics on Learning Disabilities and Autism (LDA) patients from the Assuring Transformation (AT) collection and Mental Health Services Data Set (MHSDS). Data on inpatients with learning disabilities and/or autism are being collected both within the AT collection and MHSDS. There are differences in the inpatient figures between the AT and MHSDS data sets and work has been ongoing to better understand these. LDA data from MHSDS are experimental statistics; however, while impacts from the cyber incident are still present, they will be considered to be management information. From April 2024, LDA MHSDS data has been collected under MHSDS version 6. From 1 July 2022, Integrated Care Boards were established within Integrated Care Systems data and replaced Sustainability and Transformation Plans (STPs). Clinical Commissioning Groups have been replaced by sub-Integrated Care Boards. Data for the AT collection is now submitted by sub-Integrated Care Boards. This has resulted in some renaming within tables and the inclusion of a new Table 5.1b with a patient breakdown by submitting organisation. Patients by originating organisation and commissioning type are still available in Table 5.1a. Data in the tables are now presented by the current organisational structures. Old organisational structures have been mapped to new structures in any time series.

  5. COVID-19 case rate per 100,000 population and percent test positivity in the...

    • catalog.data.gov
    • data.ct.gov
    • +1more
    Updated Aug 12, 2023
    + more versions
    Cite
    data.ct.gov (2023). COVID-19 case rate per 100,000 population and percent test positivity in the last 7 days by town - ARCHIVE [Dataset]. https://catalog.data.gov/dataset/covid-19-case-rate-per-100000-population-and-percent-test-positivity-in-the-last-7-days-by
    Explore at:
    Dataset updated
    Aug 12, 2023
    Dataset provided by
    data.ct.gov
    Description

    DPH note about change from 7-day to 14-day metrics: As of 10/15/2020, this dataset is no longer being updated. Starting on 10/15/2020, these metrics will be calculated using a 14-day average rather than a 7-day average. The new dataset using 14-day averages can be accessed here: https://data.ct.gov/Health-and-Human-Services/COVID-19-case-rate-per-100-000-population-and-perc/hree-nys2

    As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.

    With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).

    This dataset includes a weekly count and weekly rate per 100,000 population for COVID-19 cases, a weekly count of COVID-19 PCR diagnostic tests, and a weekly percent positivity rate for tests among people living in community settings. Dates are based on date of specimen collection (cases and positivity).

    A person is considered a new case only upon their first COVID-19 testing result because a case is defined as an instance or bout of illness. If they are tested again subsequently and are still positive, it still counts toward the test positivity metric but they are not considered another case.

    These case and test counts do not include cases or tests among people residing in congregate settings, such as nursing homes, assisted living facilities, or correctional facilities.

    These data are updated weekly; the previous week period for each dataset is the previous Sunday-Saturday, known as an MMWR week (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf). The date listed is the date the dataset was last updated and corresponds to a reporting period of the previous MMWR week. For instance, the data for 8/20/2020 corresponds to a reporting period of 8/9/2020-8/15/2020.

    Notes: 9/25/2020: Data for Mansfield and Middletown for the week of Sept 13-19 were unavailable at the time of reporting due to delays in lab reporting.
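
    The mapping from an update date to its reporting period can be reproduced with date arithmetic. The sketch below is one way to find the previous full Sunday-Saturday week; the function name is illustrative, and the example uses the 8/20/2020 case quoted in the description.

```python
from datetime import date, timedelta


def previous_mmwr_week(update_date):
    """Return (sunday, saturday) of the last full Sunday-Saturday week
    that ended before the week containing update_date."""
    # date.weekday() is Monday=0 ... Sunday=6, so shift to count days since Sunday.
    days_since_sunday = (update_date.weekday() + 1) % 7
    current_week_start = update_date - timedelta(days=days_since_sunday)
    week_start = current_week_start - timedelta(days=7)
    week_end = current_week_start - timedelta(days=1)
    return week_start, week_end


start, end = previous_mmwr_week(date(2020, 8, 20))
print(start, end)  # 2020-08-09 2020-08-15
```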

  6. Mental Health and Learning Disabilities Statistics

    • digital.nhs.uk
    csv, pdf, xls, xlsx
    Updated Jan 22, 2015
    + more versions
    Cite
    (2015). Mental Health and Learning Disabilities Statistics [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/mental-health-and-learning-disabilities-statistics
    Explore at:
    Available download formats: csv (7.1 MB), csv (1.4 MB), csv (1.3 MB), pdf (182.7 kB), pdf (569.4 kB), xls (483.8 kB), xlsx (423.1 kB), csv (7.0 MB), xls (466.9 kB), xls (484.9 kB), pdf (633.9 kB)
    Dataset updated
    Jan 22, 2015
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Time period covered
    Sep 1, 2014 - Nov 30, 2014
    Area covered
    England
    Description

    This statistical release makes available the most recent Mental Health and Learning Disabilities Dataset (MHLDDS) final monthly data (October 2014) along with final data from September 2014. This publication presents a wide range of information about care delivered to users of NHS funded secondary mental health and learning disability services in England. The scope of the Mental Health Minimum Dataset (MHMDS) was extended to cover Learning Disability services from September 2014. Many people who have a learning disability use mental health services and people in learning disability services may have a mental health problem. This means that activity included in the new MHLDDS dataset cannot be distinctly divided into mental health or learning disability spells of care - a single spell of care may include inputs from either or both types of service. We will be working with stakeholders to define specific information and reporting requirements relating to specific services or groups of patients. Four new measures have been added to this release to help with interpretation of the data. At local level these contextual figures will provide an indication of the increased caseload that could be attributed to the extension of the dataset to cover LD services. Information on these measures can be found in the Announcement of Change paper which accompanies this release. The Currencies and Payment file that forms part of this release is specifically limited to services in scope for currencies and payment in mental health services and remains unchanged. This information will be of particular interest to organisations involved in delivering secondary mental health and learning disability care to adults and older people, as it presents timely information to support discussions between providers and commissioners of services. The MHLDS Monthly Report also includes reporting by local authority for the first time. 
For patients, researchers, agencies, and the wider public it aims to provide up to date information about the numbers of people using services, spending time in hospital and subject to the Mental Health Act (MHA). Some of these measures are currently experimental analysis. The Currency and Payment (CaP) measures can be found in a separate machine-readable data file and may also be accessed via an on-line interactive visualisation tool that supports benchmarking. This can be accessed through the related links at the bottom of the page.

  7. Human Resource Data Set (The Company)

    • kaggle.com
    zip
    Updated Nov 12, 2025
    Cite
    Koluit (2025). Human Resource Data Set (The Company) [Dataset]. https://www.kaggle.com/datasets/koluit/human-resource-data-set-the-company
    Explore at:
    Available download formats: zip (401322 bytes)
    Dataset updated
    Nov 12, 2025
    Authors
    Koluit
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    Similar to others who have created HR data sets, we felt that the lack of data out there for HR was limiting. It is very hard for someone to test new systems or learn People Analytics in the HR space. The only dataset most HR practitioners have is their real employee data and there are a lot of reasons why you would not want to use that when experimenting. We hope that by providing this dataset with an ever-growing variation of data points, others can learn and grow their HR data analytics and systems knowledge.

    Some example test cases where someone might use this dataset:

    • HR Technology Testing and Mock-Ups: Engagement survey tools, HCM tools, BI Tools
    • Learning To Code For People Analytics: Python/R/SQL
    • HR Tech and People Analytics Educational Courses/Tools

    Content

    The core data CompanyData.txt has the basic demographic data about a worker. We treat this as the core data that you can join future data sets to.

    Please read the Readme.md for additional information about this along with the Changelog for additional updates as they are made.

    Acknowledgements

    Initial names, addresses, and ages were generated using FakenameGenerator.com. All additional details including Job, compensation, and additional data sets were created by the Koluit team using random generation in Excel.

    Inspiration

    Our hope is that this data is used in the HR or Research space to experiment and learn using HR data. Some examples of how we hope this data will be used are listed above.

    Contact Us

    Have any suggestions for additions to the data? See any issues with our data? Want to use it for your project? Please reach out to us! https://koluit.com/ ryan@koluit.com

  8. Ghana Number Dataset

    • listtodata.com
    .csv, .xls, .txt
    Updated Jul 17, 2025
    Cite
    List to Data (2025). Ghana Number Dataset [Dataset]. https://listtodata.com/ghana-dataset
    Explore at:
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Jul 17, 2025
    Authors
    List to Data
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2025 - Dec 31, 2025
    Area covered
    Ghana
    Variables measured
    phone numbers, Email Address, full name, Address, City, State, gender, age, income, ip address
    Description

    The Ghana number dataset contains accurate numbers verified by our team. The contact data belong to active users only, which makes it a valuable marketing resource. Whether your business is new or old, you can boost your reach and connect to a large audience with this database. You will find many people who have an interest in your products. The Ghana number dataset will also help you make your brand better known, and by becoming a known brand in the market you can increase your brand value greatly. Many people will show interest in your products and services. The contacts on this mobile number list are active and real, and you will benefit greatly if you purchase this cheap but valuable database.

    Ghana phone data can be a great solution for SMS and telemarketing. Anyone can use the contact leads here to reach different people in this area. Ghana phone data allows you to give product details with your messages to make them more appealing and reliable. Your product quality and content will catch the attention of the interested audience. This will create more traffic, and you can generate sales from there. The Ghana phone data is an opt-in and permission-based contact list. With an affordable yet fresh list like ours, your marketing will be more effective, and people can relate to your business more after you successfully use this tool. Order the contact library now from List To Data to promote your goods and services everywhere inside the country.

    The Ghana phone number list is a massive database. Our team promises you sincere service and active support. You can contact us anytime on our website if you face any problems with our list. Our support team will solve the problem for you, so you don’t have to worry about not obtaining the worth of your money. The Ghana phone number list will aid your business in many new ways. The benefits of SMS marketing are enormous, as we all know, and no one wants to miss out on such a huge and versatile audience in Ghana. Hence, purchasing this contact number package will be a gem for any business any day.

  9. COVID-19 case rate per 100,000 population and percent test positivity in the...

    • data.ct.gov
    • catalog.data.gov
    csv, xlsx, xml
    Updated Jun 23, 2022
    Cite
    Department of Public Health (2022). COVID-19 case rate per 100,000 population and percent test positivity in the last 14 days by town - ARCHIVE [Dataset]. https://data.ct.gov/widgets/hree-nys2
    Explore at:
    Available download formats: csv, xml, xlsx
    Dataset updated
    Jun 23, 2022
    Dataset authored and provided by
    Department of Public Health
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

    The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

    The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

    The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

    The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

    This dataset includes a count and rate per 100,000 population for COVID-19 cases, a count of COVID-19 molecular diagnostic tests, and a percent positivity rate for tests among people living in community settings for the previous two-week period. Dates are based on date of specimen collection (cases and positivity).

    A person is considered a new case only upon their first COVID-19 testing result because a case is defined as an instance or bout of illness. If they are tested again subsequently and are still positive, it still counts toward the test positivity metric but they are not considered another case.

    Percent positivity is calculated as the number of positive tests among community residents conducted during the 14 days divided by the total number of positive and negative tests among community residents during the same period. If someone was tested more than once during that 14 day period, then those multiple test results (regardless of whether they were positive or negative) are included in the calculation.
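
    The positivity calculation described above amounts to positives divided by all tests in the 14-day window, with repeat tests counted individually. A minimal sketch follows; the function name and the sample counts are illustrative, and returning None when no tests were recorded is an assumption rather than something the source specifies.

```python
def percent_positivity(positive_tests, negative_tests):
    """Percent of tests that were positive over the reporting window.

    Every test result counts, including repeat tests of the same person.
    """
    total = positive_tests + negative_tests
    if total == 0:
        return None  # assumption: undefined when no tests were recorded
    return 100.0 * positive_tests / total


print(percent_positivity(30, 570))  # 5.0
```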

    These case and test counts do not include cases or tests among people residing in congregate settings, such as nursing homes, assisted living facilities, or correctional facilities.

    These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).

    DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020 but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/COVID-19-case-rate-per-100-000-population-and-perc/s22x-83rd

    As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.

    With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).

    Additional notes: As of 11/5/2020, CT DPH has added antigen testing for SARS-CoV-2 to reported test counts in this dataset. The tests included in this dataset include both molecular and antigen tests. Molecular tests reported include polymerase chain reaction (PCR) and nucleic acid amplification (NAAT) tests.

    The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used.

    Data suppression is applied when the rate is <5 cases per 100,000 or if there are <5 cases within the town. Information on why data suppression rules are applied can be found online here: https://www.cdc.gov/cancer/uscs/technical_notes/stat_methods/suppression.htm
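
    The suppression rule above can be sketched as a small check. This is an illustration under assumptions: the rate is taken to be cases divided by population and scaled to 100,000, whereas the published rate may be a daily average over the window; the function names are hypothetical.

```python
from typing import Optional


def rate_per_100k(cases: int, population: int) -> float:
    """Cases per 100,000 residents (assumed form of the rate)."""
    return cases * 100_000 / population


def suppressed_rate(cases: int, population: int) -> Optional[float]:
    """Return the rate, or None when the suppression rule applies.

    Suppress when there are fewer than 5 cases in the town or when the
    rate is below 5 per 100,000.
    """
    rate = rate_per_100k(cases, population)
    if cases < 5 or rate < 5:
        return None
    return rate


print(suppressed_rate(12, 20_000))   # enough cases and a high enough rate
print(suppressed_rate(4, 10_000))    # fewer than 5 cases
print(suppressed_rate(6, 200_000))   # rate below 5 per 100,000
```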

  10. COVID-19 Vaccination Coverage, Citywide

    • catalog.data.gov
    • data.cityofchicago.org
    • +2more
    Updated Sep 20, 2025
    + more versions
    Cite
    data.cityofchicago.org (2025). COVID-19 Vaccination Coverage, Citywide [Dataset]. https://catalog.data.gov/dataset/covid-19-vaccination-coverage-citywide
    Explore at:
    Dataset updated
    Sep 20, 2025
    Dataset provided by
    data.cityofchicago.org
    Description

    NOTE: This dataset replaces two previous ones. Please see below.

    Chicago residents who are up to date with COVID-19 vaccines, based on the reported address, race-ethnicity, sex, and age group of the person vaccinated, as provided by the medical provider in the Illinois Comprehensive Automated Immunization Registry Exchange (I-CARE).

    “Up to date” refers to individuals who meet the CDC’s updated COVID-19 vaccination criteria based on their age and prior vaccination history. For surveillance purposes, up to date is defined based on the following criteria:

    People ages 5 years and older:
    · Are up to date when they receive 1+ doses of a COVID-19 vaccine during the current season.

    Children ages 6 months to 4 years:
    · Children who have received at least two prior COVID-19 vaccine doses are up to date when they receive one additional dose of COVID-19 vaccine during the current season, regardless of vaccine product.
    · Children who have received only one prior COVID-19 vaccine dose are up to date when they receive one additional dose of the current season's Moderna COVID-19 vaccine or two additional doses of the current season's Pfizer-BioNTech COVID-19 vaccine.
    · Children who have never received a COVID-19 vaccination are up to date when they receive either two doses of the current season's Moderna vaccine or three doses of the current season's Pfizer-BioNTech vaccine.

    This dataset takes the place of two previous datasets, which cover doses administered from December 15, 2020 through September 13, 2023 and are marked as historical:
    - https://data.cityofchicago.org/Health-Human-Services/COVID-19-Daily-Vaccinations-Chicago-Residents/2vhs-cf6b
    - https://data.cityofchicago.org/Health-Human-Services/COVID-19-Vaccinations-by-Age-and-Race-Ethnicity/37ac-bbe3

    Data Notes:

    Weekly cumulative totals of people up to date are shown for each combination of race-ethnicity, sex, and age group. Note that race-ethnicity, age, and sex all have an option for “All” so care should be taken when summing rows.

    Coverage percentages are calculated based on the cumulative number of people in each race-ethnicity/age/sex population subgroup who are considered up to date as of the week ending date divided by the estimated number of people in that subgroup. Population counts are obtained from the 2020 U.S. Decennial Census. Actual counts may exceed population estimates and lead to coverage estimates that are greater than 100%, especially in smaller demographic groupings with smaller populations. Additionally, the medical provider may report incorrect demographic information for the person receiving the vaccination, which may lead to over- or underestimation of vaccination coverage. All coverage percentages are capped at 99%.

    Weekly cumulative counts and coverage percentages are reported from the week ending Saturday, September 16, 2023 onward through the Saturday prior to the dataset being updated.

    All data are provisional and subject to change. Information is updated as additional details are received and it is, in fact, very common for recent dates to be incomplete and to be updated as time goes on. At any given time, this dataset reflects data currently known to CDPH.

    Numbers in this dataset may differ from other public sources due to when data are reported and how City of Chicago boundaries are defined.

    The Chicago Department of Public Health uses the most complete data available to estimate COVID-19 vaccination coverage among Chicagoans, but there are several limitations that impact our estimates. Individuals may receive vaccinations that are not recorded in the Illinois immunization registry, I-CARE, such as those administered in another state, causing underestimation of the number of individuals who are up to date. Inconsistencies in records of separate doses administered to the same person, such as slight variations in dates of birth, can result in duplicate records for a person and underestimate the number of people who are up to date.
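
    Two points from the data notes lend themselves to a short sketch: the capped coverage percentage, and the risk of double counting when rows with an “All” value are summed together with detail rows. The column names and counts below are hypothetical and are not taken from the published file.

```python
from typing import Dict, List


def coverage_percent(up_to_date: int, population: int, cap: float = 99.0) -> float:
    """Cumulative people up to date divided by subgroup population, capped at 99%."""
    return min(100.0 * up_to_date / population, cap)


def detail_rows(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep rows where no dimension is the "All" rollup, so sums do not double count."""
    dims = ("race_ethnicity", "age_group", "sex")  # hypothetical column names
    return [r for r in rows if all(r[d] != "All" for d in dims)]


# Made-up rows: one rollup row plus the two detail rows it summarizes.
rows = [
    {"race_ethnicity": "All", "age_group": "All", "sex": "All", "up_to_date": 30},
    {"race_ethnicity": "Group A", "age_group": "18-29", "sex": "Female", "up_to_date": 10},
    {"race_ethnicity": "Group A", "age_group": "18-29", "sex": "Male", "up_to_date": 20},
]

total = sum(r["up_to_date"] for r in detail_rows(rows))
print(total)                        # matches the rollup row, not double it
print(coverage_percent(210, 200))   # a raw value above 100% is reported at the cap
```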

  11. CT School Learning Model Indicators by County (14-day metrics) - ARCHIVE

    • data.ct.gov
    • s.cnmilf.com
    • +1more
    csv, xlsx, xml
    Updated Aug 5, 2021
    Cite
    CT DPH (2021). CT School Learning Model Indicators by County (14-day metrics) - ARCHIVE [Dataset]. https://data.ct.gov/Health-and-Human-Services/CT-School-Learning-Model-Indicators-by-County-14-d/e4bh-ax24
    Explore at:
    Available download formats: csv, xml, xlsx
    Dataset updated
    Aug 5, 2021
    Dataset authored and provided by
    CT DPH
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Area covered
    Connecticut
    Description

    NOTE: This dataset pertains only to the 2020-2021 school year and is no longer being updated. For additional data on COVID-19, visit data.ct.gov/coronavirus.

    This dataset includes the leading and secondary metrics identified by the Connecticut Department of Public Health (DPH) and the Department of Education (CSDE) to support local district decision-making on the level of in-person, hybrid (blended), and remote learning model for Pre K-12 education.

    Data represent daily averages for two-week periods by date of specimen collection (cases and positivity), date of hospital admission, or date of ED visit. Hospitalization data come from the Connecticut Hospital Association and are based on hospital location, not county of patient residence. COVID-19-like illness includes fever and cough or shortness of breath or difficulty breathing or the presence of coronavirus diagnosis code and excludes patients with influenza-like illness. All data are preliminary.

    These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).

    These metrics were adapted from recommendations by the Harvard Global Institute and supplemented by existing DPH measures.

    For national data on COVID-19, see COVID View, the national weekly surveillance summary of U.S. COVID-19 activity, at https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html

    DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020 but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/CT-School-Learning-Model-Indicators-by-County/rpph-4ysy

    As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.

    With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).

  12. Immigration system statistics data tables

    • gov.uk
    Updated Nov 27, 2025
    Cite
    Home Office (2025). Immigration system statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/immigration-system-statistics-data-tables
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Home Office
    Description

    List of the data tables as part of the Immigration system statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.

    If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.

    Accessible file formats

    The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
    If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk
    Please tell us what format you need. It will help us if you say what assistive technology you use.

    Related content

    Immigration system statistics, year ending September 2025
    Immigration system statistics quarterly release
    Immigration system statistics user guide
    Publishing detailed data tables in migration statistics
    Policy and legislative changes affecting migration to the UK: timeline
    Immigration statistics data archives

    Passenger arrivals

    Passenger arrivals summary tables, year ending September 2025 (ODS, 31.5 KB): https://assets.publishing.service.gov.uk/media/691afc82e39a085bda43edd8/passenger-arrivals-summary-sep-2025-tables.ods

    ‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.

    Electronic travel authorisation

    Electronic travel authorisation detailed datasets, year ending September 2025 (MS Excel Spreadsheet, 58.6 KB): https://assets.publishing.service.gov.uk/media/691b03595a253e2c40d705b9/electronic-travel-authorisation-datasets-sep-2025.xlsx
    ETA_D01: Applications for electronic travel authorisations, by nationality
    ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality

    Entry clearance visas granted outside the UK

    Entry clearance visas summary tables, year ending September 2025 (ODS, 53.3 KB): https://assets.publishing.service.gov.uk/media/6924812a367485ea116a56bd/visas-summary-sep-2025-tables.ods

    Entry clearance visa applications and outcomes detailed datasets, year ending September 2025 (MS Excel Spreadsheet, 30.2 MB): https://assets.publishing.service.gov.uk/media/691aebbf5a253e2c40d70598/entry-clearance-visa-outcomes-datasets-sep-2025.xlsx
    Vis_D01: Entry clearance visa applications, by nationality and visa type
    Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome

    Additional data relating to in country and overse

  13. Asthma Prevalence

    • data.ca.gov
    • data.chhs.ca.gov
    • +4more
    csv, pdf, zip
    Updated Nov 6, 2025
    + more versions
    Cite
    California Department of Public Health (2025). Asthma Prevalence [Dataset]. https://data.ca.gov/dataset/asthma-prevalence
    Explore at:
    Available download formats: csv, zip, pdf
    Dataset updated
    Nov 6, 2025
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the estimated percentage of Californians with asthma (asthma prevalence). Two types of asthma prevalence are included: 1) lifetime asthma prevalence describes the percentage of people who have ever been diagnosed with asthma by a health care provider, 2) current asthma prevalence describes the percentage of people who have ever been diagnosed with asthma by a health care provider AND report they still have asthma and/or had an asthma episode or attack within the past 12 months. The tables “Lifetime Asthma Prevalence by County” and “Current Asthma Prevalence by County” are derived from the California Health Interview Survey (CHIS) and include data stratified by county and age group (all ages, 0-17, 18+, 0-4, 5-17, 18-64, 65+) reported for 2-year periods. The table “Asthma Prevalence, Adults (18 and older)” is derived from the California Behavioral Risk Factor Surveillance System (BRFSS) and includes statewide data on adults reported by year.
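
    The two prevalence definitions above differ only in the extra condition for current asthma. The sketch below makes that condition explicit using unweighted, made-up respondents; the published estimates come from CHIS and BRFSS and are presumably survey weighted, which this illustration does not attempt.

```python
from typing import List, Tuple

# Each respondent: (ever_diagnosed, still_has_asthma, episode_in_past_12_months)
Respondent = Tuple[bool, bool, bool]


def has_lifetime_asthma(r: Respondent) -> bool:
    """Ever diagnosed with asthma by a health care provider."""
    return r[0]


def has_current_asthma(r: Respondent) -> bool:
    """Ever diagnosed AND (still has asthma and/or had an episode in the past 12 months)."""
    ever, still_has, recent_episode = r
    return ever and (still_has or recent_episode)


def prevalence_percent(flags: List[bool]) -> float:
    """Unweighted share of respondents flagged True, as a percentage."""
    return 100.0 * sum(flags) / len(flags)


# Hypothetical respondents, for illustration only.
respondents: List[Respondent] = [
    (True, True, False),
    (True, False, False),
    (False, False, False),
    (True, False, True),
]

lifetime = prevalence_percent([has_lifetime_asthma(r) for r in respondents])
current = prevalence_percent([has_current_asthma(r) for r in respondents])
print(lifetime, current)
```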

  14. Johns Hopkins COVID-19 Case Tracker

    • data.world
    • kaggle.com
    csv, zip
    Updated Dec 3, 2025
    Cite
    The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Dec 3, 2025
    Authors
    The Associated Press
    Time period covered
    Jan 22, 2020 - Mar 9, 2023
    Area covered
    Description

    Updates

    • Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

    • April 9, 2020

      • The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.
    • April 20, 2020

      • Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.
    • April 29, 2020

      • The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.
    • September 1st, 2020

      • Johns Hopkins is now providing counts for the five New York City counties individually.
    • February 12, 2021

      • The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."
      • Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.
    • February 16, 2021

      • Johns Hopkins has reconciled Ohio's historical deaths data with the state.

    Overview

    The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

    The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

    This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

    The AP is updating this dataset hourly at 45 minutes past the hour.

    To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

    Queries

    Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic

    Interactive

    The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

    https://datawrapper.dwcdn.net/nRyaf/15/

    Interactive Embed Code

    <iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
    

    Caveats

    • This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.
    • In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.
    • In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county"
    • This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.
    • Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
    • Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.
    • The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

    Johns Hopkins timeseries data

    • Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
    • Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
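
    Because the timeseries holds cumulative snapshots, daily new counts have to be derived by differencing, and a downward revision by the source shows up as a negative day. The sketch below illustrates this together with a per-100,000 rate; the function names and numbers are hypothetical and this is not the AP's own pipeline.

```python
from typing import List


def daily_new(cumulative: List[int]) -> List[int]:
    """Day-over-day differences of a cumulative series (one fewer element than the input)."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def per_100k(count: float, population: int) -> float:
    """Scale a count to a rate per 100,000 people."""
    return count * 100_000 / population


# Made-up cumulative case counts; the drop from 125 to 120 mimics a historical correction.
cumulative_cases = [100, 110, 125, 120, 140]
new_cases = daily_new(cumulative_cases)
revised_days = [i for i, n in enumerate(new_cases) if n < 0]

print(new_cases)
print(revised_days)                    # positions where the cumulative count went down
print(per_100k(new_cases[-1], 50_000))
```

    How to treat negative days (keep them, clip them to zero, or redistribute them) is a reporting decision that this sketch leaves open.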

    Attribution

    This data should be credited to Johns Hopkins University COVID-19 tracking project

  15. SAIVT-Campus Dataset

    • researchdatafinder.qut.edu.au
    Updated Jun 30, 2016
    + more versions
    Cite
    Dr Simon Denman (2016). SAIVT-Campus Dataset [Dataset]. https://researchdatafinder.qut.edu.au/individual/n2531
    Explore at:
    Dataset updated
    Jun 30, 2016
    Dataset provided by
    Queensland University of Technology (QUT)
    Authors
    Dr Simon Denman
    Description

    SAIVT-Campus Dataset

    Overview

    The SAIVT-Campus Database is an abnormal event detection database captured on a university campus, where the abnormal events are caused by the onset of a storm. Contact Dr Simon Denman or Dr Jingxin Xu for more information.

    Licensing

    The SAIVT-Campus database is © 2012 QUT and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Australia License.

    Attribution

    To attribute this database, please include the following citation: Xu, Jingxin, Denman, Simon, Fookes, Clinton B., & Sridharan, Sridha (2012) Activity analysis in complicated scenes using DFT coefficients of particle trajectories. In 9th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2012), 18-21 September 2012, Beijing, China. available at eprints.

    Acknowledging the Database in your Publications

    In addition to citing our paper, we kindly request that the following text be included in an acknowledgements section at the end of your publications: We would like to thank the SAIVT Research Labs at Queensland University of Technology (QUT) for freely supplying us with the SAIVT-Campus database for our research.

    Installing the SAIVT-Campus database

    After downloading and unpacking the archive, you should have the following structure:

    SAIVT-Campus
    +-- LICENCE.txt
    +-- README.txt
    +-- test_dataset.avi
    +-- training_dataset.avi
    +-- Xu2012 - Activity analysis in complicated scenes using DFT coefficients of particle trajectories.pdf

    Notes

    The SAIVT-Campus dataset is captured at the Queensland University of Technology, Australia.

    It contains two video files from real-world surveillance footage without any actors:

    training_dataset.avi (the training dataset)
    test_dataset.avi (the test dataset).
    

    This dataset contains a mixture of crowd densities and it has been used in the following paper for abnormal event detection:

    Xu, Jingxin, Denman, Simon, Fookes, Clinton B., & Sridharan, Sridha (2012) Activity analysis in complicated scenes using DFT coefficients of particle trajectories. In 9th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2012), 18-21 September 2012, Beijing, China. Available at eprints.

    This paper is also included with the database (Xu2012 - Activity analysis in complicated scenes using DFT coefficients of particle trajectories.pdf).

    Both video files are one hour in duration.
    

    The normal activities include pedestrians entering or exiting the building, entering or exiting a lecture theatre (yellow door), and going to the counter at the bottom right. The abnormal events are caused by a heavy rain outside, and include people running in from the rain, people walking towards the door to exit and turning back, wearing raincoats, loitering and standing near the door and overcrowded scenes. The rain happens only in the later part of the test dataset.

    As a result, we assume that the training dataset only contains the normal activities. We have manually made an annotation as below:

    • the training dataset does not have abnormal scenes
    • the test dataset separates into two parts:
      • only normal activities occur from 00:00:00 to 00:47:16
      • abnormalities are present from 00:47:17 to 01:00:00

    We annotate the time 00:47:17 as the start time for the abnormal events, as from this time on we have begun to observe people stop walking or turn back from walking towards the door to exit, which indicates that the rain outside the building has influenced the activities inside the building.

    Should you have any questions, please do not hesitate to contact Dr Jingxin Xu.
    
  16. COVID-19 Vaccine Progress Dashboard Data by ZIP Code

    • data.chhs.ca.gov
    • healthdata.gov
    • +1more
    csv, xlsx, zip
    Updated Nov 24, 2025
    + more versions
    Cite
    California Department of Public Health (2025). COVID-19 Vaccine Progress Dashboard Data by ZIP Code [Dataset]. https://data.chhs.ca.gov/dataset/covid-19-vaccine-progress-dashboard-data-by-zip-code
    Explore at:
    Available download formats: csv(21567128), csv(5478164), xlsx(7800), csv(27663424), csv(9320174), xlsx(10933), zip
    Dataset updated
    Nov 24, 2025
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    Description

    Note: In these datasets, a person is defined as up to date if they have received at least one dose of an updated COVID-19 vaccine. The Centers for Disease Control and Prevention (CDC) recommends that certain groups, including adults ages 65 years and older, receive additional doses.

    Starting on July 13, 2022, the denominator for calculating vaccine coverage has been changed from age 5+ to all ages to reflect new vaccine eligibility criteria. Previously the denominator was changed from age 16+ to age 12+ on May 18, 2021, then changed from age 12+ to age 5+ on November 10, 2021, to reflect previous changes in vaccine eligibility criteria. The previous datasets based on age 12+ and age 5+ denominators have been uploaded as archived tables.
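
    The denominator history above can be encoded as a date lookup, which helps when comparing coverage figures across time. This is a sketch: the function name is hypothetical, and treating each changeover date as the first day of the new denominator is an assumption about how "starting on" is meant.

```python
from datetime import date

# Changeover dates from the description, newest first.
_DENOMINATOR_CHANGES = [
    (date(2022, 7, 13), "all ages"),
    (date(2021, 11, 10), "5+"),
    (date(2021, 5, 18), "12+"),
]


def denominator_age_group(on: date) -> str:
    """Return the age group used as the coverage denominator on a given date."""
    for start, group in _DENOMINATOR_CHANGES:
        if on >= start:
            return group
    return "16+"  # denominator in use before the first change


print(denominator_age_group(date(2021, 5, 17)))
print(denominator_age_group(date(2021, 12, 1)))
print(denominator_age_group(date(2023, 1, 1)))
```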

    Starting June 30, 2021, the dataset has been reconfigured so that all updates are appended to one dataset to make it easier for API and other interfaces. In addition, historical data has been extended back to January 5, 2021.

    This dataset shows full, partial, and at least 1 dose coverage rates by zip code tabulation area (ZCTA) for the state of California. Data sources include the California Immunization Registry and the American Community Survey’s 2015-2019 5-Year data.

    This is the data table for the LHJ Vaccine Equity Performance dashboard. However, this data table also includes ZCTAs that do not have a VEM score.

    This dataset also includes Vaccine Equity Metric score quartiles (when applicable), which combine the Public Health Alliance of Southern California’s Healthy Places Index (HPI) measure with CDPH-derived scores to estimate factors that impact health, like income, education, and access to health care. ZCTAs range from less healthy community conditions in Quartile 1 to more healthy community conditions in Quartile 4.

    The Vaccine Equity Metric is for weekly vaccination allocation and reporting purposes only. CDPH-derived quartiles should not be considered as indicative of the HPI score for these zip codes. CDPH-derived quartiles were assigned to zip codes excluded from the HPI score produced by the Public Health Alliance of Southern California due to concerns with statistical reliability and validity in populations smaller than 1,500 or where more than 50% of the population resides in a group setting.

    These data do not include doses administered by the following federal agencies who received vaccine allocated directly from CDC: Indian Health Service, Veterans Health Administration, Department of Defense, and the Federal Bureau of Prisons.

    For some ZCTAs, vaccination coverage may exceed 100%. This may be a result of many people from outside the county coming to that ZCTA to get their vaccine and providers reporting the county of administration as the county of residence, and/or the DOF estimates of the population in that ZCTA are too low. Please note that population numbers provided by DOF are projections and so may not be accurate, especially given unprecedented shifts in population as a result of the pandemic.

  17. Coronavirus (Covid-19) Data in the United States

    • github.com
    • openicpsr.org
    • +4more
    csv
    Cite
    New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://github.com/nytimes/covid-19-data
    Explore at:
    Available download formats: csv
    Dataset provided by
    New York Times
    License

    https://github.com/nytimes/covid-19-data/blob/master/LICENSE

    Description

    The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

    Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

    We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

    The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.

  18. Clothing Dataset for Second-Hand Fashion

    • zenodo.org
    • data.europa.eu
    zip
    Updated Jun 24, 2024
    + more versions
    Cite
    Farrukh Nauman (2024). Clothing Dataset for Second-Hand Fashion [Dataset]. http://doi.org/10.5281/zenodo.12518734
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 24, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Farrukh Nauman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Second-Hand Fashion Dataset

    Overview

    The dataset originates from projects focused on the sorting of used clothes within a sorting facility. The primary objective is to classify each garment into one of several categories to determine its ultimate destination: reuse, reuse outside Sweden (export), recycling, repair, remake, or thermal waste.

    The dataset has 31,997 clothing items, a massive update from the 3,000 items in version 1. The dataset collection started under the Vinnova-funded project "AI for resource-efficient circular fashion" in spring 2022 and involves collaboration among three institutions: RISE Research Institutes of Sweden AB, Wargön Innovation AB, and Myrorna AB. The dataset has received further support through the EU project CISUTAC (cisutac.eu).

    Project page

    - Webpage: https://fnauman.github.io/second-hand-fashion/
    - Contact: farrukh.nauman@ri.se

    Dataset Details

    - The dataset contains 31,997 clothing items, each with a unique item ID in a datetime format. The items are divided into three stations: `station1`, `station2`, and `station3`. The `station1` and `station2` folders contain images and annotations from Wargön Innovation AB, while the `station3` folder contains data from Myrorna AB. Each clothing item has three images and a JSON file containing annotations.

    - Three images are provided for each clothing item:
    1. Front view.
    2. Back view.
    3. Brand label close-up. About 4000-5000 brand images are missing because of privacy concerns: people's hands, faces, etc. Some clothing items did not have a brand label to begin with.

    - Image resolutions are primarily in two sizes: `1280x720` and `1920x1080`. Images taken before January 2023 show a table with a measuring tape as the background; later images show a square grid pattern with each square measuring `10x10` cm.

    - Each JSON file contains a list of annotations, some of which require nuanced interpretation (see `labels.py` for the options):
      - `usage`: Arguably the most critical label, usage indicates the garment's intended pathway. Options include 'Reuse,' 'Repair,' 'Remake,' 'Recycle,' 'Export' (reuse outside Sweden), and 'Energy recovery' (thermal waste). About 99% of the garments fall into the 'Reuse,' 'Export,' or 'Recycle' categories.
      - `price`: The price field should be viewed as suggestive rather than definitive. Pricing models in the second-hand industry vary widely, including pricing by weight, brand, demand, or fixed value. Wargön Innovation AB does not determine actual pricing.
      - `trend`: This field refers to the general style of the garment, not a time-dependent trend as in some other datasets (e.g., Visuelle 2.0). It might be more accurately labeled as 'style.'
      - `material`: Material annotations are mostly based on the readings from a Near Infrared (NIR) scanner and in some cases from the garment's brand label.
      - Damage-related attributes include:
        - `condition` (1-5 scale, 5 being the best)
        - `pilling` (1-5 scale, 5 meaning no pilling)
        - `stains`, `holes`, `smell` (each with options 'None,' 'Minor,' 'Major').

        Note: 'holes' and 'smell' were introduced after November 17th, 2022, and stains previously only had 'Yes'/'No' options. For `station1` and `station2`, we introduced additional damage location labels to assist in damage detection:

              "damageimage": "back",
              "damageloc": "bottom left",
              "damage": "stain ",
              "damage2image": "front",
              "damage2loc": "None",
              "damage2": "",
              "damage3image": "back",
              "damage3loc": "bottom right",
              "damage3": "stain"

        Taken from `labels_2024_04_05_08_47_35.json` file. Additionally, we annotated a few hundred images with bounding box annotations that we aim to release at a later date.
      - `comments`: The comments field is mostly empty, but sometimes contains important information about the garment, such as a detailed text description of the damage.

    - Whenever possible, ISO standards have been followed to define these attributes on a 1-5 scale (e.g., `pilling`).

    - Gold dataset: the value `Test` in the comments field marks garments that were annotated multiple times by different annotators so that annotator agreement can be compared. These 100 garments were annotated twice at Wargön Innovation AB (search within `station1/[dec2022,feb2023]`) and once at Myrorna AB (see the `station3/test100` folder for JSON files containing their annotations).

    - The data has been annotated by a group of expert second-hand sorters at Wargön Innovation AB and Myrorna AB.

    - Some attributes, such as `price`, should be considered with caution. Many distinct pricing models exist in the second-hand industry:
      - Price by weight
      - Price by brand and demand (similar to first-hand fashion)
      - Generic pricing at a fixed value (e.g., 1 Euro or 10 SEK)

    Wargön Innovation AB does not set the prices in practice and their prices are suggestive only (`station1` and `station2`). Myrorna AB (`station3`), in contrast, does resale and sets the prices.
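    The damage location labels quoted above can be read into a cleaner structure with a few lines of Python. This is only a minimal sketch: it assumes at most three damage slots with the key prefixes `damage`, `damage2`, and `damage3` (the only ones visible in the quoted fragment), and the helper name `extract_damages` is hypothetical, not part of the dataset's tooling.

```python
import json

# Fragment copied from the description above, wrapped in braces so it parses as JSON.
SAMPLE_ANNOTATION = json.loads("""
{
  "damageimage": "back",
  "damageloc": "bottom left",
  "damage": "stain ",
  "damage2image": "front",
  "damage2loc": "None",
  "damage2": "",
  "damage3image": "back",
  "damage3loc": "bottom right",
  "damage3": "stain"
}
""")


def extract_damages(annotation):
    """Return one dict per non-empty damage slot (slot prefixes are an assumption)."""
    damages = []
    for prefix in ("damage", "damage2", "damage3"):
        # Values may carry stray whitespace (e.g. "stain "), so normalise before testing.
        kind = (annotation.get(prefix) or "").strip()
        if not kind:
            continue  # an empty string means no damage was recorded in this slot
        damages.append({
            "image": annotation.get(prefix + "image"),
            "location": annotation.get(prefix + "loc"),
            "kind": kind,
        })
    return damages


damages = extract_damages(SAMPLE_ANNOTATION)
```

    For the sample fragment this keeps the two stain entries and skips the empty second slot.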

    Comments

    - We received feedback on our version 1 that some images were too blurry or had poor lighting. The image quality has slightly improved, but largely remains similar to release 1.
    - We further learned that a handful of data items were duplicates. Several duplicate images were removed, but about 400 still remain.
    - Some users did not like the `tar.gz` format that we used for version 1 of the dataset. We have now switched to `.zip` for convenience.
    - Most JSON files parse fine using any standard JSON reader, but a handful that are problematic have been set aside in the `json_errors` folder.
    - Extra care was taken not to leak personal information. This is why you will not see any entries for the `annotator` attribute in the JSON files in `station1/sep2023`: annotators had used their real names there. Since then, we have used internally assigned IDs.
    - Many brand images contained people's hands, faces, or other personal information. We have removed about 4000-5000 brand images for privacy reasons.
    - Please inform us immediately if you find any personal information in the dataset:
      - Farrukh Nauman (RISE AB): `farrukh.nauman@ri.se`,
      - Susanne Eriksson (Wargön Innovation AB): `susanne.eriksson@wargoninnovation.se`,
      - Gabriella Engstrom (Wargön Innovation AB): `gabriella.engstrom@wargoninnovation.se`.

    We went through 100k images three times to ensure no personal information is leaked, but we are human and can make mistakes.

    Partners

    The data collection for this dataset has been carried out in collaboration with the following partners:

    1. RISE Research Institutes of Sweden AB: RISE is a leading research institute dedicated to advancing innovation and sustainability across various sectors, including fashion and textiles.

    2. Wargön Innovation AB: Wargön Innovation is an expert in sustainable and circular fashion solutions, contributing valuable insights and expertise to the dataset creation.

    3. Myrorna AB: Myrorna is Sweden's oldest chain of stores for collecting clothes and furnishings that can be reused.

    License

    CC-BY 4.0. Please refer to the LICENSE file for more details.

    Acknowledgments

    This dataset was made possible through the collaborative efforts of RISE Research Institutes of Sweden AB, Wargön Innovation AB, and Myrorna AB, with funding from Vinnova and support from the EU project CISUTAC. We extend our gratitude to all the expert second-hand sorters and annotators who contributed their expertise to this project.

  19. MultiSocial

    • zenodo.org
    • data.niaid.nih.gov
    Updated Aug 20, 2025
    Cite
    Dominik Macko; Jakub Kopal; Robert Moro; Ivan Srba (2025). MultiSocial [Dataset]. http://doi.org/10.5281/zenodo.13846152
    Dataset updated
    Aug 20, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Dominik Macko; Jakub Kopal; Robert Moro; Ivan Srba
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MultiSocial is a dataset (described in a paper) for benchmarking multilingual (22 languages) machine-generated text detection in the social-media domain (5 platforms). It contains 472,097 texts, of which about 58k are human-written and approximately the same amount is generated by each of 7 multilingual large language models using 3 iterations of paraphrasing. The dataset has been anonymized to minimize the amount of sensitive data by hiding email addresses, usernames, and phone numbers.

    If you use this dataset in any publication, project, tool or in any other form, please cite the paper.

    Disclaimer

    Due to the data sources (described below), the dataset may contain harmful, disinformation, or offensive content. Based on a multilingual toxicity detector, about 8% of the text samples are probably toxic (from 5% in WhatsApp to 10% in Twitter). Although we have used older data sources (which have a lower probability of including machine-generated texts), the labeling of human-written text might not be 100% accurate. The anonymization procedure might not successfully hide all the sensitive/personal content; thus, use the data cautiously (if you feel affected by such content, report the issues you find to dpo[at]kinit.sk). The intended use is for non-commercial research purposes only.

    Data Source

    The human-written part consists of a pseudo-randomly selected subset of social media posts from 6 publicly available datasets:

    1. Telegram data originated in Pushshift Telegram, containing 317M messages (Baumgartner et al., 2020). It contains messages from 27k+ channels. The collection started with a set of right-wing extremist and cryptocurrency channels (about 300 in total) and was expanded based on occurrence of forwarded messages from other channels. In the end, it thus contains a wide variety of topics and societal movements reflecting the data collection time.

    2. Twitter data originated in CLEF2022-CheckThat! Task 1, containing 34k tweets on COVID-19 and politics (Nakov et al., 2022), combined with Sentiment140, containing 1.6M tweets on various topics (Go et al., 2009).

    3. Gab data originated in the dataset containing 22M posts from Gab social network. The authors of the dataset (Zannettou et al., 2018) found out that “Gab is predominantly used for the dissemination and discussion of news and world events, and that it attracts alt-right users, conspiracy theorists, and other trolls.” They also found out that hate speech is much more prevalent there compared to Twitter, but lower than 4chan's Politically Incorrect board.

    4. Discord data originated in Discord-Data, containing 51M messages. This is a long-context, anonymized, clean, multi-turn and single-turn conversational dataset based on Discord data scraped from a large variety of servers, big and small. According to the dataset authors, it contains around 0.1% of potentially toxic comments (based on the applied heuristic/classifier).

    5. WhatsApp data originated in whatsapp-public-groups, containing 300k messages (Garimella & Tyson, 2018). The public dataset contains the anonymised data, collected for around 5 months from around 178 groups. Original messages were made available to us on request to dataset authors for research purposes.

    From these datasets, we have pseudo-randomly sampled up to 1300 texts (up to 300 for the test split and the remaining up to 1000 for the train split, if available) for each combination of the selected 22 languages (using a combination of automated approaches to detect the language) and platform. This process resulted in 61,592 human-written texts, which were further filtered based on the occurrence of some characters or their length, resulting in about 58k human-written texts.
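    The per-language, per-platform sampling described above might look roughly like the following. This is a sketch under stated assumptions, not the authors' code: the function name `sample_and_split`, the fixed seed, and the choice to fill the test split before the train split are illustrative readings of the description, and the demo posts are invented.

```python
import random
from collections import defaultdict


def sample_and_split(posts, test_cap=300, train_cap=1000, seed=0):
    """Group posts by (language, source), then draw up to test_cap + train_cap per group."""
    rng = random.Random(seed)  # a fixed seed keeps the pseudo-random selection repeatable
    groups = defaultdict(list)
    for post in posts:
        groups[(post["language"], post["source"])].append(post)

    selected = []
    for key in sorted(groups):
        pool = groups[key]
        picked = rng.sample(pool, min(len(pool), test_cap + train_cap))
        for index, post in enumerate(picked):
            # Assumption: the first test_cap picks go to test, the remainder to train.
            split = "test" if index < test_cap else "train"
            selected.append({**post, "split": split})
    return selected


# Tiny invented example with small caps so the behaviour is easy to inspect.
demo_posts = [{"text": f"post {i}", "language": "en", "source": "twitter"} for i in range(5)]
demo_posts += [{"text": f"prispevok {i}", "language": "sk", "source": "gab"} for i in range(2)]
demo = sample_and_split(demo_posts, test_cap=2, train_cap=2)
```

    With caps of 2 and 2, the English/Twitter group contributes two test and two train posts, while the smaller Slovak/Gab group only fills the test split.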

    The machine-generated part contains texts generated by 7 LLMs (Aya-101, Gemini-1.0-pro, GPT-3.5-Turbo-0125, Mistral-7B-Instruct-v0.2, opt-iml-max-30b, v5-Eagle-7B-HF, vicuna-13b). All these models were self-hosted except for GPT and Gemini, where we used the publicly available APIs. We generated the texts using 3 paraphrases of the original human-written data and then preprocessed the generated texts (filtered out cases when the generation obviously failed).

    The dataset has the following fields:

    • 'text' - a text sample,

    • 'label' - 0 for human-written text, 1 for machine-generated text,

    • 'multi_label' - a string representing a large language model that generated the text or the string "human" representing a human-written text,

    • 'split' - a string identifying train or test split of the dataset for the purpose of training and evaluation respectively,

    • 'language' - the ISO 639-1 language code identifying the detected language of the given text,

    • 'length' - word count of the given text,

    • 'source' - a string identifying the source dataset / platform of the given text,

    • 'potential_noise' - 0 for text without identified noise, 1 for text with potential noise.
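    As an illustration of how these fields combine, the sketch below selects clean training examples for one language. The field names come from the list above; the sample records, the `multi_label` value shown for a model, and the helper name `select_texts` are hypothetical.

```python
def select_texts(records, split, language=None, drop_noisy=True):
    """Return (text, label) pairs for one split, optionally restricted to a language."""
    selected = []
    for record in records:
        if record["split"] != split:
            continue
        if language is not None and record["language"] != language:
            continue
        if drop_noisy and record["potential_noise"] == 1:
            continue  # skip texts flagged as potentially noisy
        selected.append((record["text"], record["label"]))
    return selected


# Invented records that only mimic the documented schema.
sample_records = [
    {"text": "hello there", "label": 0, "multi_label": "human", "split": "train",
     "language": "en", "length": 2, "source": "twitter", "potential_noise": 0},
    {"text": "greetings to you", "label": 1, "multi_label": "gpt-3.5-turbo-0125", "split": "train",
     "language": "en", "length": 3, "source": "twitter", "potential_noise": 1},
    {"text": "dobry den", "label": 0, "multi_label": "human", "split": "test",
     "language": "sk", "length": 2, "source": "telegram", "potential_noise": 0},
]

clean_english_train = select_texts(sample_records, split="train", language="en")
```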

    ToDo Statistics (under construction)

  20. COVID-19 Vaccinations by Age and Race-Ethnicity - Historical

    • data.cityofchicago.org
    • catalog.data.gov
    csv, xlsx, xml
    Updated Dec 13, 2023
    Cite
    City of Chicago (2023). COVID-19 Vaccinations by Age and Race-Ethnicity - Historical [Dataset]. https://data.cityofchicago.org/Health-Human-Services/COVID-19-Vaccinations-by-Age-and-Race-Ethnicity-Hi/37ac-bbe3
    Dataset updated
    Dec 13, 2023
    Dataset authored and provided by
    City of Chicago
    Description

    NOTE: This dataset has been retired and marked as historical-only. The recommended dataset to use in its place is https://data.cityofchicago.org/Health-Human-Services/COVID-19-Vaccination-Coverage-Citywide/6859-spec.

    COVID-19 vaccinations administered to Chicago residents based on the reported race-ethnicity and age group of the person vaccinated, as provided by the medical provider in the Illinois Comprehensive Automated Immunization Registry Exchange (I-CARE).

    Vaccination Status Definitions:

    • People with at least one vaccine dose: Number of people who have received at least one dose of any COVID-19 vaccine, including the single-dose Johnson & Johnson COVID-19 vaccine.

    • People with a completed vaccine series: Number of people who have completed a primary COVID-19 vaccine series. Requirements vary depending on age and type of primary vaccine series received.

    • People with an original booster dose: Number of people who have a completed vaccine series and have received at least one additional monovalent dose. This includes people who received a monovalent booster dose and immunocompromised people who received an additional primary dose of COVID-19 vaccine. Monovalent doses were created from the original strain of the virus that causes COVID-19.

    • People with a bivalent dose: Number of people who received a bivalent (updated) dose of vaccine. Updated, bivalent doses became available in Fall 2022 and were created with the original strain of COVID-19 and newer Omicron variant strains.

    Weekly cumulative totals by vaccination status are shown for each combination of race-ethnicity and age group. Note that each age group has a row where race-ethnicity is "All" so care should be taken when summing rows.
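    The caution about "All" rows can be made concrete with a small example. The column names `age_group`, `race_ethnicity`, and `at_least_one_dose` are assumptions for illustration and may not match the published file; the counts are invented.

```python
def sum_by_race_ethnicity(rows, count_key):
    """Sum a count column while skipping the pre-aggregated 'All' race-ethnicity rows."""
    return sum(row[count_key] for row in rows if row["race_ethnicity"] != "All")


# Invented rows for a single age group and week.
week_rows = [
    {"age_group": "18-29", "race_ethnicity": "Asian", "at_least_one_dose": 10},
    {"age_group": "18-29", "race_ethnicity": "Black", "at_least_one_dose": 20},
    {"age_group": "18-29", "race_ethnicity": "All", "at_least_one_dose": 30},
]

naive_total = sum(row["at_least_one_dose"] for row in week_rows)
correct_total = sum_by_race_ethnicity(week_rows, "at_least_one_dose")
```

    Here the naive sum counts every person twice because the "All" row already contains the subgroup totals.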

    Vaccinations are counted based on the date on which they were administered. Weekly cumulative totals are reported from the week ending Saturday, December 19, 2020 onward (after December 15, when vaccines were first administered in Chicago) through the Saturday prior to the dataset being updated.

    Population counts are from the U.S. Census Bureau American Community Survey (ACS) 2019 1-year estimates. For some of the age groups by which COVID-19 vaccine has been authorized in the United States, race-ethnicity distributions were specifically reported in the ACS estimates. For others, race-ethnicity distributions were estimated by the Chicago Department of Public Health (CDPH) by weighting the available race-ethnicity distributions, using proportions of constituent age groups.

    Coverage percentages are calculated based on the cumulative number of people in each population subgroup (age group by race-ethnicity) who have each vaccination status as of the date, divided by the estimated number of Chicago residents in each subgroup.

    Actual counts may exceed population estimates and lead to >100% coverage, especially in small race-ethnicity subgroups of each age group. All coverage percentages are capped at 99%.
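    The coverage calculation and the 99% cap described above amount to a short formula. The sketch below is an illustration only; the function name `coverage_percent` and its parameters are hypothetical, and the inputs are invented.

```python
def coverage_percent(people, population, cap=99.0):
    """Cumulative people with a status divided by the population estimate, capped at `cap` percent."""
    if population <= 0:
        raise ValueError("population estimate must be positive")
    return min(100.0 * people / population, cap)


typical = coverage_percent(50, 200)      # count well below the population estimate
over_count = coverage_percent(120, 100)  # count exceeds the estimate, so the cap applies
```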

    All data are provisional and subject to change. Information is updated as additional details are received and it is, in fact, very common for recent dates to be incomplete and to be updated as time goes on. At any given time, this dataset reflects data currently known to CDPH.

    Numbers in this dataset may differ from other public sources due to when data are reported and how City of Chicago boundaries are defined.

    CDPH uses the most complete data available to estimate COVID-19 vaccination coverage among Chicagoans, but there are several limitations that impact our estimates. Data reported in I-CARE only include doses administered in Illinois and some doses administered outside of Illinois reported historically by Illinois providers. Doses administered by the federal Bureau of Prisons and Department of Defense are also not currently reported in I-CARE. The Veterans Health Administration began reporting doses in I-CARE in September 2022. Because some people receive vaccinations that are not recorded in I-CARE in a way that can be linked to their record, such as a dose received in another state, the number of people with a completed series or a booster dose is underestimated. Inconsistencies in records of separate doses administered to the same person, such as slight variations in dates of birth, can result in duplicate first dose records for a person, which overestimates the number of people with at least one dose and underestimates the number of people with a completed series or booster dose.

    For all datasets related to COVID-19, see https://data.cityofchicago.org/browse?limitTo=datasets&sortBy=alpha&tags=covid-19.

    Data Source: Illinois Comprehensive Automated Immunization Registry Exchange (I-CARE), U.S. Census Bureau American Community Survey

Cite
Wikimedia (2025). English Wikipedia People Dataset [Dataset]. https://www.kaggle.com/datasets/wikimedia-foundation/english-wikipedia-people-dataset

English Wikipedia People Dataset

Biographical Data for People on English Wikipedia

Available download formats
zip (4293465577 bytes)
Dataset updated
Jul 31, 2025
Dataset provided by
Wikimedia Foundation (http://www.wikimedia.org/)
Authors
Wikimedia
License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

Summary

This dataset contains biographical information derived from articles on English Wikipedia as it stood in early June 2024. It was created as part of the Structured Contents initiative at Wikimedia Enterprise and is intended for evaluation and research use.

The beta sample dataset is a subset of the Structured Contents Snapshot that focuses on people with infoboxes in English Wikipedia. It is output as JSON files (compressed as tar.gz).

We warmly welcome any feedback you have. Please share your thoughts, suggestions, and any issues you encounter on the discussion page for this dataset here on Kaggle.

Data Structure

  • File name: wme_people_infobox.tar.gz
  • Size of compressed file: 4.12 GB
  • Size of uncompressed file: 21.28 GB

Noteworthy Included Fields:

  • name - title of the article.
  • identifier - ID of the article.
  • image - main image representing the article's subject.
  • description - one-sentence description of the article for quick reference.
  • abstract - lead section, summarizing what the article is about.
  • infoboxes - parsed information from the side panel (infobox) on the Wikipedia article.
  • sections - parsed sections of the article, including links. Note: excludes other media/images, lists, tables and references or similar non-prose sections.

The Wikimedia Enterprise Data Dictionary explains all of the fields in this dataset.
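Because the uncompressed data is about 21 GB, it may be more practical to stream records from the archive than to extract it first. The sketch below assumes that each file inside the tar.gz holds one JSON object per line; that layout is an assumption, not something this description confirms, so check the actual files before relying on it. The helper name `iter_records` and the demo records are hypothetical.

```python
import io
import json
import tarfile


def iter_records(fileobj):
    """Yield one dict per non-empty line from every regular file in a tar.gz archive."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other special entries
            handle = archive.extractfile(member)
            for raw_line in handle:
                line = raw_line.strip()
                if line:
                    yield json.loads(line)


def build_demo_archive():
    """Create a tiny in-memory tar.gz with two invented records for demonstration."""
    payload = (
        b'{"name": "Ada Lovelace", "identifier": 1}\n'
        b'{"name": "Alan Turing", "identifier": 2}\n'
    )
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="people_0.ndjson")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


names = [record["name"] for record in iter_records(build_demo_archive())]
```

For the real file, one would open it in binary mode, for example `open("wme_people_infobox.tar.gz", "rb")`, and pass that handle to `iter_records`.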

Stats

Infoboxes

  • Compressed: 2GB
  • Uncompressed: 11GB

Infoboxes + sections + short description

  • Size of compressed file: 4.12 GB
  • Size of uncompressed file: 21.28 GB

Article analysis and filtering breakdown:

  • total # of articles analyzed: 6,940,949
  • # people found with QID: 1,778,226
  • # people found with Category: 158,996
  • people found with Biography Project: 76,150
  • Total # of people articles found: 2,013,372
  • Total # people articles with infoboxes: 1,559,985

End stats

  • Total number of people articles in this dataset: 1,559,985
  • that have a short description: 1,416,701
  • that have an infobox: 1,559,985
  • that have article sections: 1,559,921

This dataset includes 235,146 people articles that exist on Wikipedia but aren't yet tagged on Wikidata as instance of:human.
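The figures in the breakdown are internally consistent, which a few lines of arithmetic can confirm. Reading the 235,146 figure as the sum of the Category and Biography Project counts is an inference from the numbers, not something the description states.

```python
# Counts copied from the filtering breakdown in this description.
found_with_qid = 1_778_226
found_with_category = 158_996
found_with_biography_project = 76_150

# The three discovery routes add up to the reported total of people articles found.
total_people_found = found_with_qid + found_with_category + found_with_biography_project

# Articles found without a Wikidata QID match (inferred reading of the 235,146 figure).
found_without_qid = found_with_category + found_with_biography_project

# Share of the final dataset that carries a short description.
in_dataset = 1_559_985
with_short_description = 1_416_701
short_description_share = with_short_description / in_dataset
```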

Maintenance and Support

This dataset was originally extracted from the Wikimedia Enterprise APIs on June 5, 2024. The information in this dataset may therefore be out of date. This dataset isn't being actively updated or maintained, and has been shared for community use and feedback. If you'd like to retrieve up-to-date Wikipedia articles or data from other Wikimedia projects, get started with Wikimedia Enterprise's APIs.

Initial Data Collection and Normalization

The dataset is built from the Wikimedia Enterprise HTML “snapshots”: https://enterprise.wikimedia.com/docs/snapshot/ and focuses on the Wikipedia article namespace (namespace 0 (main)).

Who are the source language producers?

Wikipedia is a human-generated corpus of free knowledge, written, edited, and curated by a global community of editors since 2001. It is the largest and most accessed educational resource in history, accessed over 20 billion times by half a billion people each month. Wikipedia represents almost 25 years of work by its community: the creation, curation, and maintenance of millions of articles on distinct topics. This dataset includes the biographical content of the English-language edition of Wikipedia (https://en.wikipedia.org/), written by the community.

Attribution

Terms and conditions

Wikimedia Enterprise provides this dataset under the assumption that downstream users will adhere to the relevant free culture licenses when the data is reused. In situations where attribution is required, reusers should identify the Wikimedia project from which the content was retrieved as the source of the content. Any attribution should adhere to Wikimedia’s trademark policy (available at https://foundation.wikimedia.org/wiki/Trademark_policy) and visual identity guidelines (ava...
