This dataset contains counts of deaths for California as a whole based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.
The final data tables include both deaths that occurred in California regardless of the place of residence (by occurrence) and deaths to California residents (by residence), whereas the provisional data table only includes deaths that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.
The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
Data for deaths by leading cause of death categories are now available in the death profiles dataset for each geographic granularity.
The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
Cause of death categories for years 1999 and later are based on tenth revision of International Classification of Diseases (ICD-10) codes. Comparable categories are provided for years 1979 through 1998 based on ninth revision (ICD-9) codes. For more information on the comparability of cause of death classification between ICD revisions see Comparability of Cause-of-death Between ICD Revisions.
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve. The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj. The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 . The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 . The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed. COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to the either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the COVID-19 update. The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age-adjustment is important in Connecticut as the median age of among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks, and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age which results in higher age-adjusted rates. The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used. Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age adjusted rates calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic. Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical
Data for CDC’s COVID Data Tracker site on Rates of COVID-19 Cases and Deaths by Vaccination Status. Click 'More' for important dataset description and footnotes
Dataset and data visualization details: These data were posted on October 21, 2022, archived on November 18, 2022, and revised on February 22, 2023. These data reflect cases among persons with a positive specimen collection date through September 24, 2022, and deaths among persons with a positive specimen collection date through September 3, 2022.
Vaccination status: A person vaccinated with a primary series had SARS-CoV-2 RNA or antigen detected on a respiratory specimen collected ≥14 days after verifiably completing the primary series of an FDA-authorized or approved COVID-19 vaccine. An unvaccinated person had SARS-CoV-2 RNA or antigen detected on a respiratory specimen and has not been verified to have received COVID-19 vaccine. Excluded were partially vaccinated people who received at least one FDA-authorized vaccine dose but did not complete a primary series ≥14 days before collection of a specimen where SARS-CoV-2 RNA or antigen was detected. Additional or booster dose: A person vaccinated with a primary series and an additional or booster dose had SARS-CoV-2 RNA or antigen detected on a respiratory specimen collected ≥14 days after receipt of an additional or booster dose of any COVID-19 vaccine on or after August 13, 2021. For people ages 18 years and older, data are graphed starting the week including September 24, 2021, when a COVID-19 booster dose was first recommended by CDC for adults 65+ years old and people in certain populations and high risk occupational and institutional settings. For people ages 12-17 years, data are graphed starting the week of December 26, 2021, 2 weeks after the first recommendation for a booster dose for adolescents ages 16-17 years. For people ages 5-11 years, data are included starting the week of June 5, 2022, 2 weeks after the first recommendation for a booster dose for children aged 5-11 years. For people ages 50 years and older, data on second booster doses are graphed starting the week including March 29, 2022, when the recommendation was made for second boosters. Vertical lines represent dates when changes occurred in U.S. policy for COVID-19 vaccination (details provided above). Reporting is by primary series vaccine type rather than additional or booster dose vaccine type. The booster dose vaccine type may be different than the primary series vaccine type. ** Because data on the immune status of cases and associated deaths are unavailable, an additional dose in an immunocompromised person cannot be distinguished from a booster dose. This is a relevant consideration because vaccines can be less effective in this group. Deaths: A COVID-19–associated death occurred in a person with a documented COVID-19 diagnosis who died; health department staff reviewed to make a determination using vital records, public health investigation, or other data sources. Rates of COVID-19 deaths by vaccination status are reported based on when the patient was tested for COVID-19, not the date they died. Deaths usually occur up to 30 days after COVID-19 diagnosis. Participating jurisdictions: Currently, these 31 health departments that regularly link their case surveillance to immunization information system data are included in these incidence rate estimates: Alabama, Arizona, Arkansas, California, Colorado, Connecticut, District of Columbia, Florida, Georgia, Idaho, Indiana, Kansas, Kentucky, Louisiana, Massachusetts, Michigan, Minnesota, Nebraska, New Jersey, New Mexico, New York, New York City (New York), North Carolina, Philadelphia (Pennsylvania), Rhode Island, South Dakota, Tennessee, Texas, Utah, Washington, and West Virginia; 30 jurisdictions also report deaths among vaccinated and unvaccinated people. These jurisdictions represent 72% of the total U.S. population and all ten of the Health and Human Services Regions. Data on cases among people who received additional or booster doses were reported from 31 jurisdictions; 30 jurisdictions also reported data on deaths among people who received one or more additional or booster dose; 28 jurisdictions reported cases among people who received two or more additional or booster doses; and 26 jurisdictions reported deaths among people who received two or more additional or booster doses. This list will be updated as more jurisdictions participate. Incidence rate estimates: Weekly age-specific incidence rates by vaccination status were calculated as the number of cases or deaths divided by the number of people vaccinated with a primary series, overall or with/without a booster dose (cumulative) or unvaccinated (obtained by subtracting the cumulative number of people vaccinated with a primary series and partially vaccinated people from the 2019 U.S. intercensal population estimates) and multiplied by 100,000. Overall incidence rates were age-standardized using the 2000 U.S. Census standard population. To estimate population counts for ages 6 months through 1 year, half of the single-year population counts for ages 0 through 1 year were used. All rates are plotted by positive specimen collection date to reflect when incident infections occurred. For the primary series analysis, age-standardized rates include ages 12 years and older from April 4, 2021 through December 4, 2021, ages 5 years and older from December 5, 2021 through July 30, 2022 and ages 6 months and older from July 31, 2022 onwards. For the booster dose analysis, age-standardized rates include ages 18 years and older from September 19, 2021 through December 25, 2021, ages 12 years and older from December 26, 2021, and ages 5 years and older from June 5, 2022 onwards. Small numbers could contribute to less precision when calculating death rates among some groups. Continuity correction: A continuity correction has been applied to the denominators by capping the percent population coverage at 95%. To do this, we assumed that at least 5% of each age group would always be unvaccinated in each jurisdiction. Adding this correction ensures that there is always a reasonable denominator for the unvaccinated population that would prevent incidence and death rates from growing unrealistically large due to potential overestimates of vaccination coverage. Incidence rate ratios (IRRs): IRRs for the past one month were calculated by dividing the average weekly incidence rates among unvaccinated people by that among people vaccinated with a primary series either overall or with a booster dose. Publications: Scobie HM, Johnson AG, Suthar AB, et al. Monitoring Incidence of COVID-19 Cases, Hospitalizations, and Deaths, by Vaccination Status — 13 U.S. Jurisdictions, April 4–July 17, 2021. MMWR Morb Mortal Wkly Rep 2021;70:1284–1290. Johnson AG, Amin AB, Ali AR, et al. COVID-19 Incidence and Death Rates Among Unvaccinated and Fully Vaccinated Adults with and Without Booster Doses During Periods of Delta and Omicron Variant Emergence — 25 U.S. Jurisdictions, April 4–December 25, 2021. MMWR Morb Mortal Wkly Rep 2022;71:132–138. Johnson AG, Linde L, Ali AR, et al. COVID-19 Incidence and Mortality Among Unvaccinated and Vaccinated Persons Aged ≥12 Years by Receipt of Bivalent Booster Doses and Time Since Vaccination — 24 U.S. Jurisdictions, October 3, 2021–December 24, 2022. MMWR Morb Mortal Wkly Rep 2023;72:145–152. Johnson AG, Linde L, Payne AB, et al. Notes from the Field: Comparison of COVID-19 Mortality Rates Among Adults Aged ≥65 Years Who Were Unvaccinated and Those Who Received a Bivalent Booster Dose Within the Preceding 6 Months — 20 U.S. Jurisdictions, September 18, 2022–April 1, 2023. MMWR Morb Mortal Wkly Rep 2023;72:667–669.
Note: This COVID-19 data set is no longer being updated as of December 1, 2023. Access current COVID-19 data on the CDPH respiratory virus dashboard (https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/Respiratory-Viruses/RespiratoryDashboard.aspx) or in open data format (https://data.chhs.ca.gov/dataset/respiratory-virus-dashboard-metrics).
As of August 17, 2023, data is being updated each Friday.
For death data after December 31, 2022, California uses Provisional Deaths from the Center for Disease Control and Prevention’s National Center for Health Statistics (NCHS) National Vital Statistics System (NVSS). Prior to January 1, 2023, death data was sourced from the COVID-19 registry. The change in data source occurred in July 2023 and was applied retroactively to all 2023 data to provide a consistent source of death data for the year of 2023.
As of May 11, 2023, data on cases, deaths, and testing is being updated each Thursday. Metrics by report date have been removed, but previous versions of files with report date metrics are archived below.
All metrics include people in state and federal prisons, US Immigration and Customs Enforcement facilities, US Marshal detention facilities, and Department of State Hospitals facilities. Members of California's tribal communities are also included.
The "Total Tests" and "Positive Tests" columns show totals based on the collection date. There is a lag between when a specimen is collected and when it is reported in this dataset. As a result, the most recent dates on the table will temporarily show NONE in the "Total Tests" and "Positive Tests" columns. This should not be interpreted as no tests being conducted on these dates. Instead, these values will be updated with the number of tests conducted as data is received.
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.
The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.
The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .
The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .
The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.
COVID-19 tests, cases, and associated deaths that have been reported among Connecticut residents. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Hospitalization data were collected by the Connecticut Hospital Association and reflect the number of patients currently hospitalized with laboratory-confirmed COVID-19. Deaths reported to the either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the daily COVID-19 update.
Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical examiner) using their best clinical judgment. Additionally, all COVID-19 deaths, including suspected or related, are required to be reported to OCME. On April 4, 2020, CT DPH and OCME released a joint memo to providers and facilities within Connecticut providing guidelines for certifying deaths due to COVID-19 that were consistent with the CDC’s guidelines and a reminder of the required reporting to OCME.25,26 As of July 1, 2021, OCME had reviewed every case reported and performed additional investigation on about one-third of reported deaths to better ascertain if COVID-19 did or did not cause or contribute to the death. Some of these investigations resulted in the OCME performing postmortem swabs for PCR testing on individuals whose deaths were suspected to be due to COVID-19, but antemortem diagnosis was unable to be made.31 The OCME issued or re-issued about 10% of COVID-19 death certificates and, when appropriate, removed COVID-19 from the death certificate. For standardization and tabulation of mortality statistics, written cause of death statements made by the certifiers on death certificates are sent to the National Center for Health Statistics (NCHS) at the CDC which assigns cause of death codes according to the International Causes of Disease 10th Revision (ICD-10) classification system.25,26 COVID-19 deaths in this report are defined as those for which the death certificate has an ICD-10 code of U07.1 as either a primary (underlying) or a contributing cause of death. More information on COVID-19 mortality can be found at the following link: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Mortality/Mortality-Statistics
Data are reported daily, with timestamps indicated in the daily briefings posted at: portal.ct.gov/coronavirus. Data are subject to future revision as reporting changes.
Starting in July 2020, this dataset will be updated every weekday.
Additional notes: As of 11/5/2020, CT DPH has added antigen testing for SARS-CoV-2 to reported test counts in this dataset. The tests included in this dataset include both molecular and antigen datasets. Molecular tests reported include polymerase chain reaction (PCR) and nucleic acid amplicfication (NAAT) tests.
A delay in the data pull schedule occurred on 06/23/2020. Data from 06/22/2020 was processed on 06/23/2020 at 3:30 PM. The normal data cycle resumed with the data for 06/23/2020.
A network outage on 05/19/2020 resulted in a change in the data pull schedule. Data from 5/19/2020 was processed on 05/20/2020 at 12:00 PM. Data from 5/20/2020 was processed on 5/20/2020 8:30 PM. The normal data cycle resumed on 05/20/2020 with the 8:30 PM data pull. As a result of the network outage, the timestamp on the datasets on the Open Data Portal differ from the timestamp in DPH's daily PDF reports.
Starting 5/10/2021, the date field will represent the date this data was updated on data.ct.gov. Previously the date the data was pulled by DPH was listed, which typically coincided with the date before the data was published on data.ct.gov. This change was made to standardize the COVID-19 data sets on data.ct.gov.
Starting April 4, 2022, negative rapid antigen and rapid PCR test results for SARS-CoV-2 are no longer required to be reported to the Connecticut Department of Public Health as of April 4. Negative test results from laboratory based molecular (PCR/NAAT) results are still required to be reported as are all positive test results from both molecular (PCR/NAAT) and antigen tests.
On 5/16/2022, 8,622 historical cases were included in the data. The date range for these cases were from August 2021 – April 2022.”
This dataset contains counts of deaths for California residents by ZIP Code based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths of California residents. The data tables include deaths of residents of California by ZIP Code of residence (by residence). The data are reported as totals, as well as stratified by age and gender. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.
The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data presented here consists of three parts: Dataset 1: In this set, we extract 327,322 faces from our entire collection of 3389 issues, and automatically classified each face as male or female. We present this data as a single table with columns identifying the date, issue, page number, the coordinates identifying the position of the face on the page, and classification (male or female). The coordinates identifying the position of the face on the page are based on the size and resolution of the pages found in the “Time Vault”. Dataset 2: Dataset 2 consists of 8,789 classified faces from 100 selected issues. Human labor was used to identify and extract 3,299 face images from 39 issues, which were later classified by another set of workers. This selection of 39 issues contains one issue per decade spanned by the archive plus one issue per year between 1961 and 1991, and the extracted face images were used to train the face extraction algorithm. The remaining 5,490 faces from 61 issues were extracted via machine learning before being classified by human coders. These 61 issues were chosen to complement the first selection of 39 issues: one issue per year for all years in the archive excluding those between 1961 and 1991. Thus, Dataset 2 contains fully-labelled faces from at least one issue per year. Dataset 3: In the interest of transparency, Dataset 3 consists of the raw data collected to create Dataset 2, and consists of 2 tables. Before explaining these tables we first briefly describe our data collection and verification procedures, which have been fully described elsewhere. A custom AMT interface was used to enable human labors to classify faces according the categories in Table 4. Each worker was given a randomly-selected batch of 25 pages, each with a clearly highlighted face to be categorized, of which three pages were verification pages with known features, which were used for quality control. Each face was labeled by two distinct human coders, determined at random so that the paring of coders varied with the image. A proficiency rating was calculated for each coder by considering all images they annotated and computing the average number of labels that matched those identified by the image’s other coder. The tables in Dataset 2 were created by resolving inconsistencies between the two image coders by selecting the labels from the coder with the highest proficiency rating. Prior to calculating the proficiency score, all faces that were tagged as having ‘Poor’ or ‘Error’ image quality by either of the two coders were eliminated. Due to technical bugs when the AMT interface was first implemented, a small number of images were only labeled once; these were also eliminated from Datasets 2 and 3. In Dataset 3, we present the raw annotations for each coder that tagged each face, along with demographic data for each coder. Dataset 3 consists of two tables: the raw data from each of the two sets of coders, and the demographic information for each of the coders.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Authors:
*Corresponding author: mathias.sable-meyer@ucl.ac.uk
The perception and production of regular geometric shapes is a characteristic trait of human cultures since prehistory, whose neural mechanisms are unknown. Behavioral studies suggest that humans are attuned to discrete regularities such as symmetries and parallelism, and rely on their combinations to encode regular geometric shapes in a compressed form. To identify the relevant brain systems and their dynamics, we collected functional MRI and magnetoencephalography data in both adults and six-year-olds during the perception of simple shapes such as hexagons, triangles and quadrilaterals. The results revealed that geometric shapes, relative to other visual categories, induce a hypoactivation of ventral visual areas and an overactivation of the intraparietal and inferior temporal regions also involved in mathematical processing, whose activation is modulated by geometric regularity. While convolutional neural networks captured the early visual activity evoked by geometric shapes, they failed to account for subsequent dorsal parietal and prefrontal signals, which could only be captured by discrete geometric features or by more advanced transformer models of vision. We propose that the perception of abstract geometric regularities engages an additional symbolic mode of visual perception.
We separately share the MEG dataset at https://openneuro.org/datasets/ds006012. Below are some notes about the
fMRI dataset of N=20 adult participants (sub-2xx
, numbers between 204 and
223), and N=22 children (sub-3xx
, numbers between 301 and 325).
20.0.5
/usr/local/miniconda/bin/fmriprep /data /out participant --participant-label <label> --output-spaces MNI152NLin6Asym:res-2 MNI152NLin2009cAsym:res-2
bidsonym
running the pydeface
masking,
and nobrainer
brain registraction pipeline.sub-325
was acquired by a different experimenter and defaced before being
shared with the rest of the research team, hence why the slightly different
defacing mask. That participant was also preprocessed separately, and using a
more recent fMRIPrep version: 20.2.6
.sub-313
and sub-316
are missing one run of the localizer eachsub-316
has no data at all for the geometrysub-308
has eno useable data for the intruder task
Since all of these still have some data to contribute to either task, all
available files were kept on this dataset. The analysis code reflects these
inconsistencies where required with specific exceptions.Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Actual dataset Location on Kaggle Contains data scrapped by MyAutoScrapper (Written in go)
Since this kaggle dataset real car deals, placed by real humans with pictures, It can be used for real world Machine Learning(ML) or Machine Vision. Price predictions, image processing, machine vision etc.
This dataset contains data.csv file, which has 100 000 car deal detail. Each row representing each deal. data.csv has 18 columns: - ID: Represents unique identifier for each entry, also for each id, there is a sub-folder in images respectively, which contains images for the given deal. ID is an integer starting from 0. - Manufacturer: A string identifying car manufacturer. - Model: A string identifying car model. - Year: An Integer for the car production year. - Category: A type of the vechile (Sedan, Cabriolet, etc.). - Mileage: An integer representing car mileage in kilometers. - FuelType: A Fuel type the car uses. - EngineVolume: A Floating point number, representing engine volume in litres. - DriveWheels: A String representing car drive wheels (i.e. Front, Rear, 4x4, etc.). - GearBox: A string to identify gear box of the transmission (Manual, Automatic, etc.) - Doors: A string representing car doors (4, 4/5, etc.) - Wheel: Steering wheel position (Left Wheel, Right Wheel) - Color: Color of the car body. - InteriorColor: Interior color. - VIN: VIN number of the vechile, represented as a string. - LeatherInterior: A boolean value, true if car has a leather interior. - Price: Price of the car in USD. If ommited, meants price was set as negotiable. - Clearance: A boolean value identifying, whether customs has been cleared of not.
None of the fields (Except ID) are guaranteed to be filled, or filled with correct information. Since, people sometimes don't enter correct information, or hide some information for reasons. But for most of the entries, most of the fields are supposed to be filled with correct information.
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.
The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.
The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .
The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .
The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.
This dataset includes a count and rate per 100,000 population for COVID-19 cases, a count of COVID-19 molecular diagnostic tests, and a percent positivity rate for tests among people living in community settings for the previous two-week period. Dates are based on date of specimen collection (cases and positivity).
A person is considered a new case only upon their first COVID-19 testing result because a case is defined as an instance or bout of illness. If they are tested again subsequently and are still positive, it still counts toward the test positivity metric but they are not considered another case.
Percent positivity is calculated as the number of positive tests among community residents conducted during the 14 days divided by the total number of positive and negative tests among community residents during the same period. If someone was tested more than once during that 14 day period, then those multiple test results (regardless of whether they were positive or negative) are included in the calculation.
These case and test counts do not include cases or tests among people residing in congregate settings, such as nursing homes, assisted living facilities, or correctional facilities.
These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).
DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020 but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/COVID-19-case-rate-per-100-000-population-and-perc/s22x-83rd
As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.
With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).
Additional notes: As of 11/5/2020, CT DPH has added antigen testing for SARS-CoV-2 to reported test counts in this dataset. The tests included in this dataset include both molecular and antigen datasets. Molecular tests reported include polymerase chain reaction (PCR) and nucleic acid amplicfication (NAAT) tests.
The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used.
Data suppression is applied when the rate is <5 cases per 100,000 or if there are <5 cases within the town. Information on why data suppression rules are applied can be found online here: https://www.cdc.gov/cancer/uscs/technical_notes/stat_methods/suppression.htm
The dataset contains risk-adjusted mortality rates, quality ratings, and number of deaths and cases for 6 medical conditions treated (Acute Stroke, Acute Myocardial Infarction, Heart Failure, Gastrointestinal Hemorrhage, Hip Fracture and Pneumonia) and 3 procedures performed (Carotid Endarterectomy, Pancreatic Resection, and Percutaneous Coronary Intervention) in California hospitals. The 2023 IMIs were generated using AHRQ Version 2024, while previous years' IMIs were generated with older versions of AHRQ software (2022 IMIs by Version 2023, 2021 IMIs by Version 2022, 2020 IMIs by Version 2021, 2019 IMIs by Version 2020, 2016-2018 IMIs by Version 2019, 2014 and 2015 IMIs by Version 5.0, and 2012 and 2013 IMIs by Version 4.5). The differences in the statistical method employed and inclusion and exclusion criteria using different versions can lead to different results. Users should not compare trends of mortality rates over time. However, many hospitals showed consistent performance over years; “better” performing hospitals may perform better and “worse” performing hospitals may perform worse consistently across years. This dataset does not include conditions treated or procedures performed in outpatient settings. Please refer to statewide table for California overall rates: https://data.chhs.ca.gov/dataset/california-hospital-inpatient-mortality-rates-and-quality-ratings/resource/af88090e-b6f5-4f65-a7ea-d613e6569d96
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Universal Thermal Climate Index (UTCI) is a physiological temperature that is widely used in biometeorological studies to assess the heat stress felt by humans. UTCI considers the shortwave and longwave radiation incident on humans from the six cubical directions as well as air temperature, humidity, wind speed and clothing. As a part of NOAA National Integrated Heat Health Information System (NIHHIS) and NASA Interdisciplinary Research in Earth Science (IDS) project, we have generated the UTCI data for Austin, Texas and surrounding peri-urban area at 2-meters spatial resolution for the year 2017. Details on data generation and methodology can be found in Kamath et al., (2023) but are summarized here.
1. Datasets and model used
The solar and longwave environmental irradiance geometry (SOLWEIG) model was used to simulate shadows, mean radiant temperature (TMRT) and the UTCI (Lindberg et al., 2008). TMRT is the equivalent temperature due to exposure to absorbed shortwave and longwave radiation from all directions in a standing position. SOLWEIG was forced using near-surface ERA-5 data available at a spatial resolution of 0.25°x 0.25°. Building, vegetation heights, and digital terrain model were again derived from 3DEP LiDAR point cloud data. SOLWEIG was run using the urban multi-scale environment predictor (UMEP) (Lindberg et al., 2018) plug-in with QGIS.
2. Data availability
Diurnal UTCI data were calculated for typical meteorological clear sky days corresponding to Summer and Fall. The typical clear sky day was selected using the 10-year Typical meteorological Year (TMY) for Austin, Texas (30.2672° N, 97.7431° W) provided by National Solar Radiation Database (NSRDB). More details on TMY files can be found at: https://nsrdb.nrel.gov/data-sets/tmy
Additionally, data is developed for heat hazard for daytime Human Heat Health Index (H3I) calculation as defined by Kamath et al., (2023). Briefly, this heat hazard is defined as the fraction of the day when the UTCI exceeds certain threshold. The threshold used to calculate heat hazard for Summer and Fall were 35° C and 32°C, respectively that imply strong heat stress (Jendritzky et al., 2012). Note that UTCI is on a different scale compared to air temperature, and could yield different heat stress levels.
3. Data format
The georeferenced UTCI and heat hazard data are available in the geoTIFF file format. The files can be readily visualized using GIS software such as QGIS and ArcGIS, as well as programing languages such as Python.
4. Companion dataset
Based on the calculated UTCI here, the potential locations for tree planting were calculated to increase the shade to reduce heat vulnerability for Austin, Texas. [https://doi.org/10.5281/zenodo.6363494]
References
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cervical Cancer Risk Factors for Biopsy: This Dataset is Obtained from UCI Repository and kindly acknowledged! This file contains a List of Risk Factors for Cervical Cancer leading to a Biopsy Examination! About 11,000 new cases of invasive cervical cancer are diagnosed each year in the U.S. However, the number of new cervical cancer cases has been declining steadily over the past decades. Although it is the most preventable type of cancer, each year cervical cancer kills about 4,000 women in the U.S. and about 300,000 women worldwide. In the United States, cervical cancer mortality rates plunged by 74% from 1955 - 1992 thanks to increased screening and early detection with the Pap test. AGE Fifty percent of cervical cancer diagnoses occur in women ages 35 - 54, and about 20% occur in women over 65 years of age. The median age of diagnosis is 48 years. About 15% of women develop cervical cancer between the ages of 20 - 30. Cervical cancer is extremely rare in women younger than age 20. However, many young women become infected with multiple types of human papilloma virus, which then can increase their risk of getting cervical cancer in the future. Young women with early abnormal changes who do not have regular examinations are at high risk for localized cancer by the time they are age 40, and for invasive cancer by age 50. SOCIOECONOMIC AND ETHNIC FACTORS Although the rate of cervical cancer has declined among both Caucasian and African-American women over the past decades, it remains much more prevalent in African-Americans -- whose death rates are twice as high as Caucasian women. Hispanic American women have more than twice the risk of invasive cervical cancer as Caucasian women, also due to a lower rate of screening. These differences, however, are almost certainly due to social and economic differences. Numerous studies report that high poverty levels are linked with low screening rates. In addition, lack of health insurance, limited transportation, and language difficulties hinder a poor woman’s access to screening services. HIGH SEXUAL ACTIVITY Human papilloma virus (HPV) is the main risk factor for cervical cancer. In adults, the most important risk factor for HPV is sexual activity with an infected person. Women most at risk for cervical cancer are those with a history of multiple sexual partners, sexual intercourse at age 17 years or younger, or both. A woman who has never been sexually active has a very low risk for developing cervical cancer. Sexual activity with multiple partners increases the likelihood of many other sexually transmitted infections (chlamydia, gonorrhea, syphilis).Studies have found an association between chlamydia and cervical cancer risk, including the possibility that chlamydia may prolong HPV infection. FAMILY HISTORY Women have a higher risk of cervical cancer if they have a first-degree relative (mother, sister) who has had cervical cancer. USE OF ORAL CONTRACEPTIVES Studies have reported a strong association between cervical cancer and long-term use of oral contraception (OC). Women who take birth control pills for more than 5 - 10 years appear to have a much higher risk HPV infection (up to four times higher) than those who do not use OCs. (Women taking OCs for fewer than 5 years do not have a significantly higher risk.) The reasons for this risk from OC use are not entirely clear. Women who use OCs may be less likely to use a diaphragm, condoms, or other methods that offer some protection against sexual transmitted diseases, including HPV. Some research also suggests that the hormones in OCs might help the virus enter the genetic material of cervical cells. HAVING MANY CHILDREN Studies indicate that having many children increases the risk for developing cervical cancer, particularly in women infected with HPV. SMOKING Smoking is associated with a higher risk for precancerous changes (dysplasia) in the cervix and for progression to invasive cervical cancer, especially for women infected with HPV. IMMUNOSUPPRESSION Women with weak immune systems, (such as those with HIV / AIDS), are more susceptible to acquiring HPV. Immunocompromised patients are also at higher risk for having cervical precancer develop rapidly into invasive cancer. DIETHYLSTILBESTROL (DES) From 1938 - 1971, diethylstilbestrol (DES), an estrogen-related drug, was widely prescribed to pregnant women to help prevent miscarriages. The daughters of these women face a higher risk for cervical cancer. DES is no longer prsecribed.
https://www.usa.gov/government-workshttps://www.usa.gov/government-works
After May 3, 2024, this dataset and webpage will no longer be updated because hospitals are no longer required to report data on COVID-19 hospital admissions, and hospital capacity and occupancy data, to HHS through CDC’s National Healthcare Safety Network. Data voluntarily reported to NHSN after May 1, 2024, will be available starting May 10, 2024, at COVID Data Tracker Hospitalizations.
The following dataset provides facility-level data for hospital utilization aggregated on a weekly basis (Sunday to Saturday). These are derived from reports with facility-level granularity across two main sources: (1) HHS TeleTracking, and (2) reporting provided directly to HHS Protect by state/territorial health departments on behalf of their healthcare facilities.
The hospital population includes all hospitals registered with Centers for Medicare & Medicaid Services (CMS) as of June 1, 2020. It includes non-CMS hospitals that have reported since July 15, 2020. It does not include psychiatric, rehabilitation, Indian Health Service (IHS) facilities, U.S. Department of Veterans Affairs (VA) facilities, Defense Health Agency (DHA) facilities, and religious non-medical facilities.
For a given entry, the term “collection_week” signifies the start of the period that is aggregated. For example, a “collection_week” of 2020-11-15 means the average/sum/coverage of the elements captured from that given facility starting and including Sunday, November 15, 2020, and ending and including reports for Saturday, November 21, 2020.
Reported elements include an append of either “_coverage”, “_sum”, or “_avg”.
The file will be updated weekly. No statistical analysis is applied to impute non-response. For averages, calculations are based on the number of values collected for a given hospital in that collection week. Suppression is applied to the file for sums and averages less than four (4). In these cases, the field will be replaced with “-999,999”.
A story page was created to display both corrected and raw datasets and can be accessed at this link: https://healthdata.gov/stories/s/nhgk-5gpv
This data is preliminary and subject to change as more data become available. Data is available starting on July 31, 2020.
Sometimes, reports for a given facility will be provided to both HHS TeleTracking and HHS Protect. When this occurs, to ensure that there are not duplicate reports, deduplication is applied according to prioritization rules within HHS Protect.
For influenza fields listed in the file, the current HHS guidance marks these fields as optional. As a result, coverage of these elements are varied.
For recent updates to the dataset, scroll to the bottom of the dataset description.
On May 3, 2021, the following fields have been added to this data set.
On May 8, 2021, this data set has been converted to a corrected data set. The corrections applied to this data set are to smooth out data anomalies caused by keyed in data errors. To help determine which records have had corrections made to it. An additional Boolean field called is_corrected has been added.
On May 13, 2021 Changed vaccination fields from sum to max or min fields. This reflects the maximum or minimum number reported for that metric in a given week.
On June 7, 2021 Changed vaccination fields from max or min fields to Wednesday reported only. This reflects that the number reported for that metric is only reported on Wednesdays in a given week.
On September 20, 2021, the following has been updated: The use of analytic dataset as a source.
On January 19, 2022, the following fields have been added to this dataset:
On April 28, 2022, the following pediatric fields have been added to this dataset:
On October 24, 2022, the data includes more analytical calculations in efforts to provide a cleaner dataset. For a raw version of this dataset, please follow this link: https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/uqq2-txqb
Due to changes in reporting requirements, after June 19, 2023, a collection week is defined as starting on a Sunday and ending on the next Saturday.
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Native American Multi-Year Facial Image Dataset, thoughtfully curated to support the development of advanced facial recognition systems, biometric identification models, KYC verification tools, and other computer vision applications. This dataset is ideal for training AI models to recognize individuals over time, track facial changes, and enhance age progression capabilities.
This dataset includes over 5,000+ high-quality facial images, organized into individual participant sets, each containing:
To ensure model generalization and practical usability, images in this dataset reflect real-world diversity:
Each participant’s dataset is accompanied by rich metadata to support advanced model training and analysis, including:
This dataset is highly valuable for a wide range of AI and computer vision applications:
To keep pace with evolving AI needs, this dataset is regularly updated and customizable. Custom data collection options include:
Attribution-NoDerivs 4.0 (CC BY-ND 4.0)https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
Give Machines the Power to See People.
This isn’t just a dataset — it’s a foundation for building the future of human-aware technology. Carefully crafted and annotated with precision, the People Detection dataset enables AI systems to recognize and understand human presence in dynamic, real-world environments.
Whether you’re building smart surveillance, autonomous vehicles, crowd analytics, or next-gen robotics, this dataset gives your model the eyes it needs.
Created using Roboflow. Optimized for clarity, performance, and scale. Source Dataset on Roboflow →
This is more than a dataset. It’s a step toward a smarter world — One where machines can understand people.
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve. The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj. The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 . The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 . The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed. COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken out by age group. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to the either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the daily COVID-19 update. Data are reported daily, with timestamps indicated in the daily briefings posted at: portal.ct.gov/coronavirus. Data are subject to future revision as reporting changes. Starting in July 2020, this dataset will be updated every weekday. Additional notes: A delay in the data pull schedule occurred on 06/23/2020. Data from 06/22/2020 was processed on 06/23/2020 at 3:30 PM. The normal data cycle resumed with the data for 06/23/2020. A network outage on 05/19/2020 resulted in a change in the data pull schedule. Data from 5/19/2020 was processed on 05/20/2020 at 12:00 PM. Data from 5/20/2020 was processed on 5/20/2020 8:30 PM. The normal data cycle resumed on 05/20/2020 with the 8:30 PM data pull. As a result of the network outage, the timestamp on the datasets on the Open Data Portal differ from the timestamp in DPH's daily PDF reports. Starting 5/10/2021, the date field will represent the date this data was updated on data.ct.gov. Previously the date the data was pulled by DPH was listed, which typically coincided with the date before the data was published on data.ct.gov. This change was made to standardize the COVID-19 data sets on data.ct.gov.
Global Human Footprint Index represents the relative human influence in each terrestrial biome expressed as a percentage. The purpose is to provide an updated map of anthropogenic impacts on the environment in geographic projection which can be used in wildlife conservation planning, natural resource management, and research on human-environment interactions. Dataset SummaryThe Global Human Footprint Index Dataset of the Last of the Wild Project, Version 2, 2005 (LWP-2) is the Human Influence Index (HII) normalized by biome and realm. The HII is a global dataset of 1-kilometer grid cells, created from nine global data layers of human population pressure (population density), human land use and infrastructure (built-up areas, nighttime lights, land use/land cover), and human access (coastlines, roads, railroads, navigable rivers). A value of zero represents the least influenced–the “most wild” part of the biome with value of 100 representing the most influenced (least wild) part of the biome. The dataset is produced by the Wildlife Conservation Society (WCS) and the Columbia University Center for International Earth Science Information Network (CIESIN).Recommended CitationWildlife Conservation Society - WCS, and Center for International Earth Science Information Network - CIESIN - Columbia University. 2005. Last of the Wild Project, Version 2, 2005 (LWP-2): Global Human Footprint Dataset (Geographic). Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). http://dx.doi.org/10.7927/H4M61H5F. Accessed DAY MONTH YEAR.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
This dataset contains valuable records of conversations between humans and AI-driven chatbots in real-world scenarios. This is a great opportunity to explore the nuances and intricacies of conversations between humans and machines, opening the door to interesting research directions for machine learning, artificial intelligence, natural language processing (NLP), and beyond. With this data, researchers can determine how well machines are able to simulate real conversation behavior such as nonverbal exchanges, intonations, humorous insights or even sarcasm. The data also provides an avenue for comparative studies between human behavior and AI capabilities in carrying out meaningful dialogues with humans. This knowledge base is invaluable for those who aim to create more astounding AI systems that can closely imitate comprehensible speech patterns through their trained technology models
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
How to Use this Dataset
This dataset contains conversations between humans and AI-driven chatbots in real-world scenarios. With this dataset, you will be able to use the data to build an AI system that can respond intelligently in natural language conversations. For example, you can build a system with the ability to further engage users by replying with meaningful responses as the conversation progresses.
In order to get started, first familiarize yourself with the columns included in this dataset: 'chat' and 'system'. The column 'chat' contains conversations between humans and chatbot systems while the column 'system' contains responses from AI-driven chatbots.
Once you understand what is included in the data set, it's time for you to start building your AI system! Depending on how complex or advanced your goal is, there are several different approaches that could be used when working with this data set such as supervised learning models like seq2seq network or unsupervised methods like autoencoders etc. To get more detailed information regarding those methods refer to external materials available online.
After having trained your model, now it's time for testing out its performance! Enter some sample text into your model using either a web form or command line interface – then observe how it responds against what’s already stored within training datasets column ‘System’ which indicates expected chatsbot response (see above). You should find that once trained correctly; potential outcomes of such tests explores very closely resembling instances from learning sources (the training dataset) leading evidence of advanced Artificial intelligence applications are possible with sufficient analysis inputs! As always if extra accuracy is needed afterwards tweak any parameters until desired results are achieved - Congratulations!
- AI-driven natural language generation: Using this dataset, developers can train AI systems to automatically generate natural conversations between humans and machines.
- Automatic response selection: The data in the dataset could be used to train AI algorithms which select the most appropriate response in any given conversation.
- Evaluating human-machine interaction: Researchers can use this data to identify areas of improvement in conversational interactions between humans and machines, as well as evaluate various techniques for creating effective dialogue systems
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv | Column name | Description | |:--------------|:--------------------------------------------------------| | chat | Contains dialogues uttered by the human. (String) | | system | Contains responses from the AI-driven chatbot. (String) |
If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit Huggingface Hub.
This dataset contains counts of deaths for California as a whole based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.
The final data tables include both deaths that occurred in California regardless of the place of residence (by occurrence) and deaths to California residents (by residence), whereas the provisional data table only includes deaths that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.
The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.