In early February 2024, we will be retiring the Mpox Vaccinations Given to SF Residents by Demographics dataset. This dataset will be archived and will no longer be updated. A historic record of this data will remain available.
A. SUMMARY This dataset represents doses of mpox vaccine (JYNNEOS) administered in California to residents of San Francisco ages 18 years or older. This dataset only includes doses of the JYNNEOS vaccine given on or after 5/1/2022. All vaccines given to people who live in San Francisco are included, no matter where the vaccination took place. The data are broken down by multiple demographic stratifications.
B. HOW THE DATASET IS CREATED Information on doses administered to those who live in San Francisco is from the California Immunization Registry (CAIR2), run by the California Department of Public Health (CDPH). Information on individuals’ city of residence, age, race, ethnicity, and sex is recorded in CAIR2 and is self-reported at the time of vaccine administration. Because CAIR2 does not include information on sexual orientation, we pull that information from the San Francisco Department of Public Health’s Epic Electronic Health Record (EHR). The populations represented in our Epic data and the CAIR2 data are different. Epic data only include vaccinations administered at SFDPH-managed sites to SF residents.
Data notes for population characteristic types are listed below.
Age * Data only include individuals who are 18 years of age or older.
Race/ethnicity * The response option "Other Race" is categorized by the data source system, and the response option "Unknown" refers to a lack of data.
Sex * The response option "Other" is categorized by the source system, and the response option "Unknown" refers to a lack of data.
Sexual orientation * The response option “Unknown/Declined” refers to a lack of data or individuals who reported multiple different sexual orientations during their most recent interaction with SFDPH.
For convenience, we provide the 2020 5-year American Community Survey population estimates.
C. UPDATE PROCESS Updated daily via automated process.
D. HOW TO USE THIS DATASET This dataset includes many different types of demographic groups. Filter the “demographic_group” column to explore a topic area. Then, the “demographic_subgroup” column shows each group or category within that topic area and the total count of doses administered to that population subgroup.
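The filtering step described above can be sketched in a few lines of Python. This is only an illustration: the column names "demographic_group" and "demographic_subgroup" come from the description, while the count column name `total_doses` and every value in the sample are assumptions made up for the example.

```python
import csv
import io

# Hypothetical sample rows. Only "demographic_group" and "demographic_subgroup"
# are column names from the dataset description; "total_doses" and all values
# are invented for illustration.
SAMPLE_CSV = """demographic_group,demographic_subgroup,total_doses
Age,18-24,10
Age,25-34,25
Sex,Male,30
Sex,Unknown,5
"""


def subgroup_counts(csv_text, group):
    """Return {subgroup: doses} for the rows of one demographic group."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["demographic_subgroup"]: int(row["total_doses"])
        for row in reader
        if row["demographic_group"] == group
    }


if __name__ == "__main__":
    print(subgroup_counts(SAMPLE_CSV, "Age"))
```

Filtering on one group first keeps subgroups from different topic areas (for example "Unknown" under Sex and under Race/ethnicity) from being mixed together.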
E. CHANGE LOG
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the data required for the paper "The effects of skin tone on photoacoustic imaging and oximetry". The paper's figures can all be reproduced using code available on GitHub: https://github.com/BohndiekLab/melanin-phantom-simulation-paper. Please follow the README available at this link to do so.
The aim of the study is to understand how photoacoustic imaging is affected by skin colour. To do so, we ran photoacoustic simulations of a forearm and a cylindrical blood flow phantom, imaged a blood-flow phantom in a commercial photoacoustic imaging system (iThera inVision, iThera Medical GmbH) and imaged pigmented mice. Further details can be obtained in the associated publication. We particularly looked at how blood oximetry was affected by skin tone.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data on relationship to householder were derived from answers to Question 2 in the 2015 American Community Survey (ACS), which was asked of all people in housing units. The question on relationship is essential for classifying the population information on families and other groups. Information about changes in the composition of the American family, from the number of people living alone to the number of children living with only one parent, is essential for planning and carrying out a number of federal programs.
The responses to this question were used to determine the relationships of all persons to the householder, as well as household type (married couple family, nonfamily, etc.). From responses to this question, we were able to determine numbers of related children, own children, unmarried partner households, and multi-generational households. We calculated average household and family size. When relationship was not reported, it was imputed using the age difference between the householder and the person, sex, and marital status.
Household – A household includes all the people who occupy a housing unit. (People not living in households are classified as living in group quarters.) A housing unit is a house, an apartment, a mobile home, a group of rooms, or a single room that is occupied (or if vacant, is intended for occupancy) as separate living quarters. Separate living quarters are those in which the occupants live separately from any other people in the building and which have direct access from the outside of the building or through a common hall. The occupants may be a single family, one person living alone, two or more families living together, or any other group of related or unrelated people who share living arrangements.
Average Household Size – A measure obtained by dividing the number of people in households by the number of households. In cases where people in households are cross-classified by race or Hispanic origin, people in the household are classified by the race or Hispanic origin of the householder rather than the race or Hispanic origin of each individual.
Average household size is rounded to the nearest hundredth.
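As a worked illustration of the definition above, the following sketch divides people in households by the number of households and rounds to two decimals. The function name and the input figures are hypothetical, and Python's built-in `round` is used as an assumption; the survey's exact rounding rule for ties is not stated in the text.

```python
def average_household_size(people_in_households, households):
    """People in households divided by households, rounded to the nearest hundredth."""
    if households <= 0:
        raise ValueError("households must be positive")
    return round(people_in_households / households, 2)


if __name__ == "__main__":
    # Hypothetical inputs, not figures from the ACS.
    print(average_household_size(1000, 300))
```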
Comparability – The relationship categories for the most part can be compared to previous ACS years and to similar data collected in the decennial census, CPS, and SIPP. With the change in 2008 from “In-law” to the two categories of “Parent-in-law” and “Son-in-law or daughter-in-law,” caution should be exercised when comparing data on in-laws from previous years. “In-law” encompassed any type of in-law such as sister-in-law. Combining “Parent-in-law” and “son-in-law or daughter-in-law” does not represent all “in-laws” in 2008.
The same can be said of comparing the three categories of “biological,” “step,” and “adopted” child in 2008 to “Child” in previous years. Before 2008, respondents may have considered anyone under 18 as “child” and chosen that category. The ACS includes “foster child” as a category. However, the 2010 Census did not contain this category, and “foster children” were included in the “Other nonrelative” category. Therefore, comparison of “foster child” cannot be made to the 2010 Census. Beginning in 2013, the “spouse” category includes same-sex spouses.
This dataset contains detailed information on cases where a hate or bias crime has been reported to the Bloomington Police Department. Hate crimes are criminal offenses motivated by bias against race, religion, ethnicity, sexual orientation, gender identity, or other protected characteristics. This dataset provides insights into the nature and demographics of hate crimes in Bloomington, aiding in understanding and addressing these incidents.
The dataset includes the following columns:
Column Name | Description | API Field Name | Data Type |
---|---|---|---|
case_number | Case Number | case_number | Text |
date | Date | date | Floating Timestamp |
weekday | Day of Week | day_of_week | Text |
victims | Total Number of Victims | victims | Number |
victim_race | Victim Race | victim_race | Text |
victim_gender | Victim Gender | victim_gender | Text |
victim_type | Victim Type | victim_type | Text |
offenders | Total Number of Offenders | offenders | Number |
offender_race | Offender Race | offender_race | Text |
offender_gender | Offender Gender | offender_gender | Text |
offense | Offense / Crime | offense | Text |
location_type | Offense / Crime Location Type | location_type | Text |
motivation | Offense/Crime Bias Motivation | motivation | Text |
This dataset can be used for:
With the increasing relevance of ethnic groups as political actors, the literature has attempted to identify and study the ethnic organizations representing these groups. How do these organizations use digital communication channels to reach their domestic and international audiences? To enable research on these questions, we have developed the Ethnic Organizations Online (EO2) dataset, a new data collection focusing on the online channels that ethnic organizations use. The dataset includes four types of channels: Twitter, Facebook, Instagram, and regular websites. It relies on the Ethnic Power Relations -- Organizations database, and is therefore compatible with an entire family of datasets on ethnic politics. Featuring more than 2,000 online channels used by 265 groups, it allows researchers to study a wide variety of questions related to digital ethnic mobilization.
This repository contains the dataset, codebook, and further information on working with the dataset. A paper titled "Ethnic Politics via Digital Means: Introducing the Ethnic Organizations Online (EO2) Dataset" is forthcoming in Journal of Peace Research.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Brooklyn borough by race. It includes the population of Brooklyn borough across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Brooklyn borough across relevant racial categories.
Key observations
The percent distribution of Brooklyn borough population by race (across all racial categories recognized by the U.S. Census Bureau): 39.26% are white, 28.99% are Black or African American, 0.59% are American Indian and Alaska Native, 12.04% are Asian, 0.05% are Native Hawaiian and other Pacific Islander, 10.23% are some other race and 8.84% are multiracial.
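A percent distribution of this kind is each category's count divided by the total, times 100. The sketch below shows that calculation on hypothetical counts and also checks that the shares quoted above add up to 100%. The function name is an assumption for illustration; the share values are the ones listed in the text.

```python
# Shares as quoted in the key observations above.
BROOKLYN_SHARES = {
    "White": 39.26,
    "Black or African American": 28.99,
    "American Indian and Alaska Native": 0.59,
    "Asian": 12.04,
    "Native Hawaiian and other Pacific Islander": 0.05,
    "Some other race": 10.23,
    "Multiracial": 8.84,
}


def percent_distribution(counts):
    """Convert {category: count} into {category: percent of total}, 2 decimals."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


if __name__ == "__main__":
    # Sanity check: the published shares should sum to 100 after rounding.
    print(round(sum(BROOKLYN_SHARES.values()), 2))
    # Hypothetical counts, only to show the calculation.
    print(percent_distribution({"group_a": 1, "group_b": 3}))
```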
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Brooklyn borough Population by Race & Ethnicity.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents a detailed breakdown of the count of individuals within distinct income brackets, categorized by gender (men and women) and employment type, full-time (FT) and part-time (PT), offering insight into the income landscape within Many. The dataset can be used to examine gender-based income distribution within the Many population, aiding data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Income brackets:
Variables / Data Columns
Employment type classifications include:
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Many median household income by race.
Description: This dataset provides a detailed collection of horse race photo finishes from PMU events. It is ideal for machine learning and computer vision research, particularly in image recognition and sports analytics.
Image Resolutions: Each image is captured and made available in three sizes:
Small (200×96 pixels): Optimized for quick, low-resolution analysis.
Medium (450×217 pixels): Balances detail and file size for moderate analysis.
Large (478×230 pixels): High-resolution images for precise, in-depth research.
CSV Metadata:
Accompanying the images, a CSV file contains critical metadata about each race:
Walk Type: Differentiates between Trot and Gallop, fundamental gait types in horse racing, essential for analyzing movement patterns.
Speciality & Discipline: These sub-categories give further details on the race type, providing researchers with more context for analysis.
Rope Direction: The position of the track’s barrier is noted as either on the left or right side of the horses, influencing lane dynamics and photo finish placement.
Weather Conditions: Detailed weather codes allow insights into the environmental conditions affecting race visibility and performance:
P1 – P17: Codes range from sunny and cloudy to adverse weather like thunderstorms and snow, offering a broad spectrum for training models on different environmental factors.
Additional Environmental Factors:
Luminosity: Whether the race occurred during day or night is an important factor, providing training data for models that operate under various lighting conditions, enhancing the dataset’s versatility.
Potential Applications:
This dataset is suitable for numerous machine learning applications:
Photo Finish Analysis: Machine learning models can be trained to detect race winners based on these images.
Environmental Impact Studies: The weather data enables research into how different weather conditions affect race outcomes.
Gait Classification: Using the trot and gallop metadata, researchers can develop algorithms that classify different horse movements automatically.
Why This Dataset Stands Out:
Versatility: With varied image sizes, weather conditions, and gait types, this dataset supports a broad range of research.
Rich Metadata: The dataset provides thorough race information that offers a deeper context for understanding the nuances of horse racing.
This dataset provides an excellent resource for building models related to sports analytics, environmental conditions, and gait classification. The variety in race conditions and photo finish details ensures its suitability for complex machine learning projects.
This dataset is sourced from Kaggle.
A. SUMMARY This dataset shows San Francisco COVID-19 deaths by population characteristics. This data may not be immediately available for recently reported deaths. Data updates as more information becomes available. Because of this, death totals may increase or decrease.
Population characteristics are subgroups, or demographic cross-sections, like age, race, or gender. The City tracks how deaths have been distributed among different subgroups. This information can reveal trends and disparities among groups.
B. HOW THE DATASET IS CREATED As of January 1, 2023, COVID-19 deaths are defined as persons who had COVID-19 listed as a cause of death or a significant condition contributing to their death on their death certificate. This definition is in alignment with the California Department of Public Health and the national Council of State and Territorial Epidemiologists. Death certificates are maintained by the California Department of Public Health.
Data on the population characteristics of COVID-19 deaths are from:
* Case reports
* Medical records
* Electronic lab reports
* Death certificates
Data are continually updated to maximize completeness of information and reporting on San Francisco COVID-19 deaths.
To protect resident privacy, we summarize COVID-19 data by only one population characteristic at a time. Data are not shown until cumulative citywide deaths reach five or more.
Data notes on select population characteristic types are listed below.
Race/ethnicity * We include all race/ethnicity categories that are collected for COVID-19 cases.
Gender * The City collects information on gender identity using these guidelines.
C. UPDATE PROCESS Updates automatically at 06:30 and 07:30 AM Pacific Time on Wednesday each week.
Dataset will not update on the business day following any federal holiday.
D. HOW TO USE THIS DATASET Population estimates are only available for age groups and race/ethnicity categories. San Francisco population estimates for race/ethnicity and age groups can be found in a dataset based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2018-2022 5-year American Community Survey (ACS).
This dataset includes several characteristic types. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of cumulative deaths.
Cumulative deaths are the running total of all San Francisco COVID-19 deaths in that characteristic group up to the date listed.
To explore data on the total number of deaths, use the COVID-19 Deaths Over Time dataset.
E. CHANGE LOG
As of July 2nd, 2024 the COVID-19 Deaths by Population Characteristics Over Time dataset has been retired. This dataset is archived and will no longer update. We will be publishing a cumulative deaths by population characteristics dataset that will update moving forward.
A. SUMMARY This dataset shows San Francisco COVID-19 deaths by population characteristics and by date. This data may not be immediately available for recently reported deaths. Data updates as more information becomes available. Because of this, death totals for previous days may increase or decrease. More recent data is less reliable.
Population characteristics are subgroups, or demographic cross-sections, like age, race, or gender. The City tracks how deaths have been distributed among different subgroups. This information can reveal trends and disparities among groups.
B. HOW THE DATASET IS CREATED As of January 1, 2023, COVID-19 deaths are defined as persons who had COVID-19 listed as a cause of death or a significant condition contributing to their death on their death certificate. This definition is in alignment with the California Department of Public Health and the national Council of State and Territorial Epidemiologists (https://preparedness.cste.org/wp-content/uploads/2022/12/CSTE-Revised-Classification-of-COVID-19-associated-Deaths.Final_.11.22.22.pdf). Death certificates are maintained by the California Department of Public Health.
Data on the population characteristics of COVID-19 deaths are from:
* Case reports
* Medical records
* Electronic lab reports
* Death certificates
Data are continually updated to maximize completeness of information and reporting on San Francisco COVID-19 deaths.
To protect resident privacy, we summarize COVID-19 data by only one characteristic at a time. Data are not shown until cumulative citywide deaths reach five or more.
Data notes on each population characteristic type are listed below.
Race/ethnicity * We include all race/ethnicity categories that are collected for COVID-19 cases.
Gender * The City collects information on gender identity using these guidelines.
C. UPDATE PROCESS Updates automatically at 06:30 and 07:30 AM Pacific Time on Wednesday each week.
Dataset will not update on the business day following any federal holiday.
D. HOW TO USE THIS DATASET Population estimates are only available for age groups and race/ethnicity categories. San Francisco population estimates for race/ethnicity and age groups can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).
This dataset includes many different types of characteristics. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of deaths on each date.
New deaths are the count of deaths within that characteristic group on that specific date. Cumulative deaths are the running total of all San Francisco COVID-19 deaths in that characteristic group up to the date listed.
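The relationship between the two measures defined above can be sketched as a running sum. This is a minimal illustration with made-up daily counts for a single characteristic group; the function name is hypothetical and is not part of the dataset.

```python
from itertools import accumulate


def cumulative_counts(new_counts):
    """Running total of daily new counts, in date order."""
    return list(accumulate(new_counts))


if __name__ == "__main__":
    # Hypothetical new deaths on four consecutive dates for one group.
    print(cumulative_counts([2, 0, 1, 3]))
```

Because earlier dates can be revised as more information arrives, the running total would need to be recomputed from the full series after each refresh rather than extended from a stored value.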
This data may not be immediately available for more recent deaths. Data updates as more information becomes available.
To explore data on the total number of deaths, use the COVID-19 Deaths Over Time dataset.
E. CHANGE LOG
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY This archived dataset includes data for population characteristics that are no longer being reported publicly. The date on which each population characteristic type was archived can be found in the field “data_loaded_at”.
B. HOW THE DATASET IS CREATED Data on the population characteristics of COVID-19 cases are from:
* Case interviews
* Laboratories
* Medical providers
These multiple streams of data are merged, deduplicated, and undergo data verification processes.
Race/ethnicity * We include all race/ethnicity categories that are collected for COVID-19 cases. * The population estimates for the "Other" or “Multi-racial” groups should be considered with caution. The Census definition is likely not exactly aligned with how the City collects this data. For that reason, we do not recommend calculating population rates for these groups.
Gender * The City collects information on gender identity using these guidelines.
Skilled Nursing Facility (SNF) occupancy * A Skilled Nursing Facility (SNF) is a type of long-term care facility that provides care to individuals, generally in their 60s and older, who need functional assistance in their daily lives. * This dataset includes data for COVID-19 cases reported in Skilled Nursing Facilities (SNFs) through 12/31/2022, archived on 1/5/2023. These data were identified where “Characteristic_Type” = ‘Skilled Nursing Facility Occupancy’.
Sexual orientation * The City began asking adults 18 years old or older for their sexual orientation identification during case interviews as of April 28, 2020. Sexual orientation data prior to this date is unavailable. * The City doesn’t collect or report information about sexual orientation for persons under 12 years of age. * Case investigation interviews transitioned to the California Department of Public Health, Virtual Assistant information gathering beginning December 2021. The Virtual Assistant is only sent to adults who are 18+ years old. Learn more about our data collection guidelines pertaining to sexual orientation: https://www.sfdph.org/dph/files/PoliciesProcedures/COM9_SexualOrientationGuidelines.pdf
Comorbidities * Underlying conditions are reported when a person has one or more underlying health conditions at the time of diagnosis or death.
Homelessness * Persons are identified as homeless based on several data sources: self-reported living situation, the location at the time of testing, and Department of Public Health homelessness and health databases. * Residents in Single-Room Occupancy hotels are not included in these figures. * These methods serve as an estimate of persons experiencing homelessness. They may not meet other homelessness definitions.
Single Room Occupancy (SRO) tenancy * SRO buildings are defined by the San Francisco Housing Code as having six or more "residential guest rooms" which may be attached to shared bathrooms, kitchens, and living spaces. * The details of a person's living arrangements are verified during case interviews.
Transmission Type * Information on transmission of COVID-19 is based on case interviews with individuals who have a confirmed positive test. Individuals are asked if they have been in close contact with a known COVID-19 case. If they answer yes, transmission category is recorded as contact with a known case. If they report no contact with a known case, transmission category is recorded as community transmission. If the case is not interviewed or was not asked the question, they are counted as unknown.
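The decision rule described above can be written as a small function. This is a sketch of the rule as stated in the text, not the City's actual code; the function name, parameter names, and the exact label strings are assumptions.

```python
from typing import Optional


def transmission_category(interviewed: bool, known_contact: Optional[bool]) -> str:
    """Map interview answers to a transmission category.

    known_contact is None when the question was not asked.
    The returned label strings are illustrative, not the dataset's exact values.
    """
    if not interviewed or known_contact is None:
        return "Unknown"
    if known_contact:
        return "Contact with a known case"
    return "Community transmission"


if __name__ == "__main__":
    print(transmission_category(True, True))
    print(transmission_category(True, False))
    print(transmission_category(False, None))
```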
C. UPDATE PROCESS This dataset has been archived and will no longer update as of 9/11/2023.
D. HOW TO USE THIS DATASET Population estimates are only available for age groups and race/ethnicity categories. San Francisco population estimates for race/ethnicity and age groups can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).
This dataset includes many different types of characteristics. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of cases on each date.
New cases are the count of cases within that characteristic group where the positive tests were collected on that specific specimen collection date. Cumulative cases are the running total of all San Francisco cases in that characteristic group up to the specimen collection date listed.
This data may not be immediately available for recently reported cases. Data updates as more information becomes available.
To explore data on the total number of cases, use the ARCHIVED: COVID-19 Cases Over Time dataset.
E. CHANGE LOG
This dataset contains aggregate data on violent index victimizations at the quarter level of each year (i.e., January – March, April – June, July – September, October – December), from 2001 to the present (1991 to present for Homicides), with a focus on those related to gun violence. Index crimes are 10 crime types selected by the FBI (codes 1-4) for special focus due to their seriousness and frequency. This dataset includes only those index crimes that involve bodily harm or the threat of bodily harm and are reported to the Chicago Police Department (CPD). Each row is aggregated up to victimization type, age group, sex, race, and whether the victimization was domestic-related. Aggregating at the quarter level provides large enough blocks of incidents to protect anonymity while allowing the end user to observe inter-year and intra-year variation. Any row where there were fewer than three incidents during a given quarter has been deleted to help prevent re-identification of victims. For example, if there were three domestic criminal sexual assaults during January to March 2020, all victims associated with those incidents have been removed from this dataset. Human trafficking victimizations have been aggregated separately due to the extremely small number of victimizations.
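The quarterly aggregation and small-count suppression described above can be sketched as follows. This is an illustration only: it groups by year, quarter, and victimization type (the real dataset also groups by age group, sex, race, and domestic flag), it treats each input record as one incident, and it applies the "fewer than three" wording as a configurable threshold. All names and sample records are hypothetical.

```python
from collections import Counter
from datetime import date


def quarter_of(d: date) -> int:
    """Return 1 for January-March, 2 for April-June, and so on."""
    return (d.month - 1) // 3 + 1


def aggregate_and_suppress(records, min_incidents=3):
    """Count records per (year, quarter, type) and drop groups below the threshold.

    records is an iterable of (date, victimization_type) pairs.
    """
    counts = Counter((d.year, quarter_of(d), vtype) for d, vtype in records)
    return {key: n for key, n in counts.items() if n >= min_incidents}


if __name__ == "__main__":
    sample = [
        (date(2020, 1, 5), "A"),
        (date(2020, 2, 10), "A"),
        (date(2020, 3, 31), "A"),
        (date(2020, 4, 1), "A"),
        (date(2020, 5, 1), "B"),
        (date(2020, 6, 1), "B"),
    ]
    print(aggregate_and_suppress(sample))
```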
This dataset includes a "GUNSHOT_INJURY_I" column to indicate whether the victimization involved a shooting, showing either Yes ("Y"), No ("N"), or Unknown ("UNKNOWN"). For homicides, injury descriptions are available dating back to 1991, so the "shooting" column will read either "Y" or "N" to indicate whether the homicide was a fatal shooting or not. For non-fatal shootings, data is only available as of 2010. As a result, for any non-fatal shootings that occurred from 2010 to the present, the shooting column will read as “Y.” Non-fatal shooting victims will not be included in this dataset prior to 2010; they will be included in the authorized dataset, but with "UNKNOWN" in the shooting column.
The dataset is refreshed daily, but excludes the most recent complete day to allow CPD time to gather the best available information. Each time the dataset is refreshed, records can change as CPD learns more about each victimization, especially those victimizations that are most recent. The data on the Mayor's Office Violence Reduction Dashboard is updated daily with an approximately 48-hour lag. As cases are passed from the initial reporting officer to the investigating detectives, some recorded data about incidents and victimizations may change once additional information arises. Regularly updated datasets on the City's public portal may change to reflect new or corrected information.
How does this dataset classify victims?
The methodology by which this dataset classifies victims of violent crime differs by victimization type:
Homicide and non-fatal shooting victims: A victimization is considered a homicide victimization or non-fatal shooting victimization depending on its presence in CPD's homicide victims data table or its shooting victims data table. A victimization is considered a homicide only if it is present in CPD's homicide data table, while a victimization is considered a non-fatal shooting only if it is present in CPD's shooting data tables and absent from CPD's homicide data table.
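The table-membership rule in the paragraph above can be sketched as a lookup against two sets of identifiers. The function name, the use of victim identifiers as set members, and the "other" label are assumptions for illustration; CPD's actual tables and keys are not described in the text.

```python
def classify_victimization(victim_id, homicide_ids, shooting_ids):
    """Classify one victimization by its presence in the two CPD tables.

    Homicide takes precedence: a record counts as a non-fatal shooting only
    if it is in the shooting table and absent from the homicide table.
    """
    if victim_id in homicide_ids:
        return "homicide"
    if victim_id in shooting_ids:
        return "non-fatal shooting"
    return "other"


if __name__ == "__main__":
    homicides = {"v1", "v2"}   # hypothetical identifiers
    shootings = {"v2", "v3"}
    for vid in ("v1", "v2", "v3", "v4"):
        print(vid, classify_victimization(vid, homicides, shootings))
```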
To determine the IUCR code of homicide and non-fatal shooting victimizations, we defer to the incident IUCR code available in CPD's Crimes, 2001-present dataset (available on the City's open data portal). If the IUCR code in CPD's Crimes dataset is inconsistent with the homicide/non-fatal shooting categorization, we defer to CPD's Victims dataset.
For a criminal homicide, the only sensible IUCR codes are 0110 (first-degree murder) or 0130 (second-degree murder). For a non-fatal shooting, a sensible IUCR code must signify a criminal sexual assault, a robbery, or, most commonly, an aggravated battery. In rare instances, the IUCR codes in CPD's Crimes and Victims datasets do not align with the homicide/non-fatal shooting categorization:
Other violent crime victims: For other violent crime types, we refer to the IUCR classification that exists in CPD's victim table, with only one exception:
Note: All businesses identified as victims in CPD data have been removed from this dataset.
Note: The definition of “homicide” (shooting or otherwise) does not include justifiable homicide or involuntary manslaughter. This dataset also excludes any cases that CPD considers to be “unfounded” or “noncriminal.”
Note: In some instances, the police department's raw incident-level data and victim-level data that were inputs into this dataset do not align on the type of crime that occurred. In those instances, this dataset attempts to correct mismatches between incident and victim specific crime types. When it is not possible to determine which victims are associated with the most recent crime determination, the dataset will show empty cells in the respective demographic fields (age, sex, race, etc.).
Note: The initial reporting officer usually asks victims to report demographic data. If victims are unable to recall, the reporting officer will use their best judgment. “Unknown” can be reported if it is truly unknown.
Dataset, GDB, and Online Map created by Renee Haley, NMCDC, May 2023
DATA ACQUISITION PROCESS
Scope and purpose of project: New Mexico is struggling to maintain its healthcare workforce, particularly in Rural areas. This project was undertaken with the intent of looking at flows of healthcare workers into and out of New Mexico at the most granular geographic level possible. This dataset, in combination with others (such as housing cost and availability data) may help us understand where our healthcare workforce is relocating and why.
The most relevant and detailed data on workforce indicators in the United States is housed by the Census Bureau's Longitudinal Employer-Household Dynamics, LEHD, System. Information on this system is available here:
The Job-to-Job flows explorer within this system was used to download the data. Information on the J2J explorer can be found here:
https://j2jexplorer.ces.census.gov/explore.html#1432012
The dataset was built from data queried with the LED Extraction Tool, which allows for the query of more intersectional and detailed data than the explorer. This is a link to the LED extraction tool:
https://ledextract.ces.census.gov/
The geographies used are US Metro areas as determined by the Census (N=389). The shapefile is named lehd_shp_gb.zip and can be downloaded under this section of the following webpage: 5.5. Job-to-Job Flow Geographies, 5.5.1. Metropolitan (Complete). A link to the download site is available below:
https://lehd.ces.census.gov/data/schema/j2j_latest/lehd_shapefiles.html
DATA CLEANING PROCESS
This dataset was built from 8 non-intersectional datasets downloaded from the LED Extraction Tool.
Separate datasets were downloaded in order to obtain detailed information on the race, ethnicity, and educational attainment levels of healthcare workers and where they are migrating.
Datasets included information for the four separate quarters of 2021. It was not possible to download annual data, only quarterly. Quarterly data was summed in a later step to derive annual totals for 2021.
4 datasets for healthcare workers moving OUT OF New Mexico, with details on race, ethnicity, and educational attainment, were downloaded. Dataset 1 contained information on educational attainment, dataset 2 contained information on 7 racial categories identifying as non-Hispanic, dataset 3 contained information on those same 7 categories also identifying as Hispanic, and dataset 4 contained information for workers identifying as white and Hispanic.
4 datasets for healthcare workers moving INTO New Mexico, with details on race, ethnicity, and educational attainment, were downloaded with the same details outlined above.
Each dataset was cleaned according to a data template that kept key attributes and discarded excess information. Within each dataset, the J2J Indicators reflecting 6 different types of job migration were totaled in order to simplify analysis, as this information was not needed in detail.
After cleaning, the 4 datasets for workers moving INTO New Mexico were joined. The process was repeated for workers moving OUT OF New Mexico. This resulted in 2 main datasets.
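The totaling step described above can be illustrated with a minimal Python sketch. The six indicator names come from the variable notes in this document; the row layout, the function name, and the sample counts are hypothetical, and this is not the Excel/JMP workflow that was actually used.

```python
# Indicator names are listed in the variable notes of this document.
# The row layout and sample counts below are hypothetical.
J2J_INDICATORS = ["AQHire", "AQHireS", "EE", "EES", "J2J", "J2JS"]


def total_migration(row):
    """Sum the J2J indicator counts present in one row.

    Returns None when no indicator has a value, so that missing data
    stays missing instead of silently becoming zero.
    """
    values = [row[name] for name in J2J_INDICATORS if row.get(name) is not None]
    return sum(values) if values else None


sample_row = {
    "metro_area": "Example Metro",  # hypothetical metro area
    "quarter": 1,
    "AQHire": 4, "AQHireS": 1, "EE": 10, "EES": 2, "J2J": 14, "J2JS": 3,
}

print(total_migration(sample_row))
```

Returning None for rows with no values is a design choice that keeps nulls distinguishable from true zeros.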
These 2 main datasets still listed all of the variables by each quarter of 2021. Because of this, the data was split in JMP so that the attributes of workers migrating in each quarter (educational attainment, race, and ethnicity) were moved from rows to columns. After this, summary columns for the year 2021 were derived. This resulted in total columns for workers identifying as: 6 separate races and all ethnicities, all races and Hispanic, white-Hispanic, and workers of 6 different education levels, reflecting how many workers of each indicator migrated to and from metro areas in New Mexico in 2021.
The data split transposed duplicate rows reflecting differing worker attributes within the same metro area, resulting in one row for each metro area and reflecting the attributes in columns, thus resulting in a mappable dataset.
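As a rough illustration of the rows-to-columns split and the annual summing, here is a minimal Python sketch. The actual work was done in JMP and Excel; the field names (`metro_area`, `quarter`, `attribute`, `count`) and the sample values are assumptions made only for this example.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per metro area, quarter, and
# worker attribute (for example an education code such as E1 or E2).
long_rows = [
    {"metro_area": "Metro A", "quarter": 1, "attribute": "E1", "count": 3},
    {"metro_area": "Metro A", "quarter": 2, "attribute": "E1", "count": 5},
    {"metro_area": "Metro A", "quarter": 1, "attribute": "E2", "count": 7},
    {"metro_area": "Metro B", "quarter": 4, "attribute": "E1", "count": 2},
]


def to_annual_wide(rows):
    """Return {metro_area: {attribute: annual_total}}.

    Each metro area ends up as a single record, each attribute becomes a
    column, and the quarterly counts are summed into one annual total.
    """
    wide = defaultdict(lambda: defaultdict(int))
    for row in rows:
        wide[row["metro_area"]][row["attribute"]] += row["count"]
    return {metro: dict(columns) for metro, columns in wide.items()}


annual = to_annual_wide(long_rows)
print(annual)
```

The result has one entry per metro area, which is the shape needed to join the table to a metro-area shapefile.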
The 2 datasets were joined (on Metro Area) resulting in one master file containing information on healthcare workers entering and leaving New Mexico.
Rows (N=389) reflect all of the metro areas across the US, and each state. Rows include the 5 metro areas within New Mexico, and New Mexico State.
Columns (N=99) contain information on worker race, ethnicity and educational attainment, specific to each metro area in New Mexico.
78 of these columns reflect workers of specific attributes moving OUT OF the 5 specific Metro Areas in New Mexico and totals for NM State. This level of detail is intended for analyzing who is leaving what area of New Mexico, where they are going to, and why.
13 columns reflect each worker attribute for healthcare workers moving INTO New Mexico by race, ethnicity, and education level. Because all 5 metro areas and New Mexico State are contained in the rows, this information for incoming workers is available by metro area and at the state level. There is less possibility for mapping these attributes, since it was not realistic to create a dataset reflecting all of these variables for every healthcare worker from every metro area in the US also coming into New Mexico (that dataset would have over 1,000 columns and be unmappable). Therefore, this dataset is easier to use for examining why workers are leaving the state, but it also includes detailed information on who is coming in.
The remaining 8 columns contain geographic information.
GIS AND MAPPING PROCESS
The master file was opened in ArcGIS Pro, and the shapefile of US Metro Areas was also imported.
The Excel file was joined to the shapefile by Metro Area Name, as the names matched exactly.
The resulting layer was exported as a GDB in order to retain null values, which would turn to zeros if exported as a shapefile.
This GDB was uploaded to ArcGIS Online, aliases were inserted as column header names, and the layer was visualized as desired.
SYSTEMS USED
MS Excel was used for data cleaning, summing NM state totals, and summing quarterly to annual data.
JMP was used to transpose, join, and split data.
ArcGIS Desktop was used to create the shapefile uploaded to NMCDC's online platform.
VARIABLE AND RECODING NOTES
Summary of variables selected for datasets downloaded focused on educational attainment:
J2J Flows by Educational Attainment
Summary of variables selected for datasets downloaded focused on race and ethnicity:
J2J Flows by Race and Ethnicity
Note: Variables in Datasets 1 through 4 were downloaded twice, once for workers coming into New Mexico and once for those leaving NM.
VARIABLE, LEHD VARIABLE DEFINITION, LEHD VARIABLE NOTES, DETAILS OR URL FOR RAW DATA DOWNLOAD
Geography Type - State Origin and Destination State
Data downloaded for worker migration into and out of all US States
Geography Type - Metropolitan Areas Origin and Dest Metro Area
Data downloaded for worker migration into and out of all US Metro Areas
NAICS sectors North American Industry Classification System Under Firm Characteristics Only downloaded for Healthcare and Social Assistance Sectors
Other Firm Characteristics No Firm Age / Size Detail Under Firm Characteristics Downloaded data on all firm ages, sizes, and other details.
Worker Characteristics Education, Race, Ethnicity
Non Intersectional data aside from Race / Ethnicity data.
Sex Gender
0 - All Sexes Selected
Age Age
A00 All Ages (14-99)
Education Education Level E0, E1, E2, E3, E4, E5 E0 - All Education Categories, E1 - Less than high school, E2 - High school or equivalent, no college, E3 - Some college or Associate’s degree, E4 - Bachelor's degree or advanced degree, E5 - Educational attainment not available (workers aged 24 or younger)
Dataset 1 All Education Levels, E1, E2, E3, E4, and E5
RACE
A0, A1, A2, A3, A4, A5 OPTIONS: A0 All Races, A1 White Alone, A2 Black or African American Alone, A3 American Indian or Alaska Native Alone, A4 Asian Alone, A5 Native Hawaiian or Other Pacific Islander Alone, A7 Two or More Race Groups
ETHNICITY
A0, A1, A2 OPTIONS: A0 All Ethnicities, A1 Not Hispanic or Latino, A2 Hispanic or Latino
Dataset 2 All Races (A0) and All Ethnicities (A0)
Dataset 3 6 Races (A1 through A5) and All Ethnicities (A0)
Dataset 4 White (A1) and Hispanic or Latino (A2)
Quarter Quarter and Year
Data from all quarters of 2021 to sum into annual numbers; yearly data was not available
Employer type Sector: Private or Governmental
Query included all healthcare sector workflows from all employer types and firm sizes from every quarter of 2021
J2J indicator categories Detailed types of job migration
All options were selected for all datasets and totaled: AQHire, AQHireS, EE, EES, J2J, J2JS. Counts were selected vs. earnings, and data was not seasonally adjusted (unavailable).
NOTES AND RESOURCES
The following resources and documentation were used to navigate the LEHD and J2J Worker Flows system and to answer questions about variables:
https://lehd.ces.census.gov/data/schema/j2j_latest/lehd_public_use_schema.html
https://www.census.gov/history/www/programs/geography/metropolitan_areas.html
https://lehd.ces.census.gov/data/schema/j2j_latest/lehd_csv_naming.html
Statewide (New
These data examine the effects on total crime rates of changes in the demographic composition of the population and changes in criminality of specific age and race groups. The collection contains estimates from national data of annual age-by-race specific arrest rates and crime rates for murder, robbery, and burglary over the 21-year period 1965-1985. The data address the following questions: (1) Are the crime rates reported by the Uniform Crime Reports (UCR) data series valid indicators of national crime trends? (2) How much of the change between 1965 and 1985 in total crime rates for murder, robbery, and burglary is attributable to changes in the age and race composition of the population, and how much is accounted for by changes in crime rates within age-by-race specific subgroups? (3) What are the effects of age and race on subgroup crime rates for murder, robbery, and burglary? (4) What is the effect of time period on subgroup crime rates for murder, robbery, and burglary? (5) What is the effect of birth cohort, particularly the effect of the very large (baby-boom) cohorts following World War II, on subgroup crime rates for murder, robbery, and burglary? (6) What is the effect of interactions among age, race, time period, and cohort on subgroup crime rates for murder, robbery, and burglary? (7) How do patterns of age-by-race specific crime rates for murder, robbery, and burglary compare for different demographic subgroups? The variables in this study fall into four categories. The first category includes variables that define the race-age cohort of the unit of observation. The values of these variables are directly available from UCR and include year of observation (from 1965-1985), age group, and race. The second category of variables were computed using UCR data pertaining to the first category of variables. These are period, birth cohort of age group in each year, and average cohort size for each single age within each single group. 
The third category includes variables that describe the annual age-by-race specific arrest rates for the different crime types. These variables were estimated for race, age, group, crime type, and year using data directly available from UCR and population estimates from Census publications. The fourth category includes variables similar to the third group. Data for estimating these variables were derived from available UCR data on the total number of offenses known to the police and total arrests in combination with the age-by-race specific arrest rates for the different crime types.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in May township. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.
Key observations: Insights from 2023
Based on our analysis of ACS 2019-2023 5-Year Estimates, we present the following observations:
- All workers, aged 15 years and older: In May township, the median income for all workers aged 15 years and older, regardless of work hours, was $77,845 for males and $40,227 for females.
These income figures highlight a substantial gender-based income gap in May township. Women, regardless of work hours, earn 52 cents for each dollar earned by men. This gender pay gap of approximately 48% points to significant gender-based income inequality in May township.
- Full-time workers, aged 15 years and older: In May township, among full-time, year-round workers aged 15 years and older, males earned a median income of $136,250, while females earned $91,750, a 33% gender pay gap among full-time workers. In other words, women earn 67 cents for each dollar earned by men in full-time roles. Women therefore face a substantial wage discrepancy even when working full-time. The gap is narrower among full-time workers (33%) than across all workers regardless of work hours (48%), but it persists across employment types in May township.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023-inflation-adjusted dollars.
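The cents-per-dollar and pay-gap figures quoted above follow from the reported medians by simple arithmetic. The sketch below reproduces them; the function name `pay_gap` is ours, and rounding to whole percentages is an assumption about how the published figures were derived.

```python
def pay_gap(male_median, female_median):
    """Return (cents earned by women per dollar earned by men, pay gap in %)."""
    ratio = female_median / male_median
    return round(ratio * 100), round((1 - ratio) * 100)


# Median incomes quoted in the key observations above.
all_workers = pay_gap(77845, 40227)
full_time = pay_gap(136250, 91750)

print("All workers:", all_workers)
print("Full-time, year-round workers:", full_time)
```

Under these assumptions the results are 52 cents and a 48% gap for all workers, and 67 cents and a 33% gap for full-time, year-round workers.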
Gender classifications include:
Employment type classifications include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to ask about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for May township median household income by race. You can refer to the same here
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains popular baby names in New York.
Dataset: 1 file (popular-baby-names.csv)
Columns:
- Year of Birth: Year of the baby's birth.
- Gender: Gender of the baby.
- Ethnicity: Ethnicity the baby belongs to.
- Child's First Name: The first name of the child.
- Count: How many babies were given that name.
- Ranking: Ranking of that name.
This dataset contains summary tables of information from the provincial Use of Force Reports and occurrences that resulted in an enforcement action. The data used to produce these summary tables come from two sources: a) information about enforcement actions, such as calls for service types and occurrence categories, comes from the Service's Records Management System and b) information related to reported use of force, such as highest types of force and perceived weapons, comes from the provincial use of force reports. The data count unique occurrences which resulted in a police enforcement action or incidents of reported use of force. Hence, there may be more than one person and more than one officer involved in an enforcement action incident or a reported use of force incident. Since the summary tables are of incidents, where there was more than one person, descriptors such as perceived race refer to the composition of person(s) involved in the enforcement action incident. For example, if the incident involved more than one person, each perceived to be of a different race or gender group, then the incident is categorized as a “multiple race group.” For the purpose of the race-based data analysis, the data include all incidents which resulted in a police enforcement action and exclude other police interactions with the public, such as taking victim reports, routine traffic or pedestrian stops, or outreach events. Enforcement actions are occurrences where person(s) involved were arrested resulting in charges (including released at scene) or released without charges; received Provincial Offences Act Part III tickets; summons; cautions; diversions; apprehensions, mental health-related incidents as well as those identified as “subject” or “suspect” in an incident to which an officer attended. Reported use of force incidents are those in which a Toronto Police Service officer used force and was required to submit a report under the Police Services Act, 1990.
For the purposes of the race-based data analysis, the data exclude reportable incidents in which force was used against animals, team reports, and incidents where an officer unintentionally discharged a Service weapon during training. Each reported use of force incident is counted once, regardless of the number of officers or subjects involved.
https://creativecommons.org/publicdomain/zero/1.0/
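The incident-level categorization described above (one record per incident, with the race descriptor reflecting the composition of the people involved) could be sketched as follows. This is an illustration only, not the Service's actual procedure; the function name, the placeholder group labels, and the "unknown" fallback are assumptions.

```python
def incident_race_group(perceived_races):
    """Categorize one incident by the perceived race of the person(s) involved.

    perceived_races: list with one perceived race group per person.
    """
    distinct = set(perceived_races)
    if not distinct:
        return "unknown"  # hypothetical label for an incident with no persons recorded
    if len(distinct) > 1:
        # People perceived to be of different groups: incident-level category.
        return "multiple race group"
    return next(iter(distinct))


# "Group A" and "Group B" are placeholders, not categories from the source data.
print(incident_race_group(["Group A"]))
print(incident_race_group(["Group A", "Group A"]))
print(incident_race_group(["Group A", "Group B"]))
```

Because each incident yields exactly one category, counts in the summary tables are counts of incidents, not of individuals.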
The US Family Budget Dataset provides insights into the cost of living in different US counties based on the Family Budget Calculator by the Economic Policy Institute (EPI).
This dataset offers community-specific estimates for ten family types, including one or two adults with zero to four children, in all 1877 counties and metro areas across the United States.
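The ten family types follow from combining one or two adults with zero to four children. A minimal sketch of that enumeration is shown below; the tuple representation is our own and is not the dataset's column encoding.

```python
from itertools import product

# (adults, children) pairs: 1 or 2 adults combined with 0 to 4 children.
family_types = list(product((1, 2), range(5)))

for adults, children in family_types:
    print(f"{adults} adult(s), {children} child(ren)")
```

Two adult counts times five child counts gives the ten family types mentioned above.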
https://spdx.org/licenses/CC0-1.0.html
Scientific conferences incorporate diversity-focused events into their programming to increase their diversity and inclusivity and to improve the conference experience for scientists from underrepresented groups (URGs). While simply adding diversity-focused events to conferences is positive, maximizing their impact requires that conferences organize and schedule these events to minimize well-acknowledged, problematic patterns such as the minority tax. To our knowledge, the programming of diversity-focused events at conferences has not been systematically reviewed to identify the extent of these shortcomings and how they can be addressed. This dataset describes temporal trends in the types of diversity-focused events held at biology conferences, the targeted audiences of those events, and scheduling conflicts that occur with each event.
Methods
Time-series: We gathered publicly available conference programs for the selected biology conferences (Table 1) for the years 2010 through 2019. Not all conferences had programs available for all years, particularly as time from the present increased; thus, sample sizes varied across the time series from 17 to 28. Programs were searched for diversity-focused events by both reading through the entire program and conducting keyword searches. The following keywords were used: diversity, gender, female, woman, women, black, race, ethnic*, minorit*, inclusiv*, LGBT*, where asterisks indicate wild-card search terms. For each program, we first scored (yes/no) whether there were any diversity-focused events. We then scored whether each event was “women-focused” - where the event was specific to women; “ethnic/racial minority groups-focused” - where the event was specific to any URG based on ethnicity and/or race; and/or “LGBTQ+-focused” - where the event was specific to any part of the LGBTQ+ community.
Using these scores, we calculated for each calendar year the percent of conferences with (1) any kind of diversity-focused event, (2) women-focused events, (3) ethnic/racial minorities-focused events, and (4) LGBTQ+-focused events. Table 1. Biology conferences were acquired from a list of societies affiliated with the American Association for the Advancement of Science (https://www.aaas.org/group/60/list-aaas-affiliates). We included a conference if its primary focus was on the biological sciences, regardless of whether the conference was hosted by an academic, professional, or not-for-profit organization. Recent publicly available conference programs were used to examine how conferences incorporated diversity-focused events into their schedules.
Society/Conference | Year analyzed | Society/Conference | Year analyzed
American Dairy Science Association | 2018 | Ecological Society of America | 2019
American Ornithological Society | 2018 | Entomological Society of America | 2018
American Physiological Society | 2018 | International Biometrics Society - Eastern North America | 2018
American Phytopathological Society | 2018 | Microscopy Society of America | 2018
American Society for Horticultural Science | 2018 | Mycological Society of America | 2017
American Society for Microbiology | 2019 | Phycological Society of America | 2019
American Society of Agronomy | 2018 | Poultry Science Association | 2018
American Society of Mammalogists | 2018 | Society for Integrative and Comparative Biology | 2018
American Society of Plant Biologists | 2019 | Society for Neuroscience | 2018
Animal Behavior Society | 2019 | Society for the Study of Evolution | 2018
Association for the Sciences of Limnology and Oceanography - Ocean Sciences Meeting | 2018 | Society of American Foresters | 2019
Association of Southeastern Biologists | 2018 | Society of Toxicology | 2018
Behavior Genetics Association | 2018 | The Wildlife Society | 2018
Biophysical Society | 2018 | Weed Science Society of America | 2018
Botanical Society of America | 2018 | |
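The keyword search and the yearly percentages described in the time-series methods can be sketched in Python. The keyword list is taken from the methods text; treating a trailing asterisk as "any word characters" and matching on whole words are our assumptions, and the sample scores are hypothetical, not results from the study.

```python
import re

# Keywords from the methods text; a trailing * is a wild-card.
KEYWORDS = ["diversity", "gender", "female", "woman", "women", "black", "race",
            "ethnic*", "minorit*", "inclusiv*", "LGBT*"]


def keyword_pattern(keywords):
    """Build one case-insensitive regex that matches any keyword as a whole word."""
    parts = []
    for kw in keywords:
        if kw.endswith("*"):
            parts.append(re.escape(kw[:-1]) + r"\w*")  # wild-card suffix
        else:
            parts.append(re.escape(kw))
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


PATTERN = keyword_pattern(KEYWORDS)


def has_diversity_keyword(text):
    """True when an event title or description contains any search keyword."""
    return PATTERN.search(text) is not None


def percent_with_events(scores):
    """scores: {year: [True/False per conference program]} -> {year: percent yes}."""
    return {year: 100 * sum(flags) / len(flags)
            for year, flags in scores.items() if flags}


# Hypothetical yes/no scores for two years.
sample_scores = {2010: [True, False, False, False], 2019: [True, True, True, False]}
print(has_diversity_keyword("LGBTQ+ Networking Event and Social"))
print(percent_with_events(sample_scores))
```

A keyword hit only flags a candidate event; the study also read each program in full, so a match alone is not a classification.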
Survey of event-scheduling and targeted audiences: Using one recent program from each conference (years 2017 through 2019), we searched for diversity-focused events by both reading through the entire program and conducting keyword searches. The keywords used are listed above in the Time Series section. From these searches, we found 87 diversity-focused events from 21 out of the 29 conferences. Target audience: For each conference, we used the title and any other description of the event to classify the targeted audience as either an underrepresented group (URG) or the broader conference community. For example, events with titles such as “Inclusive Teaching Workshop” were classified as broadly targeted, whereas events with titles such as “Minority Social” were classified as URG-targeted. However, if any event contained the explicit statement that “all are welcome” (or similar), the event was classified as targeted at the broader conference community. Event format: We also used the titles and other event descriptions to classify the formats of events. Events were classified as socials, workshops, symposia, plenary lectures, forums and town halls, orientations, or poster sessions. The most common events were socials, workshops, and symposia (e.g., “LGBTQ+ Networking Event and Social”, “Workshop for Creating an Inclusive Research Environment”, and “Symposium Honoring the Roles of Women in Microbiology”, respectively). Breaks or scientific sessions: We used the conference schedule to identify whether each diversity-focused event occurred during a scheduled break versus the main scientific sessions. We defined a break as a period that was either explicitly labeled as a break (e.g., lunch, dinner) or occurred outside the daily start or end of conference-wide scientific events, which included workshops, plenary lectures, poster sessions, and contributed oral presentations. 
Number of conflicting events: We used the conference schedule to count the number of events that overlapped with each diversity-focused event for more than 15 minutes. Events were only counted as separate events if they occurred in separate rooms. “Business” events and other closed, invitation-only events were not included in this calculation. Overlap for an average conference event: Because the baseline number of overlapping events can vary with the size of a conference, we conducted a randomized survey to calculate how many events overlapped with an “average” event at a conference. For each day of a conference, we used a random number generator to identify a single hour with conference activity and counted the number of overlapping events within the first 15 minutes of that hour. The number of events conflicting with an average event was calculated as the total number of overlapping events minus 1. This number was averaged across the different days for each conference. To validate our randomized survey, we also contacted the organizers of each conference to request attendance numbers for the surveyed years - 15 conferences provided this information. Conflict with an average event was strongly correlated with the size of the conference, thus, we concluded that our method of random surveys was a reliable method for quantifying how busy a conference was.
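The two conflict measures described above can be sketched as follows. Times are minutes since midnight; the event fields (`room`, `start`, `end`, `closed`), the choice to count distinct rooms, and the sample schedule are assumptions for illustration, not the authors' code or data.

```python
def overlap_minutes(a_start, a_end, b_start, b_end):
    """Length in minutes of the overlap between two time intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def count_conflicts(target, events, min_overlap=15):
    """Count events overlapping the target for more than min_overlap minutes.

    Events in the same room count once, and closed (business or
    invitation-only) events are skipped, following the rules in the text.
    """
    rooms = set()
    for ev in events:
        if ev is target or ev.get("closed") or ev["room"] == target["room"]:
            continue
        if overlap_minutes(target["start"], target["end"],
                           ev["start"], ev["end"]) > min_overlap:
            rooms.add(ev["room"])
    return len(rooms)


def average_event_conflict(concurrent_counts):
    """Mean over days of (events running in the sampled window minus 1)."""
    return sum(c - 1 for c in concurrent_counts) / len(concurrent_counts)


# Hypothetical schedule: the target event runs 12:00-13:00 in room A.
target = {"room": "A", "start": 720, "end": 780}
schedule = [
    target,
    {"room": "B", "start": 700, "end": 740},                  # 20 min overlap
    {"room": "B", "start": 745, "end": 800},                  # same room as above
    {"room": "C", "start": 770, "end": 830},                  # only 10 min overlap
    {"room": "D", "start": 720, "end": 780, "closed": True},  # closed event
    {"room": "E", "start": 730, "end": 760},                  # 30 min overlap
]

print(count_conflicts(target, schedule))
print(average_event_conflict([5, 3, 4]))
```

Comparing an event's conflict count with the per-conference average from the random survey is what lets conferences of different sizes be compared.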
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Summary
Facial expression is among the most natural methods for human beings to convey their emotional information in daily life. Although the neural mechanism of facial expression has been extensively studied employing lab-controlled images and a small number of lab-controlled video stimuli, how the human brain processes natural facial expressions still needs to be investigated. To our knowledge, this type of data, specifically on a large number of natural facial expression videos, is currently missing. We describe here the natural Facial Expressions Dataset (NFED), an fMRI dataset including responses to 1,320 short (3-second) natural facial expression video clips. These video clips are annotated with three types of labels: emotion, gender, and ethnicity, along with accompanying metadata. We validate that the dataset has good quality within and across participants and, notably, can capture temporal and spatial stimuli features. NFED provides researchers with fMRI data for understanding the visual processing of a large number of natural facial expression videos.
Data Records
The data, which were structured following the BIDS format [53], were accessible at https://openneuro.org/datasets/ds005047 [54]. The “sub-
Stimulus. Distinct folders store the stimuli for distinct fMRI experiments: "stimuli/face-video", "stimuli/floc", and "stimuli/prf" (Fig. 2b). The category labels and metadata corresponding to video stimuli are stored in the "videos-stimuli_category_metadata.tsv". The "videos-stimuli_description.json" file describes category and metadata information of video stimuli (Fig. 2b).
Raw MRI data. Each participant's folder is comprised of 11 session folders: “sub-
Volume data from pre-processing. The pre-processed volume-based fMRI data were in the folder named “pre-processed_volume_data/sub-
Surface data from pre-processing. The pre-processed surface-based data were stored in a file named “volumetosurface/sub-
FreeSurfer recon-all. The results of reconstructing the cortical surface were saved as “recon-all-FreeSurfer/sub-
Surface-based GLM analysis data. We have conducted GLMsingle on the data of the main experiment. There is a file named “sub--
Validation. The code of technical validation was saved in the “derivatives/validation/code” folder. The results of technical validation were saved in the “derivatives/validation/results” folder (Fig. 2h). “README.md” describes the detailed information of code and results.
In early February 2024, we will be retiring the Mpox Vaccinations Given to SF Residents by Demographics dataset. This dataset will be archived and will no longer be updated. A historic record of this data will remain available.
A. SUMMARY This dataset represents doses of mpox vaccine (JYNNEOS) administered in California to residents of San Francisco ages 18 years or older. This dataset only includes doses of the JYNNEOS vaccine given on or after 5/1/2022. All vaccines given to people who live in San Francisco are included, no matter where the vaccination took place. The data are broken down by multiple demographic stratifications.
B. HOW THE DATASET IS CREATED Information on doses administered to those who live in San Francisco is from the California Immunization Registry (CAIR2), run by the California Department of Public Health (CDPH). Information on individuals’ city of residence, age, race, ethnicity, and sex are recorded in CAIR2 and are self-reported at the time of vaccine administration. Because CAIR2 does not include information on sexual orientation, we pull information from the San Francisco Department of Public Health’s Epic Electronic Health Record (EHR). The populations represented in our Epic data and the CAIR2 data are different. Epic data only include vaccinations administered at SFDPH managed sites to SF residents.
Data notes for population characteristic types are listed below.
Age * Data only include individuals who are 18 years of age or older.
Race/ethnicity * The response option "Other Race" is categorized by the data source system, and the response option "Unknown" refers to a lack of data.
Sex * The response option "Other" is categorized by the source system, and the response option "Unknown" refers to a lack of data.
Sexual orientation * The response option “Unknown/Declined” refers to a lack of data or individuals who reported multiple different sexual orientations during their most recent interaction with SFDPH.
For convenience, we provide the 2020 5-year American Community Survey population estimates.
C. UPDATE PROCESS Updated daily via automated process.
D. HOW TO USE THIS DATASET This dataset includes many different types of demographic groups. Filter the “demographic_group” column to explore a topic area. Then, the “demographic_subgroup” column shows each group or category within that topic area and the total count of doses administered to that population subgroup.
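The filtering step described above can be sketched in Python with the standard library. The column names "demographic_group" and "demographic_subgroup" come from this description; the dose-count column name `total_doses`, the function name, and the sample rows are hypothetical and should be replaced with the actual export.

```python
import csv
import io

# Hypothetical extract; the dose-count column name and all values are
# illustrative only and are not real figures from the dataset.
SAMPLE_CSV = """demographic_group,demographic_subgroup,total_doses
Age,18-24,10
Age,25-34,40
Sex,Female,5
Sex,Male,45
"""


def subgroups_for(csv_text, group):
    """Return {demographic_subgroup: doses} for one demographic_group."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["demographic_subgroup"]: int(row["total_doses"])
            for row in reader if row["demographic_group"] == group}


print(subgroups_for(SAMPLE_CSV, "Age"))
```

Filtering on one demographic group at a time avoids double counting, since each group is a separate breakdown of the same doses.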
E. CHANGE LOG