NOTE: This dataset replaces a previous one. Please see below.

Chicago residents who are up to date with COVID-19 vaccines by ZIP Code, based on the reported home address and age group of the person vaccinated, as provided by the medical provider in the Illinois Comprehensive Automated Immunization Registry Exchange (I-CARE).

“Up to date” refers to individuals who meet the CDC’s updated COVID-19 vaccination criteria based on their age and prior vaccination history. For surveillance purposes, up to date is defined based on the following criteria:

People ages 5 years and older:
· Are up to date when they receive 1+ doses of a COVID-19 vaccine during the current season.

Children ages 6 months to 4 years:
· Children who have received at least two prior COVID-19 vaccine doses are up to date when they receive one additional dose of COVID-19 vaccine during the current season, regardless of vaccine product.
· Children who have received only one prior COVID-19 vaccine dose are up to date when they receive one additional dose of the current season's Moderna COVID-19 vaccine or two additional doses of the current season's Pfizer-BioNTech COVID-19 vaccine.
· Children who have never received a COVID-19 vaccination are up to date when they receive either two doses of the current season's Moderna vaccine or three doses of the current season's Pfizer-BioNTech vaccine.

This dataset takes the place of a previous dataset, which covers doses administered from December 15, 2020 through September 13, 2023 and is marked as historical:
- https://data.cityofchicago.org/Health-Human-Services/COVID-19-Vaccinations-by-ZIP-Code/553k-3xzc

Data Notes:
Weekly cumulative totals of people up to date are shown for each combination of ZIP Code and age group. Note that there are rows where age group is "All ages", so care should be taken when summing rows.
Coverage percentages are calculated based on the cumulative number of people in each ZIP Code and age group who are considered up to date as of the week ending date divided by the estimated number of people in that subgroup. Population counts are obtained from the 2020 U.S. Decennial Census. For ZIP Codes mostly outside Chicago, coverage percentages are not calculated because reliable Chicago-only population counts are not available. Actual counts may exceed population estimates and lead to coverage estimates that are greater than 100%, especially in smaller ZIP Codes with smaller populations. Additionally, the medical provider may report a work address or incorrect home address for the person receiving the vaccination, which may lead to over- or underestimation of vaccination coverage by geography. All coverage percentages are capped at 99%.

Weekly cumulative counts and coverage percentages are reported from the week ending Saturday, September 16, 2023 onward through the Saturday prior to the dataset being updated.

All data are provisional and subject to change. Information is updated as additional details are received, and it is very common for recent dates to be incomplete and to be updated as time goes on. At any given time, this dataset reflects data currently known to CDPH. Numbers in this dataset may differ from other public sources due to when data are reported and how City of Chicago boundaries are defined.

The Chicago Department of Public Health uses the most complete data available to estimate COVID-19 vaccination coverage among Chicagoans, but there are several limitations that impact our estimates. Individuals may receive vaccinations that are not recorded in the Illinois immunization registry, I-CARE, such as those administered in another state, causing underestimation of the number of individuals who are up to date.
Inconsistencies in records of separate doses administered to the same person, such as slight variations in dates of birth, can result in duplicate records for a person and underestimate the number of people who are up to date. For all datasets related to COVID-19, please
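
The coverage calculation and the "All ages" caution described in the data notes can be illustrated with a minimal sketch. The column names (`zip_code`, `age_group`, `up_to_date`, `population`), the age-group labels other than "All ages", and the sample values are assumptions made for illustration; they are not taken from the dataset.

```python
def coverage_percent(up_to_date, population, cap=99.0):
    """Coverage as a percentage of the population estimate, capped at `cap`.

    Returns None when no population estimate is available, mirroring the
    note that coverage is not calculated for some ZIP Codes.
    """
    if not population:
        return None
    return min(100.0 * up_to_date / population, cap)


# Hypothetical rows for one ZIP Code and one week ending date.
rows = [
    {"zip_code": "60601", "age_group": "5-17", "up_to_date": 120, "population": 1000},
    {"zip_code": "60601", "age_group": "18+", "up_to_date": 900, "population": 800},
    {"zip_code": "60601", "age_group": "All ages", "up_to_date": 1020, "population": 1800},
]

# Sum only the age-specific rows; including "All ages" would double count.
total_up_to_date = sum(
    r["up_to_date"] for r in rows if r["age_group"] != "All ages"
)

for r in rows:
    print(r["age_group"], coverage_percent(r["up_to_date"], r["population"]))
```

In the second row the count exceeds the population estimate, so the uncapped value would be above 100% and the sketch reports the cap instead.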
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the code for Relevance and Redundancy ranking, an efficient filter-based feature ranking framework for evaluating relevance based on multi-feature interactions and redundancy on mixed datasets.

Source code is in .scala and .sbt format, metadata in .xml, all of which can be accessed and edited in standard, openly accessible text edit software. Diagrams are in openly accessible .png format.

- Supplementary_2.pdf: contains the results of experiments on multiple classifiers, along with parameter settings and a description of how KLD converges to mutual information based on its symmetricity.
- dataGenerator.zip: Synthetic data generator inspired by NIPS: Workshop on variable and feature selection (2001), http://www.clopinet.com/isabelle/Projects/NIPS2001/
- rar-mfs-master.zip: Relevance and Redundancy Framework containing overview diagram, example datasets, source code and metadata. Details on installing and running are provided below.

Background. Feature ranking is beneficial for gaining knowledge and for identifying the relevant features in a high-dimensional dataset. However, in several datasets, some features by themselves have only a small correlation with the target classes, but combined with other features they can be strongly correlated with the target. This means that multiple features exhibit interactions among themselves. It is necessary to rank the features based on these interactions for better analysis and classifier performance. However, evaluating these interactions on large datasets is computationally challenging. Furthermore, datasets often have features with redundant information. Using such redundant features hinders both the efficiency and the generalization capability of the classifier. The major challenge is to efficiently rank the features based on relevance and redundancy on mixed datasets.
In the related publication, we propose a filter-based framework based on Relevance and Redundancy (RaR). RaR computes a single score that quantifies the feature relevance by considering interactions between features and redundancy. The top ranked features of RaR are characterized by maximum relevance and non-redundancy. The evaluation on synthetic and real world datasets demonstrates that our approach outperforms several state-of-the-art feature selection techniques.

# Relevance and Redundancy Framework (rar-mfs)

rar-mfs is an algorithm for feature selection and can be employed to select features from labelled data sets. The Relevance and Redundancy Framework (RaR), which is the theory behind the implementation, is a novel feature selection algorithm that

- works on large data sets (polynomial runtime),
- can handle differently typed features (e.g. nominal features and continuous features), and
- handles multivariate correlations.

## Installation

The tool is written in Scala and uses the Weka framework to load and handle data sets. You can either run it independently, providing the data as an `.arff` or `.csv` file, or you can include the algorithm as a (maven / ivy) dependency in your project. As an example data set we use heart-c.

### Project dependency

The project is published to Maven Central. To depend on the project use:

- maven
  ```xml
  <dependency>
    <groupId>de.hpi.kddm</groupId>
    <artifactId>rar-mfs_2.11</artifactId>
    <version>1.0.2</version>
  </dependency>
  ```
- sbt:
  ```sbt
  libraryDependencies += "de.hpi.kddm" %% "rar-mfs" % "1.0.2"
  ```

To run the algorithm use

```scala
import java.io.File
import de.hpi.kddm.rar._
// ...
// Load the example data set from CSV without normalizing it.
val dataSet = de.hpi.kddm.rar.Runner.loadCSVDataSet(new File("heart-c.csv"), isNormalized = false, "")
// config.samples and config.alpha come from the caller's own configuration (not shown here).
val algorithm = new RaRSearch(
  HicsContrastPramsFA(numIterations = config.samples, maxRetries = 1, alphaFixed = config.alpha, maxInstances = 1000),
  RaRParamsFixed(k = 5, numberOfMonteCarlosFixed = 5000, parallelismFactor = 4))
algorithm.selectFeatures(dataSet)
```
### Command line tool

- EITHER download the prebuilt binary, which requires only an installation of a recent Java version (>= 6):
  1. download the prebuilt jar from the releases tab (latest)
  2. run `java -jar rar-mfs-1.0.2.jar --help`

  Using the prebuilt jar, here is an example usage:
  ```sh
  rar-mfs > java -jar rar-mfs-1.0.2.jar arff --samples 100 --subsetSize 5 --nonorm heart-c.arff
  Feature Ranking:
    1 - age (12)
    2 - sex (8)
    3 - cp (11)
    ...
  ```
- OR build the repository on your own:
  1. make sure sbt is installed
  2. clone repository
  3. run `sbt run`

  Simple example using sbt directly after cloning the repository:
  ```sh
  rar-mfs > sbt "run arff --samples 100 --subsetSize 5 --nonorm heart-c.arff"
  Feature Ranking:
    1 - age (12)
    2 - sex (8)
    3 - cp (11)
    ...
  ```
### [Optional]

To speed up the algorithm, consider using a fast solver such as Gurobi (http://www.gurobi.com/). Install the solver and put the provided `gurobi.jar` into the java classpath.

## Algorithm

### Idea

Abstract overview of the different steps of the proposed feature selection algorithm:

![Algorithm Overview](https://github.com/tmbo/rar-mfs/blob/master/docu/images/algorithm_overview.png)

The Relevance and Redundancy ranking framework (RaR) is a method able to handle large scale data sets and data sets with mixed features. Instead of directly selecting a subset, a feature ranking gives a more detailed overview of the relevance of the features. The method consists of a multistep approach where we

1. repeatedly sample subsets from the whole feature space and examine their relevance and redundancy: exploration of the search space to gather more and more knowledge about the relevance and redundancy of features
2. deduce scores for features based on the scores of the subsets
3. create the best possible ranking given the sampled insights.

### Parameters

| Parameter | Default value | Description |
| ---------- | ------------- | ------------ |
| m - contrast iterations | 100 | Number of different slices to evaluate while comparing marginal and conditional probabilities |
| alpha - subspace slice size | 0.01 | Percentage of all instances to use as part of a slice which is used to compare distributions |
| n - sampling iterations | 1000 | Number of different subsets to select in the sampling phase |
| k - sample set size | 5 | Maximum size of the subsets to be selected in the sampling phase |
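
The three steps of the idea (sample subsets, derive per-feature scores from subset scores, rank) can be sketched in a few lines. This is a toy illustration, not the RaR implementation: the subset score is a made-up stand-in for the contrast-based relevance, per-feature scores are simple averages rather than the solver-based deduction the tool uses, and redundancy is not modelled at all. The function names and the toy data are hypothetical; only `n` and `k` mirror the parameter table above.

```python
import random
from collections import defaultdict


def rank_features(features, subset_score, n=1000, k=5, seed=0):
    """Rank features by the mean score of the random subsets containing them."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(n):
        # Step 1: sample a subset of at most k features and score it.
        size = rng.randint(1, min(k, len(features)))
        subset = rng.sample(features, size)
        score = subset_score(subset)
        for f in subset:
            totals[f] += score
            counts[f] += 1
    # Step 2: derive one score per feature (here: a plain average).
    means = {f: totals[f] / counts[f] for f in features if counts[f]}
    # Step 3: order features by their derived score, best first.
    return sorted(means, key=means.get, reverse=True)


def toy_score(subset):
    # Hypothetical relevance: "a" and "b" are informative only together.
    return 1.0 if {"a", "b"} <= set(subset) else 0.0


ranking = rank_features(["a", "b", "c", "d", "e", "f"], toy_score)
print(ranking)
```

Because the toy score rewards only subsets that contain both `a` and `b`, those two features end up at the top, which is the kind of multi-feature interaction a univariate ranking would miss.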
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Previous studies mainly focus on socio-demographic variables and observable behavior. Our goal was to link these variables with information on lifestyles and personal characteristics. Consequently, the questionnaire revolved around the core research questions “Who is cycling?” and “Why are they cycling?” In order to answer these questions, we collected data in three different categories: personal, behavioral, and motivational.

In total, 569 female, 501 male, and 3 non-binary participants completed the survey. The mean age of the participants was 42 years (σ = 12.75) with a range between 7 and 80 years. The age difference between female (mean = 40.75, σ = 12.59) and male (mean = 43.43, σ = 12.79) participants was highly significant (t = −3.45, p < 0.001). Participants with non-binary gender had an average age of 32 years (σ = 6.16).

This subset dataset only includes those records that could be georeferenced using an Austrian ZIP Code.

In terms of educational background, the sample was skewed towards highly educated persons; 60.34% of all participants had a university degree, whereas the percentage is 25.18% in the city of Salzburg and 17.0% in the surrounding district (Salzburg-Umgebung) according to official statistics [10]. Participants with compulsory school as their highest degree were underrepresented in our sample (0.65% compared to 21.66% and 11.86%, respectively, in the two reference districts).

The majority of respondents were frequent cyclists, and among them, 38.40% were using the bicycle more than once a day. In the survey, 2.80% of all participants were non-cyclists. Compared to national and regional modal split statistics, cyclists were overrepresented in the sample. The primary trip purpose of all the respondents was commuting to work, university, or school. Thus, we can conclude that the dataset represented the perspectives of mainly utilitarian cyclists.

Further information is available: https://www.mdpi.com/2306-5729/4/4/140
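
As a plausibility check, the reported t statistic for the age difference can be recomputed from the summary figures quoted above. The description does not say which variant of the t-test was used; the sketch below assumes a two-sample test with unequal variances (Welch), and the function name is hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic from summary statistics (unequal variances)."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error


# Female vs. male participants, using the figures quoted in the text.
t = welch_t(40.75, 12.59, 569, 43.43, 12.79, 501)
print(round(t, 2))  # -3.45
```

The result agrees with the reported value. Since this subset only contains georeferenced records, recomputing the statistic from the subset itself may give a slightly different number.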
2013 Medicaid figures for the southeast Michigan counties of Wayne, Oakland, and Macomb at the ZCTA level. These data represent the number of visits, not counts of distinct individuals: one person could have had multiple visits, and each visit is counted. Blank cells indicate no visits. Whether a visit is coded as a hospital or ER visit depends on the Diagnosis-Related Group (DRG); that is, the diagnosis.
This dataset includes COVID-19 self-test result data voluntarily reported by users of tests through the MakeMyTestCount website (makemytestcount.org). All fields are self-reported by the user with the exception of fields derived from the self-reported zip code. This dataset will be updated monthly. If there are any questions, please direct them to the data steward, Jasmine Chaitram (zoa6@cdc.gov).
This dataset includes the following self-reported data:
- Date (by week) – date of test, shown by week starting date
- Age group (years) – age of individual taking the test, categorized into the following: 2-4, 5-11, 12-15, 16-17, 18-29, 30-39, 40-49, 50-64, 65-74, 75+
- Race – race of individual taking the test: American Indian or Alaska Native, Asian, Black, Native Hawaiian or Other Pacific Islander, White, Multiple or Other, missing
- Ethnicity – ethnicity of individual taking the test: Hispanic, Non-Hispanic, missing
- Sex – sex of individual taking the test: male, female, missing
- Test result – positive, negative, inconclusive
The dataset also includes the following columns to support analyses. These columns are based on the self-reported zip code:
- State abbreviation
- State name
- State FIPS code
- FEMA region
Please note that there are limitations with these data, including:
Data are not comprehensive of all self-tests performed. Data represent results voluntarily reported by an individual via the MakeMyTestCount website. These data do not include self-test results that were reported to state and local health departments if they were not also reported through the MakeMyTestCount website. The true denominator (the number of tests completed in the US) cannot be ascertained, and the reported results reflect a small fraction of the number of self-tests used.
Data are not verified. The quality of specimen, appropriate execution of self-test, result produced, and person tested are unverified; therefore, reported interpretation of results cannot be confirmed. All results and accompanying demographic information are also self-reported and cannot be verified.
Data reports are not complete. Individual submissions vary widely in terms of the data elements collected. Not all data elements are required (only date, age, and zip code), and some results are missing demographic information.
Data are not representative. Based on the limited number of self-reported test results, this dataset is not representative of the use of self-testing by demographic, nor is the dataset inclusive of all self-testing completed within each jurisdiction. This dataset represents a small proportion of overall COVID-19 testing conducted and reported volumes are much lower than testing conducted in point of care and laboratory settings.
Data represent individual test results, not persons tested. Data in this dataset are not linkable and do not allow for analyses around serial testing. Data also cannot be disaggregated to identify multiple reports by the same individual.
All analyses should be completed with these limitations in mind.
For more information about the challenges and opportunities around self-test data, please refer to the following article: Ritchey MD, Rosenblum HG, Del Guercio K, et al. COVID-19 Self-Test Data: Challenges and Opportunities — United States, October 31, 2021–June 11, 2022. MMWR Morb Mortal Wkly Rep 2022;71:1005–1010. DOI: http://dx.doi.org/10.15585/mmwr.mm7132a1
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This layer was developed by the Research & Analytics Group of the Atlanta Regional Commission, using data from the U.S. Census Bureau’s American Community Survey 5-year estimates for 2013-2017, to show population by sex and age by Zip Code Tabulation Area in the Atlanta region.
The user should note that American Community Survey data represent estimates derived from a surveyed sample of the population, which creates some level of uncertainty, as opposed to an exact measure of the entire population (the full census count is only conducted once every 10 years and does not cover as many detailed characteristics of the population). Therefore, any measure reported by ACS should not be taken as an exact number – this is why a corresponding margin of error (MOE) is also given for ACS measures. The size of the MOE relative to its corresponding estimate value provides an indication of confidence in the accuracy of each estimate. Each MOE is expressed in the same units as its corresponding measure; for example, if the estimate value is expressed as a number, then its MOE will also be a number; if the estimate value is expressed as a percent, then its MOE will also be a percent.
The user should also note that for relatively small geographic areas, such as the ZIP Code Tabulation Areas shown here, ACS only releases combined 5-year estimates, meaning these estimates represent rolling averages of survey results that were collected over a 5-year span (in this case 2013-2017). Therefore, these data do not represent any one specific point in time or even one specific year. For geographic areas with larger populations, 3-year and 1-year estimates are also available.
For further explanation of ACS estimates and margin of error, visit the Census ACS website.
Naming conventions:

Prefixes:
(none): Count
p: Percent
r: Rate
m: Median
a: Mean (average)
t: Aggregate (total)
ch: Change in absolute terms (value in t2 - value in t1)
pch: Percent change ((value in t2 - value in t1) / value in t1)
chp: Change in percent (percent in t2 - percent in t1)

Suffixes:
(none): Change over two periods
_e: Estimate from most recent ACS
_m: Margin of Error from most recent ACS
_00: Decennial 2000
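
The three change measures behind the `ch`, `pch`, and `chp` prefixes are easy to confuse, so here is a small worked sketch of the formulas as stated above. The function names and the sample values are hypothetical; the layer may also express percent change multiplied by 100, which the stated formula does not show.

```python
def change(value_t1, value_t2):
    """ch: change in absolute terms."""
    return value_t2 - value_t1


def percent_change(value_t1, value_t2):
    """pch: change relative to the starting value (as a fraction)."""
    return (value_t2 - value_t1) / value_t1


def change_in_percent(percent_t1, percent_t2):
    """chp: difference between two percentages, in percentage points."""
    return percent_t2 - percent_t1


# Hypothetical counts for one area in two periods.
print(change(1000, 1200))             # 200
print(percent_change(1000, 1200))     # 0.2
# Hypothetical shares of the population in two periods (10.0% and 12.5%).
print(change_in_percent(10.0, 12.5))  # 2.5
```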
Attributes:
Attributes and definitions available below under "Attributes" section and in Infrastructure Manifest (due to text box constraints, attributes cannot be displayed here). Source: U.S. Census Bureau, Atlanta Regional Commission
Date: 2013-2017
For additional information, please visit the Census ACS website.
A list of all uniform citations from the Louisville Metro Police Department. The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes.

INCIDENT_NUMBER or CASE_NUMBER links these data sets together:
- Crime Data
- Uniform Citation Data
- Firearm intake
- LMPD hate crimes
- Assaulted Officers

CITATION_CONTROL_NUMBER links these data sets together:
- Uniform Citation Data
- LMPD Stops Data

Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.

AGENCY_DESC - the name of the department that issued the citation
CASE_NUMBER - the number associated with either the incident or used as reference to store the items in our evidence rooms; it can be used to connect the dataset to the following other datasets via INCIDENT_NUMBER:
1. Crime Data
2. Firearms intake
3. LMPD hate crimes
4. Assaulted Officers
NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes) which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
CITATION_YEAR - the year the citation was issued
CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
CITATION_DATE - the date the citation was issued
CITATION_LOCATION - the location the citation was issued
DIVISION - the LMPD division in which the citation was issued
BEAT - the LMPD beat in which the citation was issued
PERSONS_SEX - the gender of the person who received the citation
PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
PERSONS_AGE - the age of the person who received the citation
PERSONS_HOME_CITY - the city in which the person who received the citation lives
PERSONS_HOME_STATE - the state in which the person who received the citation lives
PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
CHARGE_DESC - the description of the type of charge for the citation
UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
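
Because CASE_NUMBER and INCIDENT_NUMBER are formatted differently, a join between the citation data and the other datasets needs a small conversion step. The sketch below infers a 2-2-6 digit grouping from the single example given in the note above (8018013155 and 80-18-013155); that grouping is an assumption, and the function name is hypothetical.

```python
def case_to_incident(case_number: str) -> str:
    """Insert dashes into a 10-digit CASE_NUMBER to mimic INCIDENT_NUMBER."""
    digits = case_number.strip()
    if len(digits) != 10 or not digits.isdigit():
        # Anything that does not look like the documented example is rejected
        # rather than guessed at.
        raise ValueError(f"unexpected CASE_NUMBER format: {case_number!r}")
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}"


print(case_to_incident("8018013155"))  # 80-18-013155
```

Keeping the value as a string matters here: reading the column as an integer would drop any leading zeros before the dashes are inserted.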
Mapping Layer Data Released: 06/15/2017 | Last Updated: 04/20/2024

Data Currency: This data is checked semi-annually from its enterprise federal source for 2010 CENSUS Data and will support mapping, analysis, data exports and the Open Geospatial Consortium (OGC) Application Programming Interface (API).

Data Update Frequency: Twice yearly

Data Cycle | History (as required below)
QA/QC Performed: December, 2024
Next Scheduled Data QA/QC: July, 2024

CDC PLACES (2010 CENSUS) FEATURE LAYER

Data Requester: Rhode Island Executive Office of Health and Human Service (OHHS) via Health Equity Institute (HEI).
Data Requester: Rhode Island Department of Health, Maternal Child Health via Health Equity Institute (HEI).
Data Request: Provide a database deliverable via download that contains both US CENSUS tracts and USPS Zip Code Tabulation Areas (ZCTA).

HEALTH EQUITY INSTITUTE DATA CONNECT RI Using Modern GIS (Mapping)
Facilitate transformative mapping visualizations that engage constituents and measure the impact of real-world solutions.

There are twenty-two U.S. CENSUS fields (download here) that you can join to your datasets. For additional insight, please contact the Center for Health Data and Analysis (CHDA) Rhode Island Department of Health (GIS) Mapping Department for assistance.

Database Enhancement: This database contains two (2) additional data fields for consideration to be added to the existing 2020 State of Rhode Island Health Equity Map.
- Zip Code Tabulation Area (ZCTA)
- ZCTA/Tract Relationship (Singular ZCTAs per Tract, versus Multiple ZCTAs per Tract)

Additional Information: While ZCTAs can be useful for certain qualitative purposes, such as broad or general high level analysis, they may not provide the level of granularity and accuracy required for the in-depth demographic research that policy mapping requires. ZCTAs can change frequently as the US Postal Service (USPS) adjusts postal routes and boundaries. These changes can lead to inconsistencies and challenges in tracking demographic trends and making accurate comparisons over time.

RIDOH GIS encourages analysts to make the appropriate choice of using census based data, whose consistent boundaries are readily available and suitable for spatial analysis, when conducting detailed demographic research.

Here are a few reasons why you might want to consider using census based data (tracts, block groups, and blocks) instead of ZCTAs:

1. Inaccurate Representations: ZCTAs are not designed for statistical analysis or demographic research. They approximate United States Postal Service (USPS) ZIP Codes, which exist for efficient mail delivery, and can often span multiple cities, counties, or even states. As a result, ZCTAs may not accurately represent the actual geographic boundaries or demographic characteristics of a specific area.
2. Lack of Granularity: ZCTAs are typically larger than census tracts, which are smaller, more homogeneous geographic units defined by the U.S. Census Bureau. Census tracts are designed to be relatively consistent in terms of population size, allowing for more detailed analysis at a local level. ZCTAs, on the other hand, can vary significantly in terms of population size, making it challenging to draw precise conclusions about specific neighborhoods or communities.
3. Data Availability and Compatibility: Census tracts are used by the U.S. Census Bureau to collect and report demographic data. Consequently, a wide range of demographic information, such as population counts, age distribution, income levels, and education levels, is readily available at the census tract level. In contrast, data specifically tailored to ZCTAs may be more limited, making it difficult to obtain comprehensive and consistent data for demographic analysis.
4. Changes Over Time: Census tracts are relatively stable over time, allowing for consistent longitudinal analysis. ZCTAs, however, can change frequently as the USPS adjusts postal routes and boundaries. These changes can lead to inconsistencies and challenges in tracking demographic trends and making accurate comparisons over time.
5. Spatial Analysis: Census tracts are designed to maintain a level of spatial proximity, adjacency, or connectedness of these data containers while providing consistency and continuity over time, making them useful for spatial analysis and mapping. ZCTAs, on the other hand, may not exhibit the same level of spatial coherence due to their primary purpose being mail delivery efficiency rather than geographic representation.

State Agencies - Contact RIDOH GIS - Learn More About Mapping Data Available at the Census Tract Level

RIDOH GIS releases this database with the caveats noted above and on the understanding that the researcher can accurately align the ZCTAs with the corresponding census tracts. Careful consideration should be given to the comparability and compatibility of the data collected at different geographic levels to ensure valid and meaningful statistical conclusions.
Data Dictionary: 2010 Decennial Census
OBJECT ID - the count of each census tract entity.
GEOID (10) STATE,COUNTY,TRACT - Numeric US CENSUS Tract Description (2010)
HEZ (10) - Health Equity Zone (2020)
LOCATION (10) - Plain Language Census Tract Descriptor (2010)
COUNTY (10) NAME - County Name (2010)
STATE (10) NAME - State Name (2010)
ZCTA (23) - Zip Code Tabulation Area - Numeric US CENSUS ZCTA Description (2023)
ZCTA/TRACT CONTEXT - Number of ZCTAs (Singular/Multiple) that reside within a US CENSUS Tract
ST (10) - Numeric US CENSUS Tract Description (2010)
CO (10) - Numeric US CENSUS Tract Description (2010)
ST (10) CO (10) - Numeric US CENSUS Tract Description (2010)
TRACT (10) - Numeric US CENSUS Tract Description (2010)
GEOID (10) - Numeric US CENSUS Tract Description (2010)
TRIBAL TRACT (10) - Numeric US CENSUS Tract Description (2010)

Additional Mapping Data
The user is provided authoritative Federal Information Processing Standards (FIPS) such as numeric descriptions of state, county and tract identification, in addition to shape and length measurements of each census tract for data joining purposes.
STATE (10) - Federal Information Processing Standards (FIPS)
COUNTY (10) - Federal Information Processing Standards (FIPS)
STATE (10), COUNTY (10) - Federal Information Processing Standards (FIPS)
TRACT (10) - Federal Information Processing Standards (FIPS)
TRIBAL TRACT (10) - Federal Information Processing Standards (FIPS)
ST ABBRV (10) - State Abbreviation
Shape_Length - Total length of the polygon's (census tract) perimeter, in the units used by the feature class' coordinate system.
Shape_Area - Total area of the polygon (census tract), in the units used by the feature class' coordinate system.

Data Source: Series Information for 2020 Census 5-Digit ZIP Code Tabulation Area (ZCTA5) National TIGER/Line Shapefiles, Current

Open Geospatial Consortium (OGC) Application Programming Interface (API): Census ZIP Code Tabulation Areas - OGC Features (copy this link to embed it in OGC compliant viewers).

For more information, please visit: ZIP Code Tabulation Areas (ZCTAs)

To Report Data Discrepancies, Contact the Rhode Island Department of Health (RIDOH) GIS (mapping) Office
Please Be Certain To:
- Provide a Brief Description of What the Discrepancy Is
- Include Your Name, Organization, Telephone Number
- Attach the Complete .xlsx with the Discrepancy Highlighted
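
For the GEOID and ZCTA/TRACT CONTEXT fields in the data dictionary above, the following sketch shows how a tract GEOID is conventionally assembled from FIPS parts (2-digit state, 3-digit county, 6-digit tract) and how a Singular/Multiple label could be derived from tract-to-ZCTA pairs. The function names, the input layout, and the sample tract and ZCTA values are illustrative assumptions and are not taken from the database.

```python
from collections import defaultdict


def build_geoid(state_fips, county_fips, tract):
    """Concatenate zero-padded state (2), county (3), and tract (6) codes."""
    return f"{int(state_fips):02d}{int(county_fips):03d}{int(tract):06d}"


def zcta_tract_context(pairs):
    """Label each tract by how many distinct ZCTAs are paired with it."""
    zctas_by_tract = defaultdict(set)
    for geoid, zcta in pairs:
        zctas_by_tract[geoid].add(zcta)
    return {
        geoid: "Singular" if len(zctas) == 1 else "Multiple"
        for geoid, zctas in zctas_by_tract.items()
    }


# Illustrative values only: state 44, county 007, two made-up tracts.
tract_a = build_geoid("44", "7", "101")
tract_b = build_geoid("44", "7", "200")
pairs = [(tract_a, "02903"), (tract_b, "02903"), (tract_b, "02906")]
print(tract_a)                    # 44007000101
print(zcta_tract_context(pairs))
```

Padding with leading zeros is the part that most often goes wrong when these codes pass through a spreadsheet, so the identifiers are best kept as text throughout a join.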
Adult respondents ages 18+ who walked for transportation or leisure for at least 150 minutes in the past week. Years covered are 2013-2014, by zip code. Data taken from the California Health Interview Survey Neighborhood Edition (AskCHIS NE) (http://askchisne.ucla.edu/), downloaded February 2018.
AskCHIS Neighborhood Edition is an online data dissemination and visualization platform that provides health estimates at sub-county geographic regions. Estimates are powered by data from the California Health Interview Survey (CHIS). CHIS is conducted by the UCLA Center for Health Policy Research, an affiliate of the UCLA Fielding School of Public Health.
Health estimates available in AskCHIS NE are model-based small area estimates (SAEs). SAEs are not direct estimates (estimates produced directly from survey data, such as those provided through AskCHIS).
CHIS data and analytic results are used extensively in California in policy development, service planning, and research, and CHIS is recognized and valued nationally as a model population-based health survey.
Before using estimates from AskCHIS NE, it is recommended that you read more about the methodology and data limitations at: http://healthpolicy.ucla.edu/Lists/AskCHIS%20NE%20Page%20Content/AllItems.aspx. You can go to http://askchisne.ucla.edu/ to create your own account.
Produced by the California Health Interview Survey and the UCLA Center for Health Policy Research and compiled by the Los Angeles County Department of Public Health.
Field Name = Field Definition
"Zipcode" = postal zip code in the City of Los Angeles
"Percent" = estimated percentage of adults ages 18+ who walked for transportation or leisure for at least 150 minutes in the past week
"LowerCL" = the lower 95% confidence limit, which represents the lower margin of error that occurs with statistical sampling
"UpperCL" = the upper 95% confidence limit, which represents the upper margin of error that occurs with statistical sampling
"Population" = estimated population 18 and older (denominator) residing in the zip code
Notes:
1) Zip codes are based on the Los Angeles Housing Department Zip Codes Within the City of Los Angeles map (https://media.metro.net/about_us/pla/images/lazipcodes.pdf).
2) Zip codes that did not have data available (i.e., null values) are not included in the dataset; there are additional zip codes that fall within the City of Los Angeles.
3) Zip code boundaries do not align with political boundaries. These data are best viewed with a City of Los Angeles political boundary file (e.g., City of Los Angeles jurisdiction boundary or City Council boundary).
FAQs:
1. Which cycle of CHIS does AskCHIS Neighborhood Edition provide estimates for?
All health estimates in this version of AskCHIS Neighborhood Edition are based on data from the 2013-2014 California Health Interview Survey.
2. Why do your population estimates differ from other sources like ACS?
The population estimates in AskCHIS NE represent the CHIS 2013-2014 population sample, which excludes Californians living in group quarters (such as prisons, nursing homes, and dormitories).
3. Why isn't there data available for all ZIP codes in Los Angeles?
While AskCHIS NE has data on all ZCTAs (ZIP Code Tabulation Areas), two factors may influence our ability to display the estimates:
- A small population (under 15,000): currently, the application only shows estimates for geographic entities with populations above 15,000. If your ZCTA has a population below this threshold, the easiest way to obtain data is to combine it with a neighboring ZCTA and obtain a pooled estimate.
- A high coefficient of variation: high coefficients of variation denote statistical instability.
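The "Percent" and "Population" fields can be combined to approximate how many adults an estimate represents, or to roll several zip codes up into one population-weighted percentage. The sketch below assumes rows shaped like the field list above; the zip codes and values are hypothetical. It is a plain weighted average, not the pooled small-area estimate that AskCHIS NE produces, and it does not combine the confidence limits.

```python
def estimated_count(percent, population):
    """Approximate number of adults represented by a percentage estimate."""
    return percent / 100.0 * population


def weighted_percent(rows):
    """Population-weighted percentage across several zip code rows."""
    total_population = sum(row["Population"] for row in rows)
    if total_population == 0:
        raise ValueError("total population is zero")
    total_count = sum(
        estimated_count(row["Percent"], row["Population"]) for row in rows
    )
    return 100.0 * total_count / total_population


# Hypothetical rows, for illustration only (not values from the dataset).
rows = [
    {"Zipcode": "90001", "Percent": 30.0, "Population": 40000},
    {"Zipcode": "90002", "Percent": 40.0, "Population": 20000},
]
combined = weighted_percent(rows)
```

Because the underlying figures are model-based estimates, a combined value like this is best treated as a rough summary rather than an official statistic.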
A list of all uniform citations from the Louisville Metro Police Department. The CSV file is updated daily and includes case number, date, location, division, beat, offender demographics, statutes and charges, and UCR codes; it can be found in this Link.
INCIDENT_NUMBER or CASE_NUMBER links these data sets together: Crime Data, Uniform Citation Data, Firearm intake, LMPD hate crimes, Assaulted Officers.
CITATION_CONTROL_NUMBER links these data sets together: Uniform Citation Data, LMPD Stops Data.
Note: When examining this data, make sure to read the LMPD Crime Data section in our Terms of Use.
AGENCY_DESC - the name of the department that issued the citation
CASE_NUMBER - the number associated with the incident or used as a reference to store items in our evidence rooms. It can be used to connect this dataset to INCIDENT_NUMBER in the following datasets: 1. Crime Data 2. Firearms intake 3. LMPD hate crimes 4. Assaulted Officers
NOTE: CASE_NUMBER is not formatted the same as the INCIDENT_NUMBER in the other datasets. For example: in the Uniform Citation Data you have CASE_NUMBER 8018013155 (no dashes), which matches up with INCIDENT_NUMBER 80-18-013155 in the other 4 datasets.
CITATION_YEAR - the year the citation was issued
CITATION_CONTROL_NUMBER - links this dataset to the LMPD Stops Data
CITATION_TYPE_DESC - the type of citation issued (citations include: general citations, summons, warrants, arrests, and juvenile)
CITATION_DATE - the date the citation was issued
CITATION_LOCATION - the location where the citation was issued
DIVISION - the LMPD division in which the citation was issued
BEAT - the LMPD beat in which the citation was issued
PERSONS_SEX - the gender of the person who received the citation
PERSONS_RACE - the race of the person who received the citation (W-White, B-Black, H-Hispanic, A-Asian/Pacific Islander, I-American Indian, U-Undeclared, IB-Indian/India/Burmese, M-Middle Eastern Descent, AN-Alaskan Native)
PERSONS_ETHNICITY - the ethnicity of the person who received the citation (N-Not Hispanic, H-Hispanic, U-Undeclared)
PERSONS_AGE - the age of the person who received the citation
PERSONS_HOME_CITY - the city in which the person who received the citation lives
PERSONS_HOME_STATE - the state in which the person who received the citation lives
PERSONS_HOME_ZIP - the zip code in which the person who received the citation lives
VIOLATION_CODE - multiple alpha/numeric code assigned by the Kentucky State Police to link to a Kentucky Revised Statute. For a full list of codes visit: https://kentuckystatepolice.org/crime-traffic-data/
ASCF_CODE - the code that follows the guidelines of the American Security Council Foundation. For more details visit https://www.ascfusa.org/
STATUTE - multiple alpha/numeric code representing a Kentucky Revised Statute. For a full list of Kentucky Revised Statute information visit: https://apps.legislature.ky.gov/law/statutes/
CHARGE_DESC - the description of the type of charge for the citation
UCR_CODE - the code that follows the guidelines of the Uniform Crime Report. For more details visit https://ucr.fbi.gov/
UCR_DESC - the description of the UCR_CODE. For more details visit https://ucr.fbi.gov/
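Joining the citation data to the other datasets requires converting between the two identifier formats described in the CASE_NUMBER note. The sketch below is based only on the single documented example (8018013155 and 80-18-013155); the 2-2-6 digit grouping is an assumption inferred from that example, and the function names are hypothetical.

```python
import re


def case_to_incident(case_number):
    """Convert a 10-digit CASE_NUMBER to the dashed INCIDENT_NUMBER form.

    Assumes the 2-2-6 digit grouping seen in the documented example.
    """
    digits = str(case_number).strip()
    if not re.fullmatch(r"\d{10}", digits):
        raise ValueError(f"expected 10 digits, got {case_number!r}")
    return f"{digits[:2]}-{digits[2:4]}-{digits[4:]}"


def incident_to_case(incident_number):
    """Convert a dashed INCIDENT_NUMBER back to the undashed CASE_NUMBER form."""
    return str(incident_number).strip().replace("-", "")


incident = case_to_incident("8018013155")
case = incident_to_case("80-18-013155")
```

Values that do not fit the assumed pattern raise an error rather than being silently reformatted, so unexpected identifiers can be inspected by hand before any join.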
Note: Due to a system migration, this data will cease to update on March 14th, 2023. The current projection is to restart the updates within 30 days of the system migration, on or around April 13th, 2023.
Adult respondents ages 18+ who were ever diagnosed with heart disease by a doctor. Years covered are 2013-2014, by zip code. Data taken from the California Health Interview Survey Neighborhood Edition (AskCHIS NE) (http://askchisne.ucla.edu/), downloaded February 2018.
AskCHIS Neighborhood Edition is an online data dissemination and visualization platform that provides health estimates at sub-county geographic regions. Estimates are powered by data from the California Health Interview Survey (CHIS). CHIS is conducted by the UCLA Center for Health Policy Research, an affiliate of the UCLA Fielding School of Public Health.
Health estimates available in AskCHIS NE are model-based small area estimates (SAEs). SAEs are not direct estimates (estimates produced directly from survey data, such as those provided through AskCHIS).
CHIS data and analytic results are used extensively in California in policy development, service planning, and research, and CHIS is recognized and valued nationally as a model population-based health survey.
Before using estimates from AskCHIS NE, it is recommended that you read more about the methodology and data limitations at: http://healthpolicy.ucla.edu/Lists/AskCHIS%20NE%20Page%20Content/AllItems.aspx. You can go to http://askchisne.ucla.edu/ to create your own account.
Produced by the California Health Interview Survey and the UCLA Center for Health Policy Research and compiled by the Los Angeles County Department of Public Health.
Field Name = Field Definition
"Zipcode" = postal zip code in the City of Los Angeles
"Percent" = estimated percentage of adult respondents ages 18+ who were ever diagnosed with heart disease by a doctor
"LowerCL" = the lower 95% confidence limit, which represents the lower margin of error that occurs with statistical sampling
"UpperCL" = the upper 95% confidence limit, which represents the upper margin of error that occurs with statistical sampling
"Population" = estimated population 18 and older (denominator) residing in the zip code
Notes:
1) Zip codes are based on the Los Angeles Housing Department Zip Codes Within the City of Los Angeles map (https://media.metro.net/about_us/pla/images/lazipcodes.pdf).
2) Zip codes that did not have data available (i.e., null values) are not included in the dataset; there are additional zip codes that fall within the City of Los Angeles.
3) Zip code boundaries do not align with political boundaries. These data are best viewed with a City of Los Angeles political boundary file (e.g., City of Los Angeles jurisdiction boundary or City Council boundary).
FAQs:
1. Which cycle of CHIS does AskCHIS Neighborhood Edition provide estimates for?
All health estimates in this version of AskCHIS Neighborhood Edition are based on data from the 2013-2014 California Health Interview Survey.
2. Why do your population estimates differ from other sources like ACS?
The population estimates in AskCHIS NE represent the CHIS 2013-2014 population sample, which excludes Californians living in group quarters (such as prisons, nursing homes, and dormitories).
3. Why isn't there data available for all ZIP codes in Los Angeles?
While AskCHIS NE has data on all ZCTAs (ZIP Code Tabulation Areas), two factors may influence our ability to display the estimates:
- A small population (under 15,000): currently, the application only shows estimates for geographic entities with populations above 15,000. If your ZCTA has a population below this threshold, the easiest way to obtain data is to combine it with a neighboring ZCTA and obtain a pooled estimate.
- A high coefficient of variation: high coefficients of variation denote statistical instability.
A dataset of 12-lead ECGs with annotations. The dataset contains 345 779 exams from 233 770 patients. It was obtained through stratified sampling from the CODE dataset (15% of the patients). The data was collected by the Telehealth Network of Minas Gerais in the period between 2010 and 2016.
This repository contains the file exams.csv and the files exams_part{i}.zip for i = 0, 1, 2, ..., 17.
"exams.csv": a comma-separated values (CSV) file containing the columns:
"exam_id": id used for identifying the exam;
"age": patient age in years at the moment of the exam;
"is_male": true if the patient is male;
"nn_predicted_age": age predicted by a neural network for the patient, as described in the paper "Deep neural network estimated electrocardiographic-age as a mortality predictor" below;
"1dAVb": whether or not the patient has 1st degree AV block;
"RBBB": whether or not the patient has right bundle branch block;
"LBBB": whether or not the patient has left bundle branch block;
"SB": whether or not the patient has sinus bradycardia;
"AF": whether or not the patient has atrial fibrillation;
"ST": whether or not the patient has sinus tachycardia;
"patient_id": id used for identifying the patient;
"normal_ecg": true if the automatic annotation system says it is a normal ECG;
"death": true if the patient dies in the follow-up time. This data is available only in the first exam of the patient; other exams will have this as an empty field;
"timey": if the patient dies, the time to the death of the patient; if not, the follow-up time. This data is available only in the first exam of the patient; other exams will have this as an empty field;
"trace_file": identifies which hdf5 file contains the tracing corresponding to this patient.
"exams_part{i}.hdf5": an HDF5 file containing two datasets, one named tracings and the other named exam_id. The exam_id dataset is a tensor of dimension (N,) containing the exam ids (the same as in the csv file), and the tracings dataset is an (N, 4096, 12) tensor containing the ECG tracings in the same order. The first dimension corresponds to the different exams; the second dimension corresponds to the 4096 signal samples; the third dimension corresponds to the 12 different leads of the ECG exams in the following order: {DI, DII, DIII, AVR, AVL, AVF, V1, V2, V3, V4, V5, V6}.
The signals are sampled at 400 Hz. Some signals originally have a duration of 10 seconds (10 * 400 = 4000 samples) and others of 7 seconds (7 * 400 = 2800 samples). In order to make them all have the same size (4096 samples), we fill them with zeros on both sides. For instance, for a 7-second ECG signal with 2800 samples we include 648 zero samples at the beginning and 648 zero samples at the end, yielding 4096 samples that are then saved in the hdf5 dataset.
In Python, one can read this file using h5py:
```python
import h5py
import numpy as np

f = h5py.File(path_to_file, 'r')
# Get ids (dataset names follow the description above)
traces_ids = np.array(f['exam_id'])
# Keep the tracings as an h5py dataset; do not load them all at once
x = f['tracings']
```
The tracings dataset is too large to fit in memory, so don't convert it to a numpy array all at once. It is possible to access a chunk of it using: x[start:end, :, :].
The CODE dataset was collected by the Telehealth Network of Minas Gerais (TNMG) in the period between 2010 and 2016. TNMG is a public telehealth system assisting 811 out of the 853 municipalities in the state of Minas Gerais, Brazil. The dataset is described in:
Ribeiro, Antônio H., Manoel Horta Ribeiro, Gabriela M. M. Paixão, Derick M. Oliveira, Paulo R. Gomes, Jéssica A. Canazart, Milton P. S. Ferreira, et al. “Automatic Diagnosis of the 12-Lead ECG Using a Deep Neural Network.” Nature Communications 11, no. 1 (2020): 1760. https://doi.org/10.1038/s41467-020-15432-4
The CODE 15% dataset is obtained by stratified sampling from the CODE dataset. This subset of the CODE dataset is described in, and used for assessing model performance in:
"Deep neural network estimated electrocardiographic-age as a mortality predictor". Emilly M Lima, Antônio H Ribeiro, Gabriela MM Paixão, Manoel Horta Ribeiro, Marcelo M Pinto Filho, Paulo R Gomes, Derick M Oliveira, Ester C Sabino, Bruce B Duncan, Luana Giatti, Sandhi M Barreto, Wagner Meira Jr, Thomas B Schön, Antonio Luiz P Ribeiro. MedRXiv (2021) https://www.doi.org/10.1101/2021.02.19.21251232
The companion code for reproducing the experiments in the two papers described above can be found, respectively, in:
- https://github.com/antonior92/automatic-ecg-diagnosis; and in,
- https://github.com/antonior92/ecg-age-prediction.
Note about authorship: Antônio H. Ribeiro, Emilly M. Lima and Gabriela M.M. Paixão contributed equally to this work.
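The zero-padding arithmetic and the chunked access described for the HDF5 files can be sketched in plain Python. The helper names are hypothetical. The symmetric split matches the documented 7-second case (648 zeros on each side); applying the same split to 10-second signals (48 on each side) is an assumption, since the description only gives the 7-second example.

```python
TARGET_LEN = 4096  # samples per tracing after padding


def padding_for(n_samples, target=TARGET_LEN):
    """Return (left, right) zero-padding needed to reach the target length."""
    if n_samples > target:
        raise ValueError("signal is longer than the target length")
    total = target - n_samples
    left = total // 2
    return left, total - left


def chunk_bounds(n_exams, chunk_size):
    """Yield (start, end) index pairs that cover n_exams in fixed-size chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_exams, chunk_size):
        yield start, min(start + chunk_size, n_exams)


pad_7s = padding_for(7 * 400)    # (648, 648), as in the documented example
pad_10s = padding_for(10 * 400)  # (48, 48) under the symmetric assumption
bounds = list(chunk_bounds(10, 4))
```

Each (start, end) pair can be used as x[start:end, :, :] on the h5py dataset, so that only one chunk of tracings is held in memory at a time; the left padding value tells you where the original signal begins inside the 4096 samples.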
Mapping Layer Data Released: 06/15/2023 | Last Updated: 01/20/2024
Data Currency: This data is checked semi-annually against its enterprise federal source for 2010 CENSUS Data and will support mapping, analysis, data exports and the Open Geospatial Consortium (OGC) Application Programming Interface (API).
Data Update Frequency: Twice yearly
Data Cycle | History (as required below)
QA/QC Performed: December, 2024
Next Scheduled Data QA/QC: July, 2024
TRACT 10 (2010 CENSUS) CONNECT LAYER
Data Requester: Rhode Island Executive Office of Health and Human Service (OHHS) via Health Equity Institute (HEI).
Data Requester: Rhode Island Department of Health, Maternal Child Health via Health Equity Institute (HEI).
Data Request: Provide a database deliverable via download that contains both US CENSUS tracts and USPS Zip Code Tabulation Areas (ZCTA).
HEALTH EQUITY INSTITUTE DATA CONNECT RI Using Modern GIS (Mapping)
Facilitate transformative mapping visualizations that engage constituents and measure the impact of real-world solutions.
Instructions to Join Your Data Provided Below
STEP 1: Video (Pending)
STEP 2: Video (Pending)
STEP 3: Video (Pending)
There are twenty-two U.S. CENSUS fields (download here) that you can join to your datasets. For additional insight, please contact the Center for Health Data and Analysis (CHDA) Rhode Island Department of Health (GIS) Mapping Department for assistance.
Database Enhancement: This database contains two (2) additional data fields for consideration to be added to the existing 2020 State of Rhode Island Health Equity Map:
- Zip Code Tabulation Area (ZCTA)
- ZCTA/Tract Relationship (Singular ZCTAs per Tract, versus Multiple ZCTAs per Tract)
Additional Information: While ZCTAs can be useful for certain qualitative purposes, such as broad or general high-level analysis, they may not provide the level of granularity and accuracy required for the in-depth demographic research that policy mapping requires.
ZCTAs can change frequently as the US Postal Service (USPS) adjusts postal routes and boundaries. These changes can lead to inconsistencies and challenges in tracking demographic trends and making accurate comparisons over time.
RIDOH GIS encourages analysts to choose census-based data, whose consistent boundaries are readily available and well suited to spatial analysis, when conducting detailed demographic research.
Here are a few reasons why you might want to consider using census-based data (tracts, block groups, and blocks) instead of ZCTAs:
1. Inaccurate Representations: ZCTAs are not designed for detailed demographic research. They are based on ZIP Codes, which the United States Postal Service (USPS) creates for efficient mail delivery, and can often span multiple cities, counties, or even states. As a result, ZCTAs may not accurately represent the actual geographic boundaries or demographic characteristics of a specific area.
2. Lack of Granularity: ZCTAs are typically larger than census tracts, which are smaller, more homogeneous geographic units defined by the U.S. Census Bureau. Census tracts are designed to be relatively consistent in terms of population size, allowing for more detailed analysis at a local level. ZCTAs, on the other hand, can vary significantly in terms of population size, making it challenging to draw precise conclusions about specific neighborhoods or communities.
3. Data Availability and Compatibility: Census tracts are used by the U.S. Census Bureau to collect and report demographic data. Consequently, a wide range of demographic information, such as population counts, age distribution, income levels, and education levels, is readily available at the census tract level. In contrast, data specifically tailored to ZCTAs may be more limited, making it difficult to obtain comprehensive and consistent data for demographic analysis.
4. Changes Over Time: Census tracts are relatively stable over time, allowing for consistent longitudinal analysis. ZCTAs, however, can change frequently as the USPS adjusts postal routes and boundaries. These changes can lead to inconsistencies and challenges in tracking demographic trends and making accurate comparisons over time.
5. Spatial Analysis: Census tracts are designed to maintain a level of spatial proximity, adjacency, or connectedness while providing consistency and continuity over time, making them useful for spatial analysis and mapping. ZCTAs, on the other hand, may not exhibit the same level of spatial coherence because their primary purpose is mail delivery efficiency rather than geographic representation.
State Agencies - Contact RIDOH GIS - Learn More About Mapping Data Available at the Census Tract Level
RIDOH GIS releases this database with the caveats noted above and on the understanding that the researcher can accurately align the ZCTAs with the corresponding census tracts. Careful consideration should be given to the comparability and compatibility of the data collected at different geographic levels to ensure valid and meaningful statistical conclusions.
Data Dictionary: 2010 Decennial Census
OBJECT ID - the count of each census tract entity.
GEOID (10) STATE,COUNTY,TRACT - Numeric US CENSUS Tract Description (2010)
HEZ (10) - Health Equity Zone (2020)
LOCATION (10) - Plain Language Census Tract Descriptor (2010)
COUNTY (10) NAME - County Name (2010)
STATE (10) NAME - State Name (2010)
ZCTA (23) - Zip Code Tabulation Area - Numeric US CENSUS ZCTA Description (2023)
ZCTA/TRACT CONTEXT - Number of ZCTAs (Singular/Multiple) that reside within a US CENSUS Tract
ST (10) - Numeric US CENSUS Tract Description (2010)
CO (10) - Numeric US CENSUS Tract Description (2010)
ST (10) CO (10) - Numeric US CENSUS Tract Description (2010)
TRACT (10) - Numeric US CENSUS Tract Description (2010)
GEOID (10) - Numeric US CENSUS Tract Description (2010)
TRIBAL TRACT (10) - Numeric US CENSUS Tract Description (2010)
Additional Mapping Data
The user is provided authoritative Federal Information Processing Standards (FIPS) codes, such as numeric descriptions of state, county and tract identification, in addition to shape and length measurements of each census tract for data joining purposes.
STATE (10) - Federal Information Processing Standards (FIPS)
COUNTY (10) - Federal Information Processing Standards (FIPS)
STATE (10), COUNTY (10) - Federal Information Processing Standards (FIPS)
TRACT (10) - Federal Information Processing Standards (FIPS)
TRIBAL TRACT (10) - Federal Information Processing Standards (FIPS)
ST ABBRV (10) - State Abbreviation
Shape_Length - Total length of the polygon's (census tract) perimeter, in the units used by the feature class' coordinate system.
Shape_Area - Total area of the polygon (census tract), in the units used by the feature class' coordinate system.
Data Source: Series Information for 2020 Census 5-Digit ZIP Code Tabulation Area (ZCTA5) National TIGER/Line Shapefiles, Current
Open Geospatial Consortium (OGC) Application Programming Interface (API): Census ZIP Code Tabulation Areas - OGC Features (copy this link to embed it in OGC Compliant viewers).
For more information, please visit: ZIP Code Tabulation Areas (ZCTAs)
To Report Data Discrepancies, Contact the Rhode Island Department of Health (RIDOH) GIS (mapping) Office
Please Be Certain To:
- Provide a Brief Description of What the Discrepancy Is
- Include Your Name, Organization, Telephone Number
- Attach the Complete .xlsx with the Discrepancy Highlighted
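The ZCTA/TRACT CONTEXT field described in the data dictionary (Singular versus Multiple ZCTAs per tract) can be derived from a list of tract-to-ZCTA pairs. The sketch below is one possible way to do that, not RIDOH's procedure. It relies on the standard 11-digit census tract GEOID layout (2-digit state, 3-digit county, 6-digit tract); the function names and the sample tract and ZCTA pairings are hypothetical.

```python
from collections import defaultdict


def split_tract_geoid(geoid):
    """Split an 11-digit tract GEOID into (state, county, tract) codes."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit GEOID, got {geoid!r}")
    return geoid[:2], geoid[2:5], geoid[5:]


def zcta_context(pairs):
    """Label each tract 'Singular' or 'Multiple' by its number of distinct ZCTAs."""
    zctas_by_tract = defaultdict(set)
    for geoid, zcta in pairs:
        zctas_by_tract[geoid].add(zcta)
    return {
        geoid: "Singular" if len(zctas) == 1 else "Multiple"
        for geoid, zctas in zctas_by_tract.items()
    }


# Hypothetical tract/ZCTA pairings, for illustration only.
pairs = [
    ("44007000101", "02903"),
    ("44007000102", "02903"),
    ("44007000102", "02906"),
]
context = zcta_context(pairs)
parts = split_tract_geoid("44007000101")
```

Keeping GEOIDs and ZCTAs as strings preserves leading zeros, which matters for Rhode Island ZCTAs and for tract codes when joining to other tables.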
EUCA dataset description
Associated Paper: EUCA: the End-User-Centered Explainable AI Framework
Authors: Weina Jin, Jianyu Fan, Diane Gromala, Philippe Pasquier, Ghassan Hamarneh
Introduction: The EUCA dataset is for modelling personalized or interactive explainable AI. It contains 309 data points of 32 end-users' preferences on 12 forms of explanation (including feature-, example-, and rule-based explanations). The data were collected from a user study on 32 layperson participants in the Greater Vancouver city area in 2019-2020. In the user study, the participants (P01-P32) were presented with AI-assisted critical tasks on house price prediction, health status prediction, purchasing a self-driving car, and studying for a biological exam [1]. Within each task and for its given explanation goal [2], the participants selected and ranked the explanatory forms [3] that they found most suitable.
1 EUCA_EndUserXAI_ExplanatoryFormRanking.csv
Column description:
Index - Participant's number
Case - task-explanation goal combination
accept to use AI? trust it? - Participant's response to whether they will use AI given the task and explanation goal
require explanation? - Participant's response to the question of whether they request an explanation for the AI
1st, 2nd, 3rd, ... - Explanatory form card selection and ranking
cards fulfill requirement? - After the card selection, participants were asked whether the selected card combination fulfills their explainability requirement.
2 EUCA_EndUserXAI_demography.csv
It contains the participants demographics, including their age, gender, educational background, and their knowledge and attitudes toward AI.
EUCA dataset zip file for download
More Context for EUCA Dataset
[1] Critical tasks
There are four tasks. The task labels and their corresponding task titles are:
house - Selling your house
car - Buying an autonomous driving vehicle
health - Personal health decision
bird - Learning bird species
Please refer to the EUCA quantitative data analysis report for the storyboard of the tasks and explanation goals presented in the user study.
[2] Explanation goal End-users may have different goals/purposes to check an explanation from AI. The EUCA dataset includes the following 11 explanation goals, with its [label] in the dataset, full name and description
[trust] Calibrate trust: trust is a key to establish human-AI decision-making partnership. Since users can easily distrust or overtrust AI, it is important to calibrate the trust to reflect the capabilities of AI systems.
[safe] Ensure safety: users need to ensure safety of the decision consequences.
[bias] Detect bias: users need to ensure the decision is impartial and unbiased.
[unexpect] Resolve disagreement with AI: the AI prediction is unexpected and there are disagreements between users and AI.
[expected] Expected: the AI's prediction is expected and aligns with users' expectations.
[differentiate] Differentiate similar instances: due to the consequences of wrong decisions, users sometimes need to discern similar instances or outcomes. For example, a doctor differentiates whether the diagnosis is a benign or malignant tumor.
[learning] Learn: users need to gain knowledge, improve their problem-solving skills, and discover new knowledge.
[control] Improve: users seek causal factors to control and improve the predicted outcome.
[communicate] Communicate with stakeholders: many critical decision-making processes involve multiple stakeholders, and users need to discuss the decision with them.
[report] Generate reports: users need to utilize the explanations to perform particular tasks such as report production. For example, a radiologist generates a medical report on a patient's X-ray image.
[multi] Trade-off multiple objectives: AI may be optimized on an incomplete objective while the users seek to fulfill multiple objectives in real-world applications. For example, a doctor needs to ensure a treatment plan is effective as well as has acceptable patient adherence. Ethical and legal requirements may also be included as objectives.
[3] Explanatory form
The following 12 explanatory forms are end-user-friendly, i.e., no technical knowledge is required for the end user to interpret the explanation.
Feature-Based Explanation
Feature Attribution - fa
Note: for tasks that have images as input data, the feature attribution is denoted by the following two cards:
ir: important regions (a.k.a. heat map or saliency map)
irc: important regions with their feature contribution percentage
Feature Shape - fs
Feature Interaction - fi
Example-Based Explanation
Similar Example - se
Typical Example - te
Counterfactual Example - ce
Note: for the counterfactual example, two visual variations were used in the user study:
cet: counterfactual example with transition from one example to the counterfactual one
ceh: counterfactual example with the contrastive feature highlighted
Rule-Based Explanation
Rule - rt
Decision Tree - dt
Decision Flow - df
Supplementary Information
Input
Output
Performance
Dataset - prior (output prediction with prior distribution of each class in the training set)
Note: occasionally there is a wild card, which means the participant drew the card themselves. It is indicated as 'wc'.
For visual examples of each explanatory form card, please refer to the Explanatory_form_labels.pdf document.
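The card codes listed above can be turned into readable names when working with the ranking columns. The following is a minimal sketch, not part of the EUCA analysis code: the dictionary layout and the `decode_ranking` helper are hypothetical, and only the codes that appear in this description are included (no codes are given here for the Input, Output, and Performance cards).

```python
# Card codes as listed in this description. The "Other" category for the
# wild card is an assumption made for this sketch.
EXPLANATORY_FORMS = {
    "fa": ("Feature-Based", "Feature Attribution"),
    "ir": ("Feature-Based", "Important regions"),
    "irc": ("Feature-Based", "Important regions with feature contribution percentage"),
    "fs": ("Feature-Based", "Feature Shape"),
    "fi": ("Feature-Based", "Feature Interaction"),
    "se": ("Example-Based", "Similar Example"),
    "te": ("Example-Based", "Typical Example"),
    "ce": ("Example-Based", "Counterfactual Example"),
    "cet": ("Example-Based", "Counterfactual Example with transition"),
    "ceh": ("Example-Based", "Counterfactual Example with highlighted feature"),
    "rt": ("Rule-Based", "Rule"),
    "dt": ("Rule-Based", "Decision Tree"),
    "df": ("Rule-Based", "Decision Flow"),
    "prior": ("Supplementary Information", "Dataset prior"),
    "wc": ("Other", "Wild card"),
}


def decode_ranking(codes):
    """Map an ordered list of card codes to (rank, category, name) tuples.

    Unknown codes are kept as-is under the category "Unknown" rather than
    raising, so unexpected values in the data remain visible.
    """
    decoded = []
    for rank, code in enumerate(codes, start=1):
        key = code.strip().lower()
        category, name = EXPLANATORY_FORMS.get(key, ("Unknown", code))
        decoded.append((rank, category, name))
    return decoded
```

For example, `decode_ranking(["fa", "ce", "dt"])` yields one tuple per selected card, in ranking order.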
Link to the details on users' requirements on different explanatory forms
Code and report for EUCA data quantitative analysis
EUCA data analysis code
EUCA quantitative data analysis report
EUCA data citation
@article{jin2021euca,
  title={EUCA: the End-User-Centered Explainable AI Framework},
  author={Weina Jin and Jianyu Fan and Diane Gromala and Philippe Pasquier and Ghassan Hamarneh},
  year={2021},
  eprint={2102.02437},
  archivePrefix={arXiv},
  primaryClass={cs.HC}
}
Data Layer Name: Vermont Rational Service Areas (RSAs)
Alternate Name: Vermont RSAs
Overview:
Rational Service Areas (RSAs), originally developed in 2001 and revised in 2011, are generalized catchment areas relating to the delivery of primary health care services. In Vermont, RSA area delineations rely primarily on utilization data. The methods used are similar to those used by David Goodman to define primary care service areas based on Medicare data, but include additional sources of utilization data. Using these methods, towns were assigned based on where residents are going for their primary care.
The process used to delineate Vermont RSAs was iterative. It began by examining utilization patterns based on: (1) the primary care service areas that Goodman had defined for Vermont from Medicare data; (2) Vermont Medicaid assignments of clients to primary care providers; and (3) responses to the "town of residence"/"town of primary care" questions in the Vermont Behavioral Risk Factor survey. Taking into account the limitations of each of these sources of data, VDH statisticians defined preliminary town centers and were able to assign approximately two-thirds of the towns to a town center. For towns with no clear utilization patterns, they examined mileage from these preliminary centers, and mileage from towns that had primary care physicians. Contiguity of areas was also examined. A few centers were added and others were deleted. After all towns were assigned to a center and mapped, outliers were identified and reviewed by referring to both mileage maps and utilization patterns. Drive time information was not available. In some cases where the mileage map seemed to indicate one center, but the utilization patterns were strongly supportive of another center, utilization was used as a proxy for drive time.
Preliminary RSAs were presented to the Vermont Primary Care Collaborative, the Vermont Coalition of Clinics for the Uninsured and other community members for their feedback. Department of Health District Directors from the Division of Community Public Health were also consulted. These groups suggested modifications to the areas based on their experience working in the areas in question. As a result of this review a few centers were added, deleted and combined, and several towns were reassigned. The Vermont Primary Care Collaborative reviewed the final version of RSAs.
The result of this process is 38 Rational Service Areas.
Given the limitations of the information available for this purpose, the delineation approach was deemed reasonable and has resulted in a set of RSAs that have been widely reviewed and accepted. Because of the iterative process, it is recognized that this is not a "pure" methodology in the sense that someone else attempting to replicate this process would probably not produce exactly the same results.
RSAs have been reviewed periodically to keep up with changes in demographics and provider practice locations. One revision occurred in 2011. This 2011 revision took towns that had originally been assigned as using out-of-state providers and reassigned them to Vermont RSAs.
Technical Details:
Vermont RSAs were defined using three sources of primary care utilization data and mileage maps. Each of the data sources had limitations, and these limitations had to be considered as towns were assigned to an RSA. A description of each of these data sources is provided.
Medicare utilization data was obtained from the Primary Care Service Areas developed by David Goodman using 1996 and 1997 Medicare Part B and Outpatient files. Thirty-eight primary care service areas were defined for Vermont. The major limitation of these assignments was that they were based on zip codes rather than town boundaries. Many small towns do not have their own zip code, or the town may be divided into multiple zip codes shared with multiple other towns. As the utilization data was reviewed, consideration was given to whether the zip code in question represented the town, or whether utilization from that town may have been masked by a larger town's utilization patterns. A second consideration was that the Medicare data used 1996 and 1997 utilization. In areas where new practices were established after 1997, the Medicare data would not reflect their utilization.
Medicaid claims data only included children age 17 and under. The file contained Medicaid clients in 2000 with the town of residence of the client and the town of the primary care provider. The limitation in this file was that although the Medicaid database included a field for the geographic location of the provider separate from the mailing address, after examining the file it was determined that in many cases the mailing address was also being entered into the geographic location. In areas where practices were owned by a larger organization, the utilization patterns could not be determined. For example, in the St. Johnsbury RSA there were practices owned by an out-of-state medical center. Although it is known that there are Medicaid providers in some of the towns in that area, all of the utilization was coded to out of state. Therefore the Medicaid data had to be disregarded in this area. The St. Johnsbury RSA was subsequently defined around three town centers (St. Johnsbury, Lyndon, and Danville) because more precise utilization patterns could not be distinguished.
The BRFSS data was obtained from the 1998-2000 surveys. Respondents were asked for the town of their primary care provider. The town of residence of the respondent was also collected. These responses represented all Vermonters age 18-64 years old, regardless of type of insurance. The limitation of this data was the small number of respondents in the smaller towns.
Mileage information was obtained from the Vermont Medicaid program. This mileage information was derived using GIS mapping software to assess all statewide roads. However, drive-time data could not be determined at that time because there was no distinction between primary and secondary roads. The Medicaid program applied GIS mapping software to assign clients to primary care providers using 15 miles as a proxy for 30-minute drive time. This standard was also used in 2001 when the original RSAs were developed.
The VDH Public Health Statistics program periodically updates RSA GIS data (last updated in 2011).
Attribution 4.0 (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The UpStory dataset is an anonymized child-child interaction dataset, with an experimental manipulation for the level of rapport. It contains data pertaining to pairs of classmates (ages 8-10) playing a storytelling game in a naturalistic setting; pairs are selected to either promote close and friendly interactions (high-rapport condition), or promote distant interactions between acquaintances (low-rapport condition). Due to the experimental design, most children participated in two pairs: one high-rapport and one low-rapport.
A copy of this text is included in the ZIP file.
The dataset contains data for 35 pairs. Each pair is given an ID starting with P (high-rapport condition) or N (low-rapport condition), followed by the academic year (2 or 3), and 2 additional digits. E.g.: N251 is a low-rapport pair from year 2; P318 is a high-rapport pair from year 3. Similarly, each child is given a 2-digit ID. E.g.: child 17 participated in pairs P245 and N255.
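The pair ID scheme described above can be decoded mechanically. The following is a minimal sketch under that description; the function name and the returned dictionary keys are hypothetical and not part of the dataset's tooling.

```python
import re

# One letter (P or N), one digit for the academic year (2 or 3),
# then two additional digits, as described in the text above.
PAIR_ID_PATTERN = re.compile(r"^([PN])([23])(\d{2})$")


def parse_pair_id(pair_id):
    """Split a pair ID such as 'N251' into condition, year and suffix."""
    match = PAIR_ID_PATTERN.match(pair_id)
    if match is None:
        raise ValueError(f"Not a valid pair ID: {pair_id!r}")
    letter, year, suffix = match.groups()
    condition = "high_rapport" if letter == "P" else "low_rapport"
    return {"condition": condition, "year": int(year), "suffix": suffix}
```

Calling `parse_pair_id("N251")` identifies a low-rapport pair from year 2, matching the example in the text.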
Each pair played between 1 and 5 rounds of the game. Each round is provided as an individual sample, with its own associated time series as CSV files. In total, 106 rounds are provided.
The top-level CSV file child-info.csv offers child-level information, including the following items:
child_id: the child's ID (a random 2-digit unique identifier).
gender: boy or girl.
year: academic year the child belonged to (2 or 3).
age: the child's age in years at the beginning of the data collection effort. Either an exact value (9 or 10), or a range (8-9).
The top-level CSV file pair-info.csv offers pair-level information, including the following items:
pair_id: the pair ID, as described above.
condition: the experimental condition this pair belonged to (low_rapport or high_rapport).
distance: the distance between the two participants in their year's friendship network (integer in range 2 <= n <= 56 for Year 2 pairs, and 2 <= n <= 20 for Year 3 pairs).
year: academic year the children belonged to (2 or 3).
rounds: number of game rounds the pair played (1 <= n <= 5).
child_1: first child in the pair (lower ID; 2 digits).
child_2: second child in the pair (higher ID; 2 digits).
The dataset contains time-series data extracted from two different video sources, each one overviewing the play area from one side: the left-camera and right-camera. Each video source has its own top-level folder, with data extracted from that source inside it.
In each source folder, you will find CSV files named after the video source, pair ID, round number, and data type (e.g., left-camera-N249-round-1-face.csv). There is a separate file for each round of the game; each pair typically played ~3 rounds (min: 1, max: 5). As the names suggest, face files contain information related to head pose and facial expression, while pose files contain information related to full body pose.
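When iterating over the per-round files, it can help to recover the source, pair, round, and data type from each file name. The sketch below assumes the naming pattern is exactly the one implied by the example left-camera-N249-round-1-face.csv; that pattern is inferred rather than documented, and the function name and dictionary keys are hypothetical.

```python
import re

# Assumed pattern, inferred from the single example file name:
# <source>-<pair_id>-round-<n>-<face|pose>.csv
FILENAME_PATTERN = re.compile(
    r"^(left-camera|right-camera)-([PN][23]\d{2})-round-(\d+)-(face|pose)\.csv$"
)


def parse_round_filename(filename):
    """Return the components of a per-round CSV file name, or None if it does not match."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    source, pair_id, round_number, kind = match.groups()
    return {
        "source": source,
        "pair_id": pair_id,
        "round": int(round_number),
        "kind": kind,
    }
```

Returning `None` for non-matching names makes it easy to skip unrelated files, such as the top-level child-info.csv and pair-info.csv.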
Face data was extracted with OpenFace, and contains most information that is produced by the tool. See the OpenFace documentation for more details. Time series are given at 25Hz; entries are indexed by frame (0-indexed) and child_id. Included data:
confidence and success indicators.
Pose data was extracted with OpenPose. Time series are given at 25Hz; entries are indexed by frame (0-indexed), child_id, and joint (named body part that the row refers to). Data provided per row:
x: horizontal position in the frame, in pixels, left-to-right (float; range 0-width).
y: vertical position in the frame, in pixels, top-to-bottom (float; range 0-height).
confidence: OpenPose's reported prediction confidence (float; range 0-1).
CC0 1.0 Universal Public Domain Dedication
https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data to illustrate our geographic RD methodological framework. The illustration is a re-examination of the effects of political advertisements on voter turnout during a presidential campaign, exploiting the exogenous variation in the volume of presidential ads that is created by media market boundaries. We rely on two data sources. Our main source is the New Jersey voter file. This dataset has measures of party registration, gender and age directly from the voter file, and imputed values of education, income, poverty status, and employment status. The voter file also contains the address of each voter, which allows us to find each voter's geographic location and avoid the use of naive distances. Our second data source is property sales records. We acquired records for all houses sold in the appropriate zip codes in New Jersey from January 2006 to November 2008. In this time period, nearly 3,000 homes were sold in this area -- although we only used the 1,800 house sales inside one specific school district, see below. The housing sales data allow us to conduct a fine-grained analysis of the sales price differential along the boundary of interest.
NOTE: This dataset replaces a previous one. Please see below. Chicago residents who are up to date with COVID-19 vaccines by ZIP Code, based on the reported home address and age group of the person vaccinated, as provided by the medical provider in the Illinois Comprehensive Automated Immunization Registry Exchange (I-CARE). “Up to date” refers to individuals who meet the CDC’s updated COVID-19 vaccination criteria based on their age and prior vaccination history. For surveillance purposes, up to date is defined based on the following criteria: People ages 5 years and older: · Are up to date when they receive 1+ doses of a COVID-19 vaccine during the current season. Children ages 6 months to 4 years: · Children who have received at least two prior COVID-19 vaccine doses are up to date when they receive one additional dose of COVID-19 vaccine during the current season, regardless of vaccine product. · Children who have received only one prior COVID-19 vaccine dose are up to date when they receive one additional dose of the current season's Moderna COVID-19 vaccine or two additional doses of the current season's Pfizer-BioNTech COVID-19 vaccine. · Children who have never received a COVID-19 vaccination are up to date when they receive either two doses of the current season's Moderna vaccine or three doses of the current season's Pfizer-BioNTech vaccine. This dataset takes the place of a previous dataset, which covers doses administered from December 15, 2020 through September 13, 2023 and is marked as historical: - https://data.cityofchicago.org/Health-Human-Services/COVID-19-Vaccinations-by-ZIP-Code/553k-3xzc. Data Notes: Weekly cumulative totals of people up to date are shown for each combination of ZIP Code and age group. Note there are rows where age group is "All ages" so care should be taken when summing rows.
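The caution about "All ages" rows can be illustrated with a small sketch. The field names (`zip_code`, `week_end`, `age_group`, `up_to_date`), the age group labels other than "All ages", and all numbers below are hypothetical, chosen only for illustration; they are not taken from the dataset. Because the totals are cumulative by week, the sketch also restricts the sum to a single week.

```python
# Illustrative rows only; values and field names are made up for this sketch.
rows = [
    {"zip_code": "60601", "week_end": "2023-09-16", "age_group": "All ages", "up_to_date": 30},
    {"zip_code": "60601", "week_end": "2023-09-16", "age_group": "0-17", "up_to_date": 10},
    {"zip_code": "60601", "week_end": "2023-09-16", "age_group": "18+", "up_to_date": 20},
    {"zip_code": "60602", "week_end": "2023-09-16", "age_group": "All ages", "up_to_date": 50},
    {"zip_code": "60602", "week_end": "2023-09-16", "age_group": "0-17", "up_to_date": 15},
    {"zip_code": "60602", "week_end": "2023-09-16", "age_group": "18+", "up_to_date": 35},
]


def total_up_to_date(rows, week_end):
    """Sum cumulative counts for one week using only the 'All ages' rows."""
    return sum(
        row["up_to_date"]
        for row in rows
        if row["week_end"] == week_end and row["age_group"] == "All ages"
    )


def naive_total(rows, week_end):
    """Sum every row for the week; this double counts people across age groups."""
    return sum(row["up_to_date"] for row in rows if row["week_end"] == week_end)
```

With these illustrative rows, the naive sum is twice the "All ages" total, because each person is counted once in their age group and once in "All ages".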
Coverage percentages are calculated based on the cumulative number of people in each ZIP Code and age group who are considered up to date as of the week ending date divided by the estimated number of people in that subgroup. Population counts are obtained from the 2020 U.S. Decennial Census. For ZIP Codes mostly outside Chicago, coverage percentages are not calculated because reliable Chicago-only population counts are not available. Actual counts may exceed population estimates and lead to coverage estimates that are greater than 100%, especially in smaller ZIP Codes with smaller populations. Additionally, the medical provider may report a work address or incorrect home address for the person receiving the vaccination, which may lead to over- or underestimation of vaccination coverage by geography. All coverage percentages are capped at 99%. Weekly cumulative counts and coverage percentages are reported from the week ending Saturday, September 16, 2023 onward through the Saturday prior to the dataset being updated. All data are provisional and subject to change. Information is updated as additional details are received, and it is very common for recent dates to be incomplete and to be updated as time goes on. At any given time, this dataset reflects data currently known to CDPH. Numbers in this dataset may differ from other public sources due to when data are reported and how City of Chicago boundaries are defined. The Chicago Department of Public Health uses the most complete data available to estimate COVID-19 vaccination coverage among Chicagoans, but there are several limitations that impact our estimates. Individuals may receive vaccinations that are not recorded in the Illinois immunization registry, I-CARE, such as those administered in another state, causing underestimation of the number of individuals who are up to date.
Inconsistencies in records of separate doses administered to the same person, such as slight variations in dates of birth, can result in duplicate records for a person and underestimate the number of people who are up to date. For all datasets related to COVID-19, please