TIGER, TIGER/Line, and Census TIGER are registered trademarks of the Bureau of the Census. The Redistricting Census 2000 TIGER/Line files are an extract of selected geographic and cartographic information from the Census TIGER data base. The geographic coverage for a single TIGER/Line file is a county or statistical equivalent entity, with the coverage area based on January 1, 2000 legal boundaries. A complete set of Redistricting Census 2000 TIGER/Line files includes all counties and statistically equivalent entities in the United States and Puerto Rico. The Redistricting Census 2000 TIGER/Line files will not include files for the Island Areas. The Census TIGER data base represents a seamless national file with no overlaps or gaps between parts. However, each county-based TIGER/Line file is designed to stand alone as an independent data set, or the files can be combined to cover the whole Nation. The Redistricting Census 2000 TIGER/Line files consist of line segments representing physical features and governmental and statistical boundaries. The Redistricting Census 2000 TIGER/Line files do NOT contain the ZIP Code Tabulation Areas (ZCTAs), and the address ranges are of approximately the same vintage as those appearing in the 1999 TIGER/Line files. That is, the Census Bureau is producing the Redistricting Census 2000 TIGER/Line files in advance of the computer processing that will ensure that the address ranges in the TIGER/Line files agree with the final Master Address File (MAF) used for tabulating Census 2000. The files contain information distributed over a series of record types for the spatial objects of a county. There are 17 record types, including the basic data record, the shape coordinate points, and geographic codes that can be used with appropriate software to prepare maps.
Other geographic information contained in the files includes attributes such as feature identifiers/census feature class codes (CFCC) used to differentiate feature types, address ranges and ZIP Codes, codes for legal and statistical entities, latitude/longitude coordinates of linear and point features, landmark point features, area landmarks, key geographic features, and area boundaries. The Redistricting Census 2000 TIGER/Line data dictionary contains a complete list of all the fields in the 17 record types.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset contains a (mostly) complete set of results from marathons across the United States and Canada in 2024.
The dataset is restricted to races with more than 200 finishers. Some races are therefore excluded, but they account for a small share of the total number of finishers.
The dataset is also restricted to races that are USATF-certified. Most of the races are road marathons, although some trail races are included. But these are "road-like" trail marathons, where times are similar to the road and can be used for Boston qualifying purposes.
This dataset is similar to the one I created with results from 2023. The two datasets can be combined, but the race names differ in some cases. You'll have to clean up the race names to get them to group correctly.
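The race-name cleanup described above can be sketched in Python. This is a minimal sketch. It assumes the two years' results are loaded as lists of dicts with a hypothetical `race` column. The normalization rules and the `ALIASES` table are illustrative assumptions, not the rules used to build either dataset.

```python
import re
from collections import Counter

# Hypothetical alias table: map variant spellings to one canonical name.
# Fill it in by inspecting race names that fail to match across years.
ALIASES = {
    "nyc marathon": "new york city marathon",
}

def normalize_race_name(name: str) -> str:
    """Lowercase, drop a leading year and punctuation, collapse whitespace."""
    s = name.lower().strip()
    s = re.sub(r"^\d{4}\s+", "", s)      # "2024 boston marathon" -> "boston marathon"
    s = re.sub(r"[^\w\s]", " ", s)       # replace punctuation such as "-" or "&"
    s = re.sub(r"\s+", " ", s).strip()   # collapse repeated spaces
    return ALIASES.get(s, s)

def finishers_by_race(rows):
    """Count finishers per normalized race name across the combined years."""
    return Counter(normalize_race_name(r["race"]) for r in rows)
```

Keeping the alias table separate from the generic rules makes it easy to add one-off fixes without changing how every other name is normalized.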
I initially collected these results to prepare the dataset for the 2026 Boston Marathon Cutoff Time Tracker. I also used it to update my percentile-based age grade calculator, to calculate the average marathon times for each age group, to identify a list of the largest races in the United States, and to support various other analyses.
If time permits, I plan to update this dataset to include additional information about each race, including the location and the weather on race day.
https://www.usa.gov/government-works/
US Census data describing national gender and race demographics from 2000 to 2020.
The 2000 and 2010 data are fairly straightforward. The only caveat on the US Census website was that, for 2010, the "Some other race-only" category may have been between 19.1 and 20.1 million (6.2-6.5%) and the "2 or more races" category may have been between 8.0 and 9.0 million (2.6-2.9%). The dataset uses the final numbers published by the US Census Bureau.
The official 2020 Census data will not be released until May 2023, so the numbers given are not official yet.
2020 Gender: The gender numbers are estimates given as ranges (163.8-164.8 million female; 166.9-167.8 million male). I used numbers that keep the ratio between the two estimates and sum to the total population.

2020 Race: The categories "Some other race-only" and "2 or more races" increased significantly in 2020. These changes are mainly due to a difference in how the race and ethnicity questions were asked: the increase reflects not only a change in the demographics themselves but, mainly, a change in how people answered the questions. The "Some other race-only" category includes mostly Latino and Hispanic people (94%). The "2 or more races" category includes mostly people who are both White and one or more other races (86%). You should take this change into account when comparing an earlier census to the 2020 census.

Race "Minority": Lastly, the minority category is calculated by subtracting the White-only, non-Hispanic population from the total US population. Anyone who is any race other than White alone, AND anyone who is Latino/Hispanic, falls into the minority category.
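The two calculations above can be written out as a short Python sketch. Proportional rescaling is one assumed reading of "kept the ratio". It would be applied to whatever point estimates you pick from the ranges, for example their midpoints. The function names are hypothetical.

```python
def scale_to_total(female_est: float, male_est: float, total: float) -> tuple[float, float]:
    """Rescale two estimates so they keep their ratio and sum exactly to total."""
    share_female = female_est / (female_est + male_est)
    female = total * share_female
    return female, total - female  # male is the remainder, so the sum is exact

def minority_population(total: float, white_alone_non_hispanic: float) -> float:
    """Minority = everyone except people who are White alone and not Hispanic/Latino."""
    return total - white_alone_non_hispanic
```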
Sources: 2000 Gender (1st paragraph); 2000 Race (page 3); 2010 Gender (2nd paragraph); 2010 Race (page 4); 2020 Gender Estimates (Estimates by Age and Sex table); 2020 Race (1) (throughout article); 2020 Race (2) ("What are facts for my country" section); 2020 Race (3) (Extra, similar).
The Bureau of the Census has released Census 2000 Summary File 1 (SF1) 100-Percent data. The file includes the following population items: sex, age, race, Hispanic or Latino origin, household relationship, and household and family characteristics. Housing items include occupancy status and tenure (whether the unit is owner or renter occupied). SF1 does not include information on incomes, poverty status, overcrowded housing, or age of housing; these topics will be covered in Summary File 3. Data are available for states, counties, county subdivisions, places, census tracts, block groups, and, where applicable, American Indian and Alaska Native Areas and Hawaiian Home Lands. The SF1 data are available on the Bureau's web site and may be retrieved from American FactFinder as tables, lists, or maps. Users may also download a set of compressed ASCII files for each state via the Bureau's FTP server. There are more than 8,000 data items available for each geographic area. The full listing of these data items is available here as a downloadable compressed database file named TABLES.ZIP. The uncompressed file is in FoxPro database file (dbf) format and may be imported into Access, Excel, and other software formats. While all of this information is useful, the Office of Community Planning and Development (CPD) has downloaded selected information for all states and areas and is making this information available on the CPD web pages. The tables and data items selected are those used in the CDBG and HOME allocation formulas, plus topics most pertinent to the Comprehensive Housing Affordability Strategy (CHAS), the Consolidated Plan, and similar overall economic and community development plans. The information is contained in five compressed (zipped) dbf tables for each state. When uncompressed, the tables are ready for use with FoxPro and can be imported into Access, Excel, and other spreadsheet, GIS, and database software. The data are at the block group summary level.
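Python's standard library has no reader for dbf tables. The sketch below parses the dBase-style header that FoxPro tables use: the record count, header length, and record length, followed by 32-byte field descriptors. It returns character data only. This is a minimal illustration under those assumptions, not a full implementation. It does not handle memo fields or code pages.

```python
import struct

def read_dbf(data: bytes):
    """Parse a dBase III-style .dbf table (as written by FoxPro) from raw bytes.

    Returns (field_names, records); each record is a dict of stripped strings.
    """
    # Bytes 4-11: record count (uint32), header length (uint16), record length (uint16).
    num_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)

    # 32-byte field descriptors start at offset 32 and end with a 0x0D byte.
    fields = []
    pos = 32
    while data[pos] != 0x0D:
        raw_name = data[pos:pos + 11].split(b"\x00", 1)[0]
        length = data[pos + 16]              # field length lives at descriptor offset 16
        fields.append((raw_name.decode("ascii"), length))
        pos += 32

    # Records begin at header_len, which also skips any extra bytes some
    # FoxPro variants place after the 0x0D terminator.
    records = []
    for i in range(num_records):
        start = header_len + i * record_len
        rec = data[start:start + record_len]
        if rec[:1] == b"*":                  # deletion flag: skip deleted rows
            continue
        offset = 1                           # byte 0 is the deletion flag
        row = {}
        for name, length in fields:
            row[name] = rec[offset:offset + length].decode("latin-1").strip()
            offset += length
        records.append(row)
    return [name for name, _ in fields], records
```

Usage: `names, rows = read_dbf(open(path, "rb").read())`, where `path` points to one of the uncompressed tables.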
The first two characters of the file name are the state abbreviation. The next two letters are BG, for block group. The last part of the file name describes the contents. Each record is labeled with the code and name of the city and county in which it is located, so that the data can be summarized to higher-level geography. The GEO file contains standard Census Bureau geographic identifiers for each block group, such as the metropolitan area code and congressional district code; the only data items in this table are total population and total housing units. POP1 and POP2 contain selected population variables, and selected housing items are in the HU file. The MA05 table data is only for use by State CDBG grantees for reporting the racial composition of beneficiaries of Area Benefit activities. The complete package for a state consists of the dictionary file named TABLES and the five data files for the state. The logical record number (LOGRECNO) links the records across tables.
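The naming scheme and the LOGRECNO join can be sketched as follows. The example file name `MDBGPOP1.dbf` and the helper names are hypothetical. Rows are assumed to be dicts, such as those produced by a dbf reader.

```python
def parse_cpd_filename(filename: str) -> dict:
    """Split a name like 'MDBGPOP1.dbf' into state, summary level, and contents."""
    stem = filename.rsplit(".", 1)[0].upper()
    return {"state": stem[:2], "level": stem[2:4], "contents": stem[4:]}

def join_on_logrecno(*tables):
    """Merge lists of row dicts that share LOGRECNO into one dict per block group."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row["LOGRECNO"], {}).update(row)
    return merged
```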
Note: This COVID-19 data set is no longer being updated as of December 1, 2023. Access current COVID-19 data on the CDPH respiratory virus dashboard (https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/Respiratory-Viruses/RespiratoryDashboard.aspx) or in open data format (https://data.chhs.ca.gov/dataset/respiratory-virus-dashboard-metrics).
As of August 17, 2023, data is being updated each Friday.
For death data after December 31, 2022, California uses Provisional Deaths from the Centers for Disease Control and Prevention’s National Center for Health Statistics (NCHS) National Vital Statistics System (NVSS). Prior to January 1, 2023, death data was sourced from the COVID-19 registry. The change in data source occurred in July 2023 and was applied retroactively to all 2023 data to provide a consistent source of death data for 2023.
As of May 11, 2023, data on cases, deaths, and testing is being updated each Thursday. Metrics by report date have been removed, but previous versions of files with report date metrics are archived below.
All metrics include people in state and federal prisons, US Immigration and Customs Enforcement facilities, US Marshal detention facilities, and Department of State Hospitals facilities. Members of California's tribal communities are also included.
The "Total Tests" and "Positive Tests" columns show totals based on the collection date. There is a lag between when a specimen is collected and when it is reported in this dataset. As a result, the most recent dates on the table will temporarily show NONE in the "Total Tests" and "Positive Tests" columns. This should not be interpreted as no tests being conducted on these dates. Instead, these values will be updated with the number of tests conducted as data is received.
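Because the latest dates show NONE until data arrives, any calculation over this table should treat NONE as missing rather than zero. Below is a minimal Python sketch. It assumes rows are dicts keyed by the "Total Tests" and "Positive Tests" column names, and the function names are hypothetical.

```python
def parse_test_count(value):
    """Return an int, or None when the count has not been reported yet."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "" or text.upper() == "NONE":
        return None              # pending data, not zero tests
    return int(float(text))

def positivity_rate(rows):
    """Share of positive tests over the dates whose counts have arrived."""
    total = positive = 0
    for row in rows:
        t = parse_test_count(row["Total Tests"])
        p = parse_test_count(row["Positive Tests"])
        if t is None or p is None:
            continue             # skip lagging dates instead of counting them as zero
        total += t
        positive += p
    return positive / total if total else None
```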
This dataset contains student responses to each item on the Views of Climate and Learning (VOCAL) survey since 2018. These responses are aggregated at the state level by grade and student group to protect student privacy.
The VOCAL survey is designed to provide information on student perceptions of school climate. There are two reports with different types of data: responses to individual items and aggregate index scaled scores that combine item responses. For more information about the VOCAL survey, please visit the VOCAL home page.
This dataset is one of two that contain the same data published in the VOCAL state dashboard: "VOCAL Index Scaled Scores and Favorability" and "VOCAL Item Response Scores".
List of Items by Index and Topic
Engagement - Cultural Competence
By Natarajan Krishnaswami [source]
The FHFA Public Use Databases provide an unprecedented look into the flow of mortgage credit and capital in America's communities. With detailed information about the income, race, gender and census tract location of borrowers, this database can help lenders, planners, researchers and housing advocates better understand how mortgages are acquired by Fannie Mae and Freddie Mac.
This data set includes 2009-2016 single-family property loan information from the Enterprises in combination with corresponding census tract information from the 2010 decennial census. It allows for greater granularity in examining mortgage acquisition patterns within each MSA or county by combining borrower/property characteristics, such as borrower's race/ethnicity; co-borrower demographics; occupancy type; Federal guarantee program (conventional/other versus FHA-insured); age of borrowers; loan purpose (purchase, refinance or home improvement); lien status; rate spread between annual percentage rate (APR) and average prime offer rate (APOR); HOEPA status; area median family income and more.
In addition to demographic data on borrowers and properties, this dataset provides affordability metrics such as median family income at the MSA/county level and at the census tract level. These metrics use 2010 Census-based geography and American Community Survey estimates available as of January 1, 2016. They support measures that are important for assessing inequality, such as the tract income ratio (the tract's median family income relative to the area median family income) and the borrower income ratio (the borrower's annual income relative to the area median family income for the reporting year). Finally, each record contains an Enterprise Flag indicating whether the loan was purchased by Fannie Mae or Freddie Mac. This flag offers further insight into who is financing these loans, which is relevant to policies affecting undocumented immigrant labor access as well as to affordable housing legislation targeted toward first-time home buyers.
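The two ratios can be sketched as plain functions. The definitions follow the description above, and the argument names are hypothetical. Check the data dictionary to see whether the published fields are stored as fractions or as percentages.

```python
def borrower_income_ratio(borrower_income: float, area_median_family_income: float) -> float:
    """Borrower's annual income as a share of the area median family income."""
    return borrower_income / area_median_family_income

def tract_income_ratio(tract_median_family_income: float, area_median_family_income: float) -> float:
    """Tract median family income as a share of the area (MSA/county) median."""
    return tract_median_family_income / area_median_family_income
```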
This guide provides the information needed to use the Fannie Mae and Freddie Mac Loan-Level Dataset for 2016. The dataset contains loan-level data for both Fannie Mae and Freddie Mac, including loans acquired in 2016. It includes details such as homeowner demographics, loan-to-value ratio, census tract location, and mortgage affordability.
The first step in using this dataset is understanding how it is organized. The loan-level data set is made up of 38 fields. Each field has a description of what it represents and the values it can take (e.g., whether it is an integer or a float). Understanding the fields will help when querying specific data points or comparing groups.
Once you understand what information is available, you can start creating queries or visualizations that compare trends across Fannie Mae and Freddie Mac loans made in 2016. Depending on your interests, such as homeownership rates or income disparities, you might pull statistics such as the borrower's annual income ratio to area median family income by state code. You might also compare the race and ethnicity of borrowers and co-borrowers across states and their MSAs, among other possibilities. Visualizations make these comparisons and contrasts easier to see for other users who explore the same dataset for additional insights.
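The two example analyses above can be sketched in Python over rows loaded as dicts. The field names `state_code`, `borrower_income_ratio`, `borrower_race`, and `coborrower_race` are hypothetical placeholders for the corresponding fields among the 38 in the data set.

```python
from collections import defaultdict
from statistics import mean

def mean_ratio_by_state(rows, ratio_field="borrower_income_ratio", state_field="state_code"):
    """Average a ratio field within each state, skipping missing values."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(ratio_field)
        if value is not None:
            groups[row[state_field]].append(value)
    return {state: mean(values) for state, values in groups.items()}

def race_pairs(rows, borrower_field="borrower_race", coborrower_field="coborrower_race"):
    """Cross-tabulate borrower race against co-borrower race."""
    counts = defaultdict(int)
    for row in rows:
        counts[(row[borrower_field], row[coborrower_field])] += 1
    return dict(counts)
```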
After creating queries and visualizations, you can dig deeper into the trends and any biases they reveal. For example, you might examine how particular racial groups compare across the US Postal and MSA codes tied to 2010 Census tract locations throughout the US. Publicly available research on housing policies implemented over the years can help you draw further conclusions related to your questions.
- Use the dataset to analyze borrowing patterns based on race, nationality and gender, to better understand the links between minority groups and access to credit...
This web map displays data from the voter registration database as the percent of registered voters by census tract in King County, Washington. The data for this web map is compiled from King County Elections voter registration data for the years 2013-2019. The total number of registered voters is based on the geo-location of the voter's registered address at the time of the general election for each year. The eligible voting population, age 18 and over, is based on the estimated population increase from the US Census Bureau and the Washington Office of Financial Management. It was calculated as a projected population increase over 2010 of 6 percent for 2013, 7 percent for 2014, 9 percent for 2015, 11 percent for 2016 and 2017, 14 percent for 2018, and 17 percent for 2019. The total population 18 and over in 2010 was 1,517,747 in King County, Washington. The percentage of registered voters represents the number of people who are registered to vote compared to the eligible voting population, age 18 and over. The voter registration data by census tract was grouped into six percentage ranges: 50% or below, 51-60%, 61-70%, 71-80%, 81-90%, and 91% or above. The overall registration rate is 84 percent. In the map, the lighter colors represent a relatively low percentage range of voter registration and the darker colors represent a relatively high percentage range. PDF maps of these data can be viewed at King County Elections downloadable voter registration maps. The 2019 General Election Voter Turnout layer is voter turnout data by historical precinct boundaries for the corresponding year. The data is grouped into six percentage ranges: 0-30%, 31-40%, 41-50%, 51-60%, 61-70%, and 71-100%.
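The eligible-population and binning arithmetic described above can be sketched in Python. The growth factors and the 2010 base come from the text. Applying the same factor to a single tract's 2010 population through `base_2010` is an assumption. So is the handling of fractional percentages, which fall into the next bin up under the `<=` boundaries shown.

```python
POP_18_PLUS_2010 = 1_517_747  # King County population age 18 and over, 2010

# Projected population growth over 2010 used for each election year, as stated above.
GROWTH_SINCE_2010 = {2013: 0.06, 2014: 0.07, 2015: 0.09,
                     2016: 0.11, 2017: 0.11, 2018: 0.14, 2019: 0.17}

def eligible_population(year: int, base_2010: float = POP_18_PLUS_2010) -> float:
    """Projected 18+ population for an election year."""
    return base_2010 * (1 + GROWTH_SINCE_2010[year])

def registration_rate(registered: int, eligible: float) -> float:
    """Percent of the eligible (18+) population that is registered."""
    return 100 * registered / eligible

def registration_bin(pct: float) -> str:
    """Map a registration percentage to the six ranges used on the map."""
    for upper, label in [(50, "50% or below"), (60, "51-60%"), (70, "61-70%"),
                         (80, "71-80%"), (90, "81-90%")]:
        if pct <= upper:
            return label
    return "91% or above"
```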
The lighter colors represent lower turnout and the darker colors represent higher turnout. The King County Demographics Layer is census data for language, income, poverty, race and ethnicity at the census tract level. It is based on the 2010-2014 American Community Survey 5-year average provided by the United States Census Bureau. Because the data are based on a survey, they are estimates and should be used with that understanding. The demographic data sets were developed and are maintained by King County staff to support the King County Equity and Social Justice program. Other data for this map is located in the King County GIS Spatial Data Catalog, where data is managed by the King County GIS Center, a multi-department enterprise GIS in King County, Washington. King County has nearly 1.3 million registered voters and is the largest jurisdiction in the United States to conduct all elections by mail. In the map you can view the percent of registered voters by census tract, compare registration within political districts, and compare registration and demographic data. You can also verify your voter registration or register to vote through a link to VoteWA, the Washington State online voter registration web page.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Release Date: 2018-08-10.

[NOTE: Includes firms with payroll at any time during 2016. Employment reflects the number of paid employees during the March 12 pay period. Data are based on Census administrative records, and the estimates of business ownership by gender, ethnicity, race, and veteran status are from the 2016 Annual Survey of Entrepreneurs. Detail may not add to total due to rounding or because a Hispanic firm may be of any race. Moreover, each owner had the option of selecting more than one race and therefore is included in each race selected. Respondent firms include all firms that responded to the characteristic(s) tabulated in this dataset and reported gender, ethnicity, race, or veteran status for at least one owner and were not publicly held or not classifiable by gender, ethnicity, race, and veteran status. The 2016 Annual Survey of Entrepreneurs asked for information for up to four persons owning the largest percentage(s) of the business. Percentages are for owners of respondent firms only and are not recalculated when the dataset is resorted. Percentages are always based on total reporting (defined above) within a gender, ethnicity, race, veteran status, and/or industry group for the characteristics tabulated in this dataset. Firms with more than one domestic establishment are counted in each geographic area and industry in which they operate, but only once in the U.S. and state totals for all sectors. For information on confidentiality protection, sampling error, nonsampling error, and definitions, see Survey Methodology.]

Table Name: Statistics for Owners of Respondent Employer Firms by Reasons for Owning the Business by Sector, Gender, Ethnicity, Race, Veteran Status, and Years in Business for the U.S., States, and Top 50 MSAs: 2016

Release Schedule: This file was released in August 2018.

Key Table Information: These data are related to all other 2016 ASE files. Refer to the Methodology section of the Annual Survey of Entrepreneurs website for additional information.

Universe: The universe for the 2016 Annual Survey of Entrepreneurs (ASE) includes all U.S. firms with paid employees operating during 2016 with receipts of $1,000 or more which are classified in the North American Industry Classification System (NAICS) sectors 11 through 99, except for NAICS 111, 112, 482, 491, 521, 525, 813, 814, and 92 which are not covered. Firms with more than one domestic establishment are counted in each geographic area and industry in which they operate, but only once in the U.S. total. For Characteristics of Business Owners (CBO) data, all estimates are of owners of firms responding to the ASE. That is, estimates are based only on firms providing gender, ethnicity, race, or veteran status; or firms not classifiable by gender, ethnicity, race, and veteran status that returned an ASE online questionnaire with at least one question answered. The ASE online questionnaire provided space for up to four owners to report their characteristics. CBO data are not representative of all owners of all firms operating in the United States. The data do not represent all business owners in the United States.

Geographic Coverage: The data are shown for:
- United States
- States and the District of Columbia
- The fifty most populous metropolitan areas

Industry Coverage: The data are shown for the total of all sectors (00) and the 2-digit NAICS code level.

Data Items and Other Identifying Records: Statistics for Owners of Respondent Employer Firms by Reasons for Owning the Business by Sector, Gender, Ethnicity, Race, Veteran Status, and Years in Business for the U.S., States, and Top 50 MSAs: 2016 contains data on:
- Number of owners of respondent firms with paid employees
- Percent of number of owners of respondent firms with paid employees

The data are shown for:

Gender, ethnicity, race and veteran status of owners of respondent firms:
- All owners of respondent firms
- Female
- Male
- Hispanic
- Non-Hispanic
- White
- Black or African American
- American Indian and Alaska Native
- Asian
- Native Hawaiian and Other Pacific Islander
- Some other race
- Minority
- Nonminority
- Veteran
- Nonveteran

Years in business:
- All firms
- Firms less than 2 years in business
- Firms with 2 to 3 years in business
- Firms with 4 to 5 years in business
- Firms with 6 to 10 years in business
- Firms with 11 to 15 years in business
- Firms with 16 or more years in business

Owner's reasons for owning the business:
- Wanted to be my own boss: Not important
- ...
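The universe definition above can be expressed as a filter on NAICS codes and receipts. This is a minimal sketch. It does not model the paid-employee requirement or the multi-establishment counting rules, and the function name is hypothetical.

```python
# NAICS prefixes excluded from the ASE universe, as listed above.
EXCLUDED_PREFIXES = ("111", "112", "482", "491", "521", "525", "813", "814", "92")

def in_ase_universe(naics_code: str, receipts: float) -> bool:
    """True if a firm's NAICS code and receipts fall inside the ASE universe."""
    if receipts < 1000:
        return False
    code = naics_code.strip()
    if len(code) < 2 or not code[:2].isdigit():
        return False
    if not 11 <= int(code[:2]) <= 99:       # sectors 11 through 99
        return False
    return not code.startswith(EXCLUDED_PREFIXES)
```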
https://en.wikipedia.org/wiki/Public_domain
This dataset contains information about the demographics of all US cities and census-designated places with a population greater than or equal to 65,000. This data comes from the US Census Bureau's 2015 American Community Survey. This product uses the Census Bureau Data API but is not endorsed or certified by the Census Bureau.
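The 65,000 threshold matches the population cutoff for ACS 1-year estimates, which may be why it was chosen. The sketch below builds a Census Data API query for place populations and filters the response. It assumes the 2015 ACS 1-year endpoint and the total-population variable `B01003_001E`. Depending on the API vintage, you may need to add an `in=state:*` predicate or an API key. The network call itself is left out.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for the 2015 ACS 1-year estimates.
BASE = "https://api.census.gov/data/2015/acs/acs1"

def build_query(variables=("NAME", "B01003_001E"), geography="place:*", key=None):
    """Build a Census Data API URL; commas, colons, and '*' are left unescaped."""
    params = {"get": ",".join(variables), "for": geography}
    if key:
        params["key"] = key
    return BASE + "?" + urlencode(params, safe=",:*")

def places_at_least(response_text: str, threshold: int = 65_000):
    """Turn the API's JSON array-of-arrays into dicts and keep places >= threshold."""
    header, *rows = json.loads(response_text)
    records = [dict(zip(header, row)) for row in rows]
    return [r for r in records if int(r["B01003_001E"]) >= threshold]
```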
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Release Date: 2017-07-13.

[NOTE: Includes firms with payroll at any time during 2015. Employment reflects the number of paid employees during the March 12 pay period. Data are based on Census administrative records, and the estimates of business ownership by gender, ethnicity, race, and veteran status are from the 2015 Annual Survey of Entrepreneurs. Detail may not add to total due to rounding or because a Hispanic firm may be of any race. Moreover, each owner had the option of selecting more than one race and therefore is included in each race selected. Respondent firms include all firms that responded to the characteristic(s) tabulated in this dataset and reported gender, ethnicity, race, or veteran status for at least one owner and were not publicly held or not classifiable by gender, ethnicity, race, and veteran status. The 2015 Annual Survey of Entrepreneurs asked for information for up to four persons owning the largest percentage(s) of the business. Percentages are for owners of respondent firms only and are not recalculated when the dataset is resorted. Percentages are always based on total reporting (defined above) within a gender, ethnicity, race, veteran status, and/or industry group for the characteristics tabulated in this dataset. Firms with more than one domestic establishment are counted in each geographic area and industry in which they operate, but only once in the U.S. and state totals for all sectors. For information on confidentiality protection, sampling error, nonsampling error, and definitions, see Survey Methodology.]

Table Name: Statistics for Owners of Respondent Employer Firms by Reasons for Owning the Business by Sector, Gender, Ethnicity, Race, Veteran Status, and Years in Business for the U.S., States, and Top 50 MSAs: 2015

Release Schedule: This file was released in July 2017.

Key Table Information: These data are related to all other 2015 ASE files. Refer to the Methodology section of the Annual Survey of Entrepreneurs website for additional information.

Universe: The universe for the 2015 Annual Survey of Entrepreneurs (ASE) includes all U.S. firms with paid employees operating during 2015 with receipts of $1,000 or more which are classified in the North American Industry Classification System (NAICS) sectors 11 through 99, except for NAICS 111, 112, 482, 491, 521, 525, 813, 814, and 92 which are not covered. Firms with more than one domestic establishment are counted in each geographic area and industry in which they operate, but only once in the U.S. total. For Characteristics of Business Owners (CBO) data, all estimates are of owners of firms responding to the ASE. That is, estimates are based only on firms providing gender, ethnicity, race, or veteran status; or firms not classifiable by gender, ethnicity, race, and veteran status that returned an ASE online questionnaire with at least one question answered. The ASE online questionnaire provided space for up to four owners to report their characteristics. CBO data are not representative of all owners of all firms operating in the United States. The data do not represent all business owners in the United States.

Geographic Coverage: The data are shown for:
- United States
- States and the District of Columbia
- The fifty most populous metropolitan areas

Industry Coverage: The data are shown for the total of all sectors (00) and the 2-digit NAICS code level.

Data Items and Other Identifying Records: Statistics for Owners of Respondent Employer Firms by Reasons for Owning the Business by Sector, Gender, Ethnicity, Race, Veteran Status, and Years in Business for the U.S., States, and Top 50 MSAs: 2015 contains data on:
- Number of owners of respondent firms with paid employees
- Percent of number of owners of respondent firms with paid employees

The data are shown for:

Gender, ethnicity, race and veteran status of owners of respondent firms:
- All owners of respondent firms
- Female
- Male
- Hispanic
- Non-Hispanic
- White
- Black or African American
- American Indian and Alaska Native
- Asian
- Native Hawaiian and Other Pacific Islander
- Some other race
- Minority
- Nonminority
- Veteran
- Nonveteran

Years in business:
- All firms
- Firms less than 2 years in business
- Firms with 2 to 3 years in business
- Firms with 4 to 5 years in business
- Firms with 6 to 10 years in business
- Firms with 11 to 15 years in business
- Firms with 16 or more years in business

Owner's reasons for owning the business:
- Wanted to be my own boss: Not important
- ...
Version 5 release notes:
Removes support for SPSS and Excel data. Changes the crimes that are stored in each file: there are now more files, with fewer crimes per file. The files and their included crimes are listed below.
Adds agencies that report 0 months of the year. Adds a column that indicates the number of months reported. This is generated by summing the number of unique months for which an agency reports data. Note that this indicates the number of months an agency reported arrests for ANY crime; agencies may not report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime. Removes data on runaways.
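The months-reported column and the NA rule can be sketched in Python. The field names `ori`, `year`, and `month` are hypothetical, and the author's actual processing was done in R.

```python
def months_reported(monthly_rows, agency_years):
    """Count distinct reporting months per (ori, year); agencies with no rows get 0."""
    months = {key: set() for key in agency_years}
    for row in monthly_rows:
        months.setdefault((row["ori"], row["year"]), set()).add(row["month"])
    return {key: len(found) for key, found in months.items()}

def yearly_total(monthly_values):
    """Sum a crime's monthly counts; None (NA) if the agency never reported that crime."""
    reported = [v for v in monthly_values if v is not None]
    return sum(reported) if reported else None
```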
Version 4 release notes:
Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include heroin as well as similar opiates such as morphine, codeine, and opium. Also changes the names of the narcotic columns to indicate that they cover only synthetic narcotics.
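Scripts written against an older version can apply the renames above mechanically. This sketch assumes the old names may appear either alone or as prefixes of longer column names. The synthetic-narcotic renames are not listed in the notes, so they are not included.

```python
# Renames named in the version 4 notes.
RENAMES_V4 = {"poss_coke": "poss_heroin_coke", "sale_coke": "sale_heroin_coke"}

def rename_columns(columns, renames=RENAMES_V4):
    """Apply old -> new names to whole column names or to their leading prefix."""
    out = []
    for col in columns:
        for old, new in renames.items():
            if col == old or col.startswith(old + "_"):
                col = new + col[len(old):]
                break
        out.append(col)
    return out
```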
Version 3 release notes:
Adds data for 2016. Orders rows by year (descending) and ORI.

Version 2 release notes:
Fix bug where Philadelphia Police Department had incorrect FIPS county code.
The Arrests by Age, Sex, and Race (ASR) data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. It contains highly granular counts of the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1980-2015 into a single file. These files are quite large and may take some time to load.
All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see https://github.com/jacobkap/crime_data. If you have any questions, comments, or suggestions, please contact me at jkkaplan6@gmail.com.
I did not make any changes to the data other than the following. When an arrest column has a value of "None/not reported", I change that value to zero. This makes the (possibly incorrect) assumption that these values represent zero crimes reported. In the original data, the only value recorded when an agency reports zero arrests is "None/not reported"; in other words, this data does not differentiate between real zeros and missing values. Some agencies also incorrectly report the following numbers of arrests, which I change to NA: 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998.
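The two recodes above can be sketched in Python as follows. The sketch mirrors the rules as described rather than reproducing the author's R code, and the function name is hypothetical.

```python
NOT_REPORTED = "None/not reported"

# Implausible counts that are recoded to NA, as listed above.
BAD_VALUES = {10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000,
              90000, 100000, 99999, 99998}

def clean_arrest_value(value):
    """'None/not reported' -> 0; known bad counts -> None (NA); otherwise the count."""
    if value == NOT_REPORTED:
        return 0          # assumes "not reported" means zero arrests, which may be wrong
    count = int(value)
    return None if count in BAD_VALUES else count
```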
To reduce file size and make the data more manageable, all of the data is aggregated yearly. All of the data is in agency-year units, so every row indicates an agency in a given year. Columns are crime-arrest category units. For example, if you choose the data set that includes murder, you would have rows for each agency-year and columns with the number of people arrested for murder. The ASR data breaks down arrests by age and gender (e.g., Male aged 15, Male aged 18). It also provides the number of adults or juveniles arrested by race. Because most agencies and years do not report the arrestee's ethnicity (Hispanic or not Hispanic) or juvenile outcomes (e.g., referred to adult court, referred to welfare agency), I do not include these columns.
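The agency-year layout can be sketched as a pivot from long records to one row per agency and year. The tuple layout and the column naming are hypothetical choices for illustration. The sketch also assumes the counts have already been cleaned to integers.

```python
from collections import defaultdict

def to_agency_year_rows(records):
    """Pivot (ori, year, crime, group, count) records into one row per agency-year.

    Each (crime, group) pair becomes a column such as 'murder_male_15', and
    repeated records (e.g. monthly counts) are summed into the yearly value.
    """
    rows = defaultdict(dict)
    for ori, year, crime, group, count in records:
        row = rows[(ori, year)]
        column = f"{crime}_{group}"
        row[column] = row.get(column, 0) + count
    return [{"ori": ori, "year": year, **cols}
            for (ori, year), cols in sorted(rows.items(), key=lambda kv: kv[0])]
```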
To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The LEAIC data add FIPS codes (state, county, and place) and agency type/subtype. Please note that some of the FIPS codes have leading zeros, and opening the files in Excel will automatically delete those leading zeros.
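If the leading zeros have already been lost, they can be restored by left-padding each code to its standard width: state FIPS codes have 2 digits, county codes 3, and place codes 5. This is a minimal sketch. The column names in `FIPS_WIDTHS` are assumptions, so check them against the file's actual header. Reading the CSV with Python's `csv` module keeps every cell as a string, so the zeros survive as long as the file has not been round-tripped through Excel.

```python
# Hypothetical column names; widths are the standard FIPS code lengths.
FIPS_WIDTHS = {
    "fips_state_code": 2,   # state FIPS: 2 digits
    "fips_county_code": 3,  # county FIPS: 3 digits
    "fips_place_code": 5,   # place FIPS: 5 digits
}


def restore_fips(row):
    """Return a copy of row with FIPS columns left-padded with zeros.

    Assumes values are ints or digit strings; a float such as 6.0 would
    need converting to int first.
    """
    fixed = dict(row)
    for column, width in FIPS_WIDTHS.items():
        value = fixed.get(column)
        if value is not None and value != "":
            # zfill only adds zeros on the left; it never truncates.
            fixed[column] = str(value).zfill(width)
    return fixed


def county_geoid(row):
    """5-digit county code: 2-digit state followed by 3-digit county."""
    fixed = restore_fips(row)
    return fixed["fips_state_code"] + fixed["fips_county_code"]


row = {"fips_state_code": 6, "fips_county_code": "37", "fips_place_code": "4000"}
print(restore_fips(row))
# {'fips_state_code': '06', 'fips_county_code': '037', 'fips_place_code': '04000'}
print(county_geoid(row))
# 06037
```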
I created 9 arrest categories myself. The categories are:
Total Male Juvenile
Total Female Juvenile
Total Male Adult
Total Female Adult
Total Male
Total Female
Total Juvenile
Total Adult
Total Arrests
All of these categories are based on the sums of the sex-age categories (e.g. Male under 10, Female aged 22) rather than the provided age-race categories (e.g. adult Black, juvenile Asian). Because not all agencies report the race data, summing the sex-age categories gives more accurate totals. These categories also make up the data in the "simple" version of the data. The "simple" file includes only the above 9 columns as arrest data (all other columns in the data are agency identifier columns). Because this "simple" data set needs fewer columns, I include all offenses in it.
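The nine derived categories are built only from sex-age counts, as described above. The sketch assumes counts keyed by `(sex, age_group)` with hypothetical age-group labels. "Juvenile" follows the UCR convention of under 18, and the set of labels treated as juvenile is an assumption for illustration. Race columns are deliberately ignored.

```python
# Age-group labels assumed to count as juvenile (under 18) in this sketch.
JUVENILE_AGE_GROUPS = {"under 10", "10-12", "13-14", "15", "16", "17"}


def derived_totals(sex_age_counts):
    """Build the 9 summary categories from {(sex, age_group): count}.

    sex must be "male" or "female"; any age group not in
    JUVENILE_AGE_GROUPS is treated as adult.
    """
    totals = {
        "total_male_juvenile": 0,
        "total_female_juvenile": 0,
        "total_male_adult": 0,
        "total_female_adult": 0,
    }
    for (sex, age_group), count in sex_age_counts.items():
        stage = "juvenile" if age_group in JUVENILE_AGE_GROUPS else "adult"
        totals[f"total_{sex}_{stage}"] += count
    # The remaining five categories are sums of the four above.
    totals["total_male"] = totals["total_male_juvenile"] + totals["total_male_adult"]
    totals["total_female"] = totals["total_female_juvenile"] + totals["total_female_adult"]
    totals["total_juvenile"] = totals["total_male_juvenile"] + totals["total_female_juvenile"]
    totals["total_adult"] = totals["total_male_adult"] + totals["total_female_adult"]
    totals["total_arrests"] = totals["total_male"] + totals["total_female"]
    return totals


counts = {("male", "15"): 2, ("male", "25-29"): 5, ("female", "17"): 1, ("female", "22"): 3}
print(derived_totals(counts)["total_arrests"])
# 11
```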
As the arrest data is very granular and each category of arrest is its own column, there are dozens of columns per crime. To keep the data somewhat manageable, there are nine different files: eight that contain different crimes, plus the "simple" file. Each file contains the data for all years. The eight files each contain crimes belonging to a major crime category and do not overlap in crimes other than with the index offenses. Please note that the crime names provided below are not the same as the column names in the data. Because Stata limits variable names to a maximum of 32 characters, I have abbreviated the crime names in the data. The files and their included crimes are:
Index Crimes: Murder, Rape, Robbery, Aggravated Assault, Burglary, Theft, Motor Vehicle Theft, Arson
Alcohol Crimes: DUI, Drunkenness, Liquor
Drug Crimes: Total Drug, Total Drug Sales, Total Drug Possession, Cannabis Possession, Cannabis Sales, Heroin or Cocaine Possession, Heroin or Cocaine Sales, Other Drug Possession, Other Drug Sales, Synthetic Narcotic Possession, Synthetic Narcotic Sales
Grey Collar and Property Crimes: Forgery, Fraud, Stolen Property
Financial Crimes: Embezzlement, Total Gambling, Other Gambling, Bookmaking, Numbers Lottery
Sex or Family Crimes: Offenses Against the Family and Children, Other Sex Offenses, Prostitution, Rape
Violent Crimes: Aggravated Assault, Murder, Negligent Manslaughter, Robbery, Weapon Offenses
Other Crimes: Curfew, Disorderly Conduct, Other Non-traffic, Suspicion, Vandalism, Vagrancy
Simple
This data set has every crime and only the arrest categories that I created (see above).
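The crime names in the data are abbreviated because of Stata's 32-character limit on variable names. A quick check like the one below flags column names that Stata would reject. Stata variable names must be 1 to 32 characters long, may contain only letters, digits, and underscores, and must not start with a digit. This sketch does not check Stata's reserved words, and the example column names are hypothetical.

```python
import re

# 1-32 characters; letters, digits, underscores; first character not a digit.
STATA_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def invalid_stata_names(columns):
    """Return the column names that are not valid Stata variable names."""
    return [name for name in columns if not STATA_NAME.fullmatch(name)]


columns = [
    "murder_tot_arrests",
    "offenses_against_family_and_children_total_male_juvenile",  # too long
    "1st_col",  # starts with a digit
]
print(invalid_stata_names(columns))
# ['offenses_against_family_and_children_total_male_juvenile', '1st_col']
```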
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/25732
In developed democracies, incumbents are consistently found to have an electoral advantage over their challengers. The normative implications of this phenomenon depend on its sources. Despite a large existing literature, there is little consensus on what the sources are. In this three-paper dissertation, I find that both electoral institutions and the parties behind the incumbents appear to play a larger role than the literature has given them credit for, and that in the U.S. context, between 30 and 40 percent of the incumbents’ advantage is driven by their “scaring off” serious opposition. In “Voting for Parties or for Candidates: Do Electoral Institutions Make a Difference?” I analyze the Comparative Study of Electoral Systems (CSES) data to put the U.S. case in a comparative context and explore the impact of electoral institutions on voting behavior. My findings suggest that electoral institutions have a substantial effect on the degree to which politics is party-oriented or personalistic, and thus they might in turn affect the level of incumbency advantage in elections. In “How Parties Help Their Incumbents Win: Evidence from Spain,” I explore a novel dataset of elections to the Spanish Senate, where the commonly studied sources of incumbency advantage are unlikely to be present and where a precise measure of incumbency advantage is available. I find that the main source of senators’ advantage is their placement on the ballot by their party leaders. In “Challenger Quality and the Incumbency Advantage,” my co-authors and I provide estimates of the incumbency advantage and of the effect of previous office-holding experience that account for the strategic entry of high-quality challengers into the race. For that purpose, we use term limits as an instrument for challenger quality. Studying U.S. state legislatures, we find that between 30 and 40 percent of the incumbency advantage in state legislative races is the result of scaring off experienced challengers.
This is a source dataset for a Let's Get Healthy California indicator at https://letsgethealthy.ca.gov/. Infant mortality is defined as the number of deaths in infants under one year of age per 1,000 live births. Infant mortality is often used as an indicator to measure the health and well-being of a community, because factors affecting the health of entire populations can also impact the mortality rate of infants. Although California’s infant mortality rate is better than the national average, there are significant disparities, with African American babies dying at more than twice the rate of other groups. Data are from the Birth Cohort Files. The infant mortality indicator computed from the birth cohort file combines birth certificate information on all births that occur in a calendar year (denominator) with death certificate information linked to the birth certificate for those infants who were born in that year but subsequently died within 12 months of birth (numerator). Studies of infant mortality that are based on information from death certificates alone have been found to underestimate infant death rates for infants of all race/ethnic groups, and especially for certain race/ethnic groups, due to problems such as confusion about event registration requirements, incomplete data, and transfers of newborns from one facility to another for medical care. Note that there is a separate data table, "Infant Mortality by Race/Ethnicity", which is based on death records only; it is more timely but less accurate than the Birth Cohort File. A single year is shown to provide state-level data and county totals for the most recent year. Numerator: infant deaths (under age 1 year). Denominator: live births occurring to California state residents. Multiple years are aggregated to allow for stratification at the county level. For this indicator, race/ethnicity is based on the birth certificate information, which records the race/ethnicity of the mother.
The mother can “decline to state”; this is considered to be a valid response. These responses are not displayed on the indicator visualization.
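The indicator is a simple ratio: linked infant deaths from a birth cohort divided by that cohort's live births, scaled to 1,000. The sketch below shows that arithmetic. It also shows one reasonable way to pool multiple years for county-level stratification: sum deaths and births first, then divide, rather than averaging yearly rates. That pooling choice is an assumption, and the counts are made-up illustration values, not California data.

```python
def infant_mortality_rate(infant_deaths, live_births):
    """Infant deaths (under age 1) per 1,000 live births for one birth cohort."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return 1000 * infant_deaths / live_births


def pooled_rate(yearly_counts):
    """Rate over several cohorts from [(deaths, births), ...].

    Sums the numerators and denominators first, so years with more births
    carry more weight than they would in a simple average of yearly rates.
    """
    deaths = sum(d for d, _ in yearly_counts)
    births = sum(b for _, b in yearly_counts)
    return infant_mortality_rate(deaths, births)


# Hypothetical counts for illustration only.
print(infant_mortality_rate(2000, 400000))
# 5.0
print(pooled_rate([(30, 6000), (20, 4000)]))
# 5.0
```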
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains counts of deaths for California counties based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.
The final data tables include both deaths that occurred in each California county regardless of the place of residence (by occurrence) and deaths to residents of each California county (by residence), whereas the provisional data table only includes deaths that occurred in each county regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.
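The difference between "by occurrence" and "by residence" tables comes down to which county field a death is counted under. The sketch assumes hypothetical records with `occurrence_county` set to `None` for deaths outside California and `residence_county` set to `None` for non-residents. With those conventions, a Fresno resident who died out of state appears only in the by-residence count. That matches the final tables, which include out-of-state deaths to California residents. The provisional table corresponds to the by-occurrence count alone.

```python
from collections import Counter


def county_counts(records):
    """Return (by_occurrence, by_residence) Counters of deaths per county.

    records: dicts with hypothetical keys "occurrence_county" (None if the
    death occurred outside California) and "residence_county" (None if the
    decedent was not a California resident).
    """
    by_occurrence = Counter(
        r["occurrence_county"] for r in records if r["occurrence_county"] is not None
    )
    by_residence = Counter(
        r["residence_county"] for r in records if r["residence_county"] is not None
    )
    return by_occurrence, by_residence


records = [
    {"occurrence_county": "Alameda", "residence_county": "Alameda"},
    {"occurrence_county": "Alameda", "residence_county": None},  # non-resident
    {"occurrence_county": None, "residence_county": "Fresno"},   # died out of state
    {"occurrence_county": "Fresno", "residence_county": "Fresno"},
]
by_occurrence, by_residence = county_counts(records)
print(dict(by_occurrence))
# {'Alameda': 2, 'Fresno': 1}
print(dict(by_residence))
# {'Alameda': 1, 'Fresno': 2}
```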
The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
The Kirwan Institute for the Study of Race and Ethnicity at Ohio State University developed the Detroit Regional Opportunity Index to compare levels of opportunity for people growing up in different parts of a region. The Index was developed by combining many different data indicators for opportunity into a single score. More information on the Detroit methodology and composite data can be found here: http://kirwaninstitute.osu.edu/wp-content/uploads/2014/08/20131211neighborhood.pdf
The full report from Kirwan on the Detroit Opportunity project can be found here: http://kirwaninstitute.osu.edu/?my-product=opportunity-for-all-inequity-linked-fate-and-social-justice-in-detroit-and-michigan/
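The Index combines many indicators into a single score. A common way to build such a composite, assumed here, is to convert each indicator to a z-score across areas and average them with equal weights, flipping the sign of indicators where higher values mean less opportunity. For the Index's actual indicators, weights, and sign conventions, consult the Kirwan methodology document linked above. The indicator names and values below are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant indicator says nothing about relative opportunity.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def composite_index(indicators, higher_is_better):
    """Equal-weight average of per-indicator z-scores for each area.

    indicators: name -> list of values, one per area, same area order.
    higher_is_better: name -> bool; False flips the sign (e.g. poverty rate).
    """
    standardized = []
    for name, values in indicators.items():
        z = z_scores(values)
        if not higher_is_better[name]:
            z = [-x for x in z]
        standardized.append(z)
    n_areas = len(standardized[0])
    return [mean(z[i] for z in standardized) for i in range(n_areas)]


scores = composite_index(
    {"graduation_rate": [0.7, 0.8, 0.9], "poverty_rate": [0.3, 0.2, 0.1]},
    {"graduation_rate": True, "poverty_rate": False},
)
print([round(s, 2) for s in scores])
# roughly [-1.22, 0.0, 1.22]
```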
Studies of descriptive representation find that voters more positively evaluate representatives who share their ascriptive characteristics. I argue that this pattern can be upended when voters develop more positive affect towards outgroups. In the United States, Democrats have increasingly expressed more positive views towards marginalized groups, while Republicans’ attitudes about these groups have not shifted. Under such conditions, my argument predicts that the effect of representatives’ race and gender on constituent evaluations should vary more by constituents’ partisanship than by their own ascriptive characteristics. Applying a difference-in-differences design to 2008-2020 CCES data, I find that Democrats of all backgrounds now approve more highly of Congressmembers from historically marginalized groups, whereas Republicans’ approval is unrelated to Member identity. Democrats also give women and minority representatives leeway to diverge ideologically. These findings demonstrate that polarizing attitudes about race and gender can disrupt classic patterns in how constituents evaluate representatives.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
A richly phenotyped transdiagnostic dataset with behavioral and Magnetic Resonance Imaging (MRI) data from 241 individuals aged 18 to 70, comprising 148 individuals meeting diagnostic criteria for a broad range of psychiatric illnesses and a healthy comparison group of 93 individuals.
These data include high-resolution anatomical scans and seven functional MRI runs: four resting-state runs and three task-based runs (two Stroop, one Faces/Shapes). Participants completed over 50 psychological and cognitive questionnaires, as well as a semi-structured clinical interview.
Data was collected at the Brain Imaging Center, Yale University, New Haven, CT and McLean Hospital, Belmont, MA. This dataset will allow investigation into brain function and transdiagnostic psychopathology in a community sample. See preprint (https://www.medrxiv.org/content/10.1101/2024.06.18.24309054v1) and below for detailed information.
Participants in the study met the following inclusion criteria:
Participants meeting any of the criteria listed below were excluded from the study:
* Neurological disorders
* Pervasive developmental disorders (e.g., autism spectrum disorder)
* Any medical condition that increases risk for MRI (e.g., pacemaker, dental braces)
* MRI contraindications (e.g., claustrophobia, pregnancy)
Institutional Review Board approval and consent were obtained. To characterise the sample, we collected data on race/ethnicity, income, use of psychotropic medication, and family history of medical or psychiatric conditions.
Relevant clinical measures can be found in the phenotype folder, with each measure and its items described in the relevant _definition.csv file. The 'qc' columns indicate quality control checks done on each measure (i.e., the number of items a participant left unanswered). '999' values indicate missing or skipped data.
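Because '999' marks missing or skipped data, it should be converted to a real missing value before computing scores. Otherwise it will silently inflate sums and means. The sketch parses a phenotype table with the standard `csv` module and maps '999' cells to `None`. The delimiter is a parameter because the exact format of the phenotype data files is an assumption here; BIDS phenotype files are usually tab-separated. The column names in the example are hypothetical.

```python
import csv
import io

MISSING_CODE = "999"


def read_phenotype(text, delimiter="\t"):
    """Parse a phenotype table, turning '999' cells into None.

    Caveat: a measure whose legitimate value can be 999 would be
    misread by this rule.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for row in reader:
        rows.append({
            # DictReader fills short rows with None; keep those as None too.
            key: None if value is None or value.strip() == MISSING_CODE else value
            for key, value in row.items()
        })
    return rows


sample = "participant_id\titem_1\titem_2\nsub-01\t3\t999\nsub-02\t999\t1\n"
print(read_phenotype(sample))
# [{'participant_id': 'sub-01', 'item_1': '3', 'item_2': None}, {'participant_id': 'sub-02', 'item_1': None, 'item_2': '1'}]
```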
MRI data were acquired at both sites using harmonized Siemens Magnetom 3T Prisma MRI scanners and a 64-channel head coil. T1-weighted (T1w) anatomical images were acquired using a multi-echo MPRAGE sequence with the following parameters: acquisition duration of 132 seconds, repetition time (TR) of 2.2 seconds, echo times (TE) of 1.5, 3.4, 5.2, and 7.0 milliseconds, flip angle of 7°, inversion time (TI) of 1.1 seconds, sagittal orientation, and anterior (A) to posterior (P) phase encoding. The slice thickness was 1.2 millimeters, and 144 slices were acquired. The image resolution was 1.2 mm³. A root mean square of the four images corresponding to each echo was computed to derive a single image. T2-weighted (T2w) anatomical images were acquired with the following parameters: TR of 2800 milliseconds, TE of 326 milliseconds, sagittal orientation, and AP phase encoding direction. The slice thickness was 1.2 millimeters, and 144 slices were acquired. All seven functional MRI runs were acquired with the same parameters, matching the HCP protocol [6, 9], and varied only in condition (rest/task) and in phase encoding direction (AP/PA), which were acquired separately. For the resting-state, Stroop task, and Emotional Faces task runs, a total of 488, 510, and 493 volumes were acquired, respectively, all using the following MRI sequence parameters: TR = 800 milliseconds, TE = 37 milliseconds, flip angle = 52°, and voxel size = 2 mm³. A multi-band acceleration factor of 8 was applied. An auto-align pulse sequence protocol was used to align the acquisition slices of the functional scans parallel to the anterior. To enable correction of distortions in the EPI images, B0 field maps were acquired in both AP and PA directions with a standard spin echo sequence. Detailed MRI acquisition protocols for both sites are available in Appendix B.
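Two details above can be made concrete: the root-mean-square combination of the four MPRAGE echoes, and the functional run lengths implied by TR × number of volumes. This is an illustrative sketch, not the dataset's processing code. Real images would be handled as NIfTI volumes (for example with NiBabel and NumPy). The echo images here are flat Python lists so the example runs with the standard library only. Run durations are computed in integer milliseconds to avoid floating-point noise.

```python
import math


def rms_combine(echo_values):
    """Root mean square across echoes for a single voxel."""
    return math.sqrt(sum(v * v for v in echo_values) / len(echo_values))


def combine_multi_echo(echo_images):
    """Combine equally sized flat voxel lists (one per echo) voxel by voxel."""
    return [rms_combine(voxel) for voxel in zip(*echo_images)]


# Four echoes, two voxels each (toy values).
print(combine_multi_echo([[1, 2], [1, 2], [1, 2], [1, 2]]))
# [1.0, 2.0]

# Functional run durations: TR = 800 ms times the number of volumes.
TR_MS = 800
VOLUMES = {"rest": 488, "stroop": 510, "faces": 493}
durations_ms = {run: n * TR_MS for run, n in VOLUMES.items()}
print(durations_ms)
# {'rest': 390400, 'stroop': 408000, 'faces': 394400}
```

So each resting-state run lasts about 6.5 minutes, and the Stroop and Faces runs last roughly 6.8 and 6.6 minutes.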
In total, four resting-state acquisitions (2 AP, 2 PA), two Stroop task acquisitions (1 AP [Block 1], 1 PA [Block 2]), and one Emotional Faces task acquisition (1 AP) were collected. Not every participant completed each functional neuroimaging run; the sample sizes for each run were as follows: resting-state AP run 1, n = 241; resting-state PA run 1, n = 241; resting-state AP run 2, n = 237; resting-state PA run 2, n = 235; Stroop task AP, n = 226; Stroop task PA, n = 224; and Emotional Faces task AP, n = 226.
For the Emotional Faces task, the faces are fear- and anger-expressing faces (male and female groups) from the NimStim database. The faces used in each trial are listed in each events.tsv file. For example, FA1 = female anger stimuli set number 1, and FF1 = female fear stimuli set number 1. Unfortunately, we cannot release the actual images publicly. An important consideration is that this task has no neutral control or positively valenced comparison for faces (i.e., it strictly contrasts negatively valenced faces with a non-face/shape version of the task). We will soon update the events.tsv files on OpenNeuro with more informative file names (e.g. female_fear, female_anger, male_fear, male_anger).
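Until the events.tsv files are updated, the stimulus codes can be decoded into the more informative labels mentioned above. This sketch assumes the pattern suggested by the two given examples (FA1, FF1): a sex letter, an emotion letter, then a set number. Only the female codes appear in the text, so using "M" for male is an assumption. Codes that do not match, such as shape trials, raise an error rather than being guessed.

```python
import re

# Assumed pattern: sex letter, emotion letter, set number (e.g. FA1, FF1).
SEX = {"F": "female", "M": "male"}  # "M" is an assumption
EMOTION = {"A": "anger", "F": "fear"}
CODE = re.compile(r"([FM])([AF])(\d+)")


def decode_stimulus(code):
    """Map a code like 'FA1' to ('female_anger', 1)."""
    match = CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"unrecognized stimulus code: {code!r}")
    sex, emotion, set_number = match.groups()
    return f"{SEX[sex]}_{EMOTION[emotion]}", int(set_number)


print(decode_stimulus("FA1"))
# ('female_anger', 1)
print(decode_stimulus("FF1"))
# ('female_fear', 1)
```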
Detailed information and protocols regarding the dataset can be found here: https://www.medrxiv.org/content/10.1101/2024.06.18.24309054v1
This dataset contains counts of deaths for California as a whole based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.
The final data tables include both deaths that occurred in California regardless of the place of residence (by occurrence) and deaths to California residents (by residence), whereas the provisional data table only includes deaths that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.
The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
This dataset contains counts of live births for California counties based on information entered on birth certificates. Final counts are derived from static data and include out-of-state births to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all births that occurred during the time period.
The final data tables include both births that occurred in California regardless of the place of residence (by occurrence) and births to California residents (by residence), whereas the provisional data table only includes births that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by parent giving birth's age, parent giving birth's race-ethnicity, and birth place type. See temporal coverage for more information on which strata are available for which years.