Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Middle Point, OH, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
Figure: Middle Point, OH median household income, by household size (in 2022 inflation-adjusted dollars)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Middle Point median household income.
This dataset and map service provide information on the U.S. Department of Housing and Urban Development's (HUD) low- to moderate-income areas. The term Low to Moderate Income, often referred to as low-mod, has a specific programmatic context within the Community Development Block Grant (CDBG) program. Over a 1-, 2-, or 3-year period, as selected by the grantee, not less than 70 percent of CDBG funds must be used for activities that benefit low- and moderate-income persons. HUD uses special tabulations of Census data to determine areas where at least 51% of households have incomes at or below 80% of the area median income (AMI). This dataset and map service contains the following layer.
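The 51% / 80% rule described above can be illustrated with a minimal sketch. It assumes a plain list of household incomes and a single AMI figure for the area; the function names are hypothetical, and HUD's actual special tabulations are not reproduced here.

```python
def count_low_mod(household_incomes, area_median_income):
    """Count households with income at or below 80% of the area median income."""
    # Integer comparison (income * 100 <= 80 * AMI) avoids float rounding at the boundary.
    return sum(1 for inc in household_incomes if inc * 100 <= 80 * area_median_income)


def is_low_mod_area(household_incomes, area_median_income):
    """Return True when at least 51% of households are at or below 80% of AMI."""
    n_total = len(household_incomes)
    if n_total == 0:
        return False  # assumption: an area with no households does not qualify
    n_low = count_low_mod(household_incomes, area_median_income)
    return n_low * 100 >= 51 * n_total
```

For example, with an AMI of 50,000 the income limit is 40,000, so an area where 51 of 100 households earn 30,000 would qualify under this sketch, while one where only 50 of 100 do would not.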
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Middle Branch Township, Michigan, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
Figure: Middle Branch Township, Michigan median household income, by household size (in 2022 inflation-adjusted dollars)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Middle Branch township median household income.
Income of individuals by age group, sex and income source, Canada, provinces and selected census metropolitan areas, annual.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Middle Inlet, Wisconsin, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
Figure: Middle Inlet, Wisconsin median household income, by household size (in 2022 inflation-adjusted dollars)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Middle Inlet town median household income.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
By the middle of the 1990s, Indonesia had enjoyed over three decades of remarkable social, economic, and demographic change and was on the cusp of joining the middle-income countries. Per capita income had risen more than fifteenfold since the early 1960s, from around US$50 to more than US$800. Increases in educational attainment and decreases in fertility and infant mortality over the same period reflected impressive investments in infrastructure. In the late 1990s the economic outlook began to change as Indonesia was gripped by the economic crisis that affected much of Asia. In 1998 the rupiah collapsed, the economy went into a tailspin, and gross domestic product contracted by an estimated 12-15%, a decline rivaling the magnitude of the Great Depression. The general trend of several decades of economic progress followed by a few years of economic downturn masks considerable variation across the archipelago in the degree both of economic development and of economic setbacks related to the crisis. In part this heterogeneity reflects the great cultural and ethnic diversity of Indonesia, which in turn makes it a rich laboratory for research on a number of individual- and household-level behaviors and outcomes that interest social scientists. The Indonesia Family Life Survey is designed to provide data for studying these behaviors and outcomes. The survey contains a wealth of information collected at the individual and household levels, including multiple indicators of economic and non-economic well-being: consumption, income, assets, education, migration, labor market outcomes, marriage, fertility, contraceptive use, health status, use of health care and health insurance, relationships among co-resident and non-resident family members, processes underlying household decision-making, transfers among family members, and participation in community activities.
In addition to individual- and household-level information, the IFLS provides detailed information from the communities in which IFLS households are located and from the facilities that serve residents of those communities. These data cover aspects of the physical and social environment, infrastructure, employment opportunities, food prices, access to health and educational facilities, and the quality and prices of services available at those facilities. By linking data from IFLS households to data from their communities, users can address many important questions regarding the impact of policies on the lives of the respondents, as well as document the effects of social, economic, and environmental change on the population. The Indonesia Family Life Survey complements and extends the existing survey data available for Indonesia, and for developing countries in general, in a number of ways. First, relatively few large-scale longitudinal surveys are available for developing countries. IFLS is the only large-scale longitudinal survey available for Indonesia. Because data are available for the same individuals from multiple points in time, IFLS affords an opportunity to understand the dynamics of behavior, at the individual, household and family and community levels. In IFLS1 7,224 households were interviewed, and detailed individual-level data were collected from over 22,000 individuals. In IFLS2, 94.4% of IFLS1 households were re-contacted (interviewed or died). In IFLS3 the re-contact rate was 95.3% of IFLS1 households. Indeed nearly 91% of IFLS1 households are complete panel households in that they were interviewed in all three waves, IFLS1, 2 and 3. These re-contact rates are as high as or higher than most longitudinal surveys in the United States and Europe. High re-interview rates were obtained in part because we were committed to tracking and interviewing individuals who had moved or split off from the origin IFLS1 households. 
High re-interview rates contribute significantly to data quality in a longitudinal survey because they lessen the risk of bias due to nonrandom attrition in studies using the data. Second, the multipurpose nature of IFLS instruments means that the data support analyses of interrelated issues not possible with single-purpose surveys. For example, the availability of data on household consumption together with detailed individual data on labor market outcomes, health outcomes and on health program availability and quality at the community level means that one can examine the impact of income on health outcomes, but also whether health in turn affects incomes. Third, IFLS collected both current and retrospective information on most topics. With data from multiple points of time on current status and an extensive array of retrospective information about the lives of respondents, analysts can relate dynamics to events that occurred in the past. For example, changes in labor outcomes in recent years can be explored as a function of earlier decisions about schooling and work. Fourth, IFLS collected extensive measures of health status, including self-reported measures of general health status, morbidity experience, and physical assessments conducted by a nurse (height, weight, head circumference, blood pressure, pulse, waist and hip circumference, hemoglobin level, lung capacity, and time required to repeatedly rise from a sitting position). These data provide a much richer picture of health status than is typically available in household surveys. For example, the data can be used to explore relationships between socioeconomic status and an array of health outcomes. Fifth, in all waves of the survey, detailed data were collected about respondents' communities and public and private facilities available for their health care and schooling.
The facility data can be combined with household and individual data to examine the relationship between, for example, access to health services (or changes in access) and various aspects of health care use and health status. Sixth, because the waves of IFLS span the period from several years before the economic crisis hit Indonesia, to just prior to it hitting, to one year and then three years after, extensive research can be carried out regarding the living conditions of Indonesian households during this very tumultuous period. In sum, the breadth and depth of the longitudinal information on individuals, households, communities, and facilities make IFLS data a unique resource for scholars and policymakers interested in the processes of economic development.
Families of tax filers; Single-earner and dual-earner census families by number of children (final T1 Family File; T1FF).
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Middle Eastern Human Face with Occlusion Dataset, meticulously curated to enhance face recognition models and support the development of advanced occlusion detection systems, biometric identification systems, KYC models, and other facial recognition technologies.
This dataset comprises over 3,000 human facial images, divided into participant-wise sets with each set including:
The dataset includes contributions from a diverse network of individuals across Middle Eastern countries:
To ensure high utility and robustness, all images are captured under varying conditions:
Each facial image set is accompanied by detailed metadata for each participant, including:
This metadata is essential for training models that can accurately recognize and identify human faces with occlusions across different demographics and conditions.
This facial image dataset is ideal for various applications in the field of computer vision, including but not limited to:
We understand the evolving nature of AI and machine
This table presents income shares, thresholds, tax shares, and total counts of individual Canadian tax filers, with a focus on high income individuals (95% income threshold, 99% threshold, etc.). Income thresholds are based on national threshold values, regardless of selected geography; for example, the number of Nova Scotians in the top 1% will be calculated as the number of taxfiling Nova Scotians whose total income exceeded the 99% national income threshold. Different definitions of income are available in the table namely market, total, and after-tax income, both with and without capital gains.
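The national-threshold rule described above can be sketched as follows. This is only an illustration under stated assumptions: filers are represented as hypothetical (province, total income) pairs, and the threshold uses a simple nearest-rank percentile, which may differ from the method the statistical agency actually applies.

```python
import math


def national_threshold(incomes, pct):
    """Nearest-rank percentile: smallest income with at least pct% of filers at or below it."""
    ordered = sorted(incomes)
    k = max(math.ceil(pct * len(ordered) / 100) - 1, 0)
    return ordered[k]


def count_above_threshold(filers, pct):
    """Count filers per province whose income exceeds the national pct% threshold.

    filers: list of (province, income) pairs (hypothetical input layout).
    """
    threshold = national_threshold([income for _, income in filers], pct)
    counts = {}
    for province, income in filers:
        if income > threshold:
            counts[province] = counts.get(province, 0) + 1
    return counts
```

The key design point is that the threshold is computed once from all filers nationally and then applied unchanged to each province, so a province's count of "top 1%" filers need not equal 1% of its own filers.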
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Middle Taylor Township, Pennsylvania, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
Figure: Middle Taylor Township, Pennsylvania median household income, by household size (in 2022 inflation-adjusted dollars)
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Middle Taylor township median household income.
The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, asset ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.
The full-population dataset (with about 10 million individuals) is also distributed as open data.
The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.
Household, Individual
The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.
ssd
The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
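The two-stage design described above can be sketched in Python (the original sample was drawn with an R script, which is not reproduced here). The sketch makes several assumptions that the description does not state: the sampling frame is a hypothetical nested dictionary, fractional allocations are rounded with a largest-remainder rule, and enumeration areas are chosen by simple random sampling within each stratum.

```python
import random


def allocate_eas(stratum_sizes, total_households=8000, per_ea=25):
    """Allocate enumeration areas (EAs) to strata in proportion to stratum size."""
    n_eas = total_households // per_ea  # 8,000 / 25 = 320 EAs overall
    total = sum(stratum_sizes.values())
    quotas = {s: n_eas * size / total for s, size in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    # Assumption: hand out the EAs lost to rounding by largest remainder.
    leftover = n_eas - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def draw_sample(frame, stratum_sizes, total_households=8000, per_ea=25, seed=0):
    """Draw households in two stages.

    frame: {stratum: {ea_id: [household_id, ...]}} (hypothetical layout).
    """
    rng = random.Random(seed)
    alloc = allocate_eas(stratum_sizes, total_households, per_ea)
    sample = []
    for stratum, n_selected in alloc.items():
        # Stage 1 (assumption): simple random sample of EAs within the stratum.
        chosen_eas = rng.sample(sorted(frame[stratum]), n_selected)
        for ea in chosen_eas:
            # Stage 2: a fixed number of households drawn at random in each EA.
            sample.extend(rng.sample(frame[stratum][ea], per_ea))
    return sample
```

With the stated parameters, 8,000 households at 25 per enumeration area implies 320 enumeration areas in total, which are then split across the geo_1 by urban/rural strata.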
other
The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.
The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to produce the distributed data files.
This is a synthetic dataset; the "response rate" is 100%.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
California State Income Limits reflect updated median income and household income levels for acutely low-, extremely low-, very low-, low- and moderate-income households for California’s 58 counties (required by Health and Safety Code Section 50093). These income limits apply to State and local affordable housing programs statutorily linked to HUD income limits and differ from income limits applicable to other specific federal, State, or local programs.
For detailed information, visit the Tucson Equity Priority Index StoryMap.
Download the Data Dictionary
What is the Tucson Equity Priority Index (TEPI)?
The Tucson Equity Priority Index (TEPI) is a tool that describes the distribution of socially vulnerable demographics. It categorizes the dataset into 5 classes that represent the differing prioritization needs based on the presence of social vulnerability: Low (0-20), Low-Moderate (20-40), Moderate (40-60), Moderate-High (60-80), and High (80-100). Each class represents 20% of the dataset’s features in order of their values. The features within the Low (0-20) classification represent the areas that, when compared to all other locations in the study area, have the lowest need for prioritization, as they tend to have less socially vulnerable demographics. The features that fall into the High (80-100) classification represent the 20% of locations in the dataset that have the greatest need for prioritization, as they tend to have the highest proportions of socially vulnerable demographics.
How is social vulnerability measured?
The Tucson Equity Priority Index (TEPI) examines the proportion of vulnerability per feature using 11 demographic indicators:
Income Below Poverty: Households with income at or below the federal poverty level (FPL), which in 2023 was $14,500 for an individual and $30,000 for a family of four
Unemployment: Measured as the percentage of unemployed persons in the civilian labor force
Housing Cost Burdened: Homeowners who spend more than 30% of their income on housing expenses, including mortgage, maintenance, and taxes
Renter Cost Burdened: Renters who spend more than 30% of their income on rent
No Health Insurance: Those without private health insurance, Medicare, Medicaid, or any other plan or program
No Vehicle Access: Households without automobile, van, or truck access
High School Education or Less: Those whose highest level of educational attainment is a High School diploma, equivalency, or less
Limited English Ability: Those whose ability to speak English is "Less Than Well."
People of Color: Those who identify as anything other than Non-Hispanic White
Disability: Households with one or more physical or cognitive disabilities
Age: Groups that tend to have higher levels of vulnerability, including children (those below 18), and seniors (those 65 and older)
An overall percentile value is calculated for each feature based on the total proportion of the above indicators in each area.
How are the variables combined?
These indicators are divided into two main categories that we call Thematic Indices: Economic and Personal Characteristics. The two thematic indices are further divided into five sub-indices called Tier-2 Sub-Indices. Each Tier-2 Sub-Index contains 2-3 indicators. Indicators are the datasets used to measure vulnerability within each sub-index. The variables for each feature are re-scaled using the percentile normalization method, which converts them to the same scale using values between 0 and 100.
The variables are then combined first into each of the five Tier-2 Sub-Indices, then the Thematic Indices, then the overall TEPI using the mean aggregation method and equal weighting. The resulting dataset is then divided into the five classes, where:
High Vulnerability (80-100%): Representing the top classification, this category includes the highest 20% of regions that are the most socially vulnerable. These areas require the most focused attention.
Moderate-High Vulnerability (60-80%): This upper-middle classification includes areas with higher levels of vulnerability compared to the median. While not the highest, these areas are more vulnerable than a majority of the dataset and should be considered for targeted interventions.
Moderate Vulnerability (40-60%): Representing the middle or median quintile, this category includes areas of average vulnerability. These areas may show a balanced mix of high and low vulnerability. Detailed examination of specific indicators is recommended to understand the nuanced needs of these areas.
Low-Moderate Vulnerability (20-40%): Falling into the lower-middle classification, this range includes areas that are less vulnerable than most but may still exhibit certain vulnerable characteristics. These areas typically have a mix of lower and higher indicators, with the lower values predominating.
Low Vulnerability (0-20%): This category represents the bottom classification, encompassing the lowest 20% of data points. Areas in this range are the least vulnerable, making them the most resilient compared to all other features in the dataset.
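The combination steps described above (percentile normalization, equal-weight mean aggregation through the index hierarchy, then five classes) can be sketched roughly as follows. This is a minimal illustration, not the TEPI implementation: the percentile-rank formula is one of several common definitions, and the grouping of indicators into sub-indices and thematic indices is passed in as a hypothetical nested dictionary because the exact grouping is not given here.

```python
from statistics import mean

LABELS = ["Low", "Low-Moderate", "Moderate", "Moderate-High", "High"]


def percentile_rank(values):
    """Rescale values to 0-100 by percentile rank; tied values share the same rank."""
    n = len(values)
    if n < 2:
        return [50.0] * n  # assumption: a lone feature sits at the midpoint
    ranks = []
    for v in values:
        less = sum(1 for x in values if x < v)
        equal = sum(1 for x in values if x == v)
        ranks.append((less + 0.5 * (equal - 1)) / (n - 1) * 100)
    return ranks


def tepi_scores(indicators, hierarchy):
    """Combine indicators into one score per feature using equal-weight means.

    indicators: {indicator_name: [value per feature]}
    hierarchy:  {thematic_index: {sub_index: [indicator_name, ...]}} (hypothetical grouping)
    """
    scaled = {name: percentile_rank(vals) for name, vals in indicators.items()}
    n_features = len(next(iter(scaled.values())))
    scores = []
    for i in range(n_features):
        theme_scores = []
        for sub_indices in hierarchy.values():
            sub_scores = [mean(scaled[name][i] for name in names)
                          for names in sub_indices.values()]
            theme_scores.append(mean(sub_scores))
        scores.append(mean(theme_scores))
    return scores


def classify(scores):
    """Place each feature in one of five classes by the percentile rank of its score."""
    return [LABELS[min(int(r // 20), 4)] for r in percentile_rank(scores)]
```

Because aggregation is a mean of means, each thematic index carries equal weight in the overall score regardless of how many indicators sit beneath it.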
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States US: Poverty Headcount Ratio at $1.90 a Day: 2011 PPP: % of Population data was reported at 1.200 % in 2016. This records an increase from the previous number of 1.000 % for 2013. United States US: Poverty Headcount Ratio at $1.90 a Day: 2011 PPP: % of Population data is updated yearly, averaging 0.700 % from Dec 1979 (Median) to 2016, with 11 observations. The data reached an all-time high of 1.200 % in 2016 and a record low of 0.500 % in 1991. United States US: Poverty Headcount Ratio at $1.90 a Day: 2011 PPP: % of Population data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s United States – Table US.World Bank.WDI: Poverty. Poverty headcount ratio at $1.90 a day is the percentage of the population living on less than $1.90 a day at 2011 international prices. As a result of revisions in PPP exchange rates, poverty rates for individual countries cannot be compared with poverty rates reported in earlier editions. World Bank, Development Research Group. Data are based on primary household survey data obtained from government statistical agencies and World Bank country departments. Data for high-income economies are from the Luxembourg Income Study database. For more information and methodology, please see PovcalNet (http://iresearch.worldbank.org/PovcalNet/index.htm). The World Bank’s internationally comparable poverty monitoring database now draws on income or detailed consumption data from more than one thousand six hundred household surveys across 164 countries in six regions and 25 other high income countries (industrialized economies). While income distribution data are published for all countries with data available, poverty data are published for low- and middle-income countries and countries eligible to receive loans from the World Bank (such as Chile) and recently graduated countries (such as Estonia) only.
The aggregated numbers for low- and middle-income countries correspond to the totals of 6 regions in PovcalNet, which include low- and middle-income countries and countries eligible to receive loans from the World Bank (such as Chile) and recently graduated countries (such as Estonia). See PovcalNet (http://iresearch.worldbank.org/PovcalNet/WhatIsNew.aspx) for definitions of geographical regions and industrialized countries.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The dataset is a file of the raw interview scripts with my interviewees from the fieldwork conducted between June 2021 and February 2022.
This thesis investigates how urban middle-class working women with two children make sense of work, childcare, and self under China's universal two-child policy. It also explores how the ideas of the individual and the family interact in these women's construction of a sense of self. On January 1st, 2016, the one-child policy was replaced by the universal two-child policy, under which all married couples in China are allowed to have two children. In the scholarship on motherhood, it is widely documented across cultures that motherhood is a site of patriarchal oppression where women are expected to meet the unrealistic ideal of intensive mothering to be a good mother, suffer from the motherhood wage penalty, and face more work-family conflict than fathers. Empirical studies of China have come to similar conclusions, and such findings are not only widely recognized in scholarship but also widespread in popular discourse in China. Although marriage and having children are still universal for the generation of the research target, women born in the 1970s and 1980s, having only one child has become widely acceptable and normal due to the compounding influence of the one-child policy, the increasing financial burden of raising a child, and other factors. Given this context, this study intends to investigate how these middle-class women, who are relatively empowered and resourceful, come to a decision that is seemingly against their own interest. Moreover, unlike in the West, where childbearing and childcare are mainly an issue for the conjugal couple and gender relations are at the center of the discussion, in China the extended family, especially grandparents, also plays a role in both the decision-making process and the subsequent childcare arrangement. Therefore, to study second-time mothers’ childcare and work experiences in contemporary urban China, we also need to situate them, as individuals, in their family.
To investigate how they make sense of childcare and work is also to understand the tension between individual and family. By interviewing twenty-one parents from middle-class families in Guangzhou with a second child under six years old, this study finds that these urban working women with two children consider themselves an individual unit, and that full-time paid employment is something that cannot be given up since it is the means of securing that independent self. However, they did not prioritize their personal interest over that of other family members, especially the elder child, and thus the decision to have a second child was made mainly for the sake of the elder child. Moreover, grandparents played an essential role in providing a childcare safety net, without which these urban working women would not be able to work full-time and maintain the independent self as they defined it. The portrayal of these women’s experiences reflects the individualization process in China, where people are individualized without individualism, and family is evoked as a strategy to achieve personal as well as family goals. The findings of this study contribute to theories of motherhood by adding an intergenerational perspective to the existing gender perspective, and also contribute to studies of the family by clarifying the relation and interaction between individual and family in these women’s construction of a sense of self in the context of contemporary China.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Middle Paxton Township, Pennsylvania, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Middle Paxton township median household income.
Dataset Title: A Gold Standard Corpus for Activity Information (GoSCAI)
Dataset Curators: The Epidemiology & Biostatistics Section of the NIH Clinical Center Rehabilitation Medicine Department
Dataset Version: 1.0 (May 16, 2025)
Dataset Citation and DOI: NIH CC RMD Epidemiology & Biostatistics Section. (2025). A Gold Standard Corpus for Activity Information (GoSCAI) [Data set]. Zenodo. doi: 10.5281/zenodo.15528545
This data statement is for a gold standard corpus of de-identified clinical notes that have been annotated for human functioning information based on the framework of the WHO's International Classification of Functioning, Disability and Health (ICF). The corpus includes 484 notes from a single institution within the United States written in English in a clinical setting. This dataset was curated for the purpose of training natural language processing models to automatically identify, extract, and classify information on human functioning at the whole-person, or activity, level.
This dataset is curated to be a publicly available resource for the development and evaluation of methods for the automatic extraction and classification of activity-level functioning information as defined in the ICF. The goals of data curation are to 1) create a corpus of a size that can be manually deidentified and annotated, 2) maximize the density and diversity of functioning information of interest, and 3) allow public dissemination of the data.
Language Region: en-US
Prose Description: English as written by native and bilingual English speakers in a clinical setting
The language users represented in this dataset are medical and clinical professionals who work in a research hospital setting. These individuals hold professional degrees corresponding to their respective specialties. Specific demographic characteristics of the language users such as age, gender, or race/ethnicity were not collected.
The annotator group consisted of five people, 33 to 76 years old, including four females and one male. Socioeconomically, they came from the middle and upper-middle income classes. Regarding first language, three annotators had English as their first language, one had Chinese, and one had Spanish. Proficiency in English, the language of the data being annotated, was native for three of the annotators and bilingual for the other two. The annotation team included clinical rehabilitation domain experts with backgrounds in occupational therapy and physical therapy, as well as individuals with public health and data science expertise. Prior to annotation, all annotators were trained on the specific annotation process using established guidelines for the given domain, and annotators were required to achieve a specified proficiency level prior to annotating notes in this corpus.
The notes in the dataset were written as part of clinical care within a U.S. research hospital between May 2008 and November 2019. These notes were written by health professionals asynchronously following the patient encounter to document the interaction and support continuity of care. The intended audience of these notes was clinicians involved in the patients' care. The included notes come from nine disciplines - neuropsychology, occupational therapy, physical medicine (physiatry), physical therapy, psychiatry, recreational therapy, social work, speech language pathology, and vocational rehabilitation. The notes were curated to support research on natural language processing for functioning information between 2018 and 2024.
The final corpus was derived from a set of clinical notes extracted from the hospital electronic medical record (EMR) for the purpose of clinical research. The original data consist of character-based digital content. We work in ASCII 8 or UNICODE encoding, and therefore part of our pre-processing includes running encoding detection and transformation from encodings such as Windows-1252 or ISO-8859 to our preferred format.
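The encoding normalization step described above can be sketched as follows. This is a minimal illustration, not the curators' actual pipeline: the data statement does not name the detection tool, so the function names `decode_note` and `to_utf8`, the choice of UTF-8 as the target format, and the try-in-order fallback strategy are all assumptions. Only Python's standard codecs are used.

```python
from typing import Sequence, Tuple

# Candidate encodings, tried in order. The order is an assumption for this
# sketch: strict UTF-8 first, then Windows-1252, then ISO-8859-1, which
# accepts any byte sequence and therefore acts as a last resort.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "iso-8859-1")


def decode_note(
    raw: bytes, candidates: Sequence[str] = CANDIDATE_ENCODINGS
) -> Tuple[str, str]:
    """Return (text, encoding_used) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")


def to_utf8(raw: bytes) -> bytes:
    """Re-encode a note as UTF-8 regardless of its detected source encoding."""
    text, _ = decode_note(raw)
    return text.encode("utf-8")
```

A try-in-order approach cannot distinguish Windows-1252 from ISO-8859-1 when both decode without error, so a statistical detector may be preferable on real data.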
On the larger corpus, we applied sampling to match our curation rationale. Given the resource constraints of manual annotation, we set out to create a dataset of 500 clinical notes, which would exclude notes over 10,000 characters in length.
To promote density and diversity, we used five note characteristics as sampling criteria. The first was text length, expressed in number of characters. The second was the discipline group, which is derived from note type metadata and describes which discipline a note originated from: occupational and vocational therapy (OT/VOC), physical therapy (PT), recreation therapy (RT), speech and language pathology (SLP), social work (SW), or miscellaneous (MISC, including psychiatry, neurology and physiatry). These disciplines were selected for collecting the larger corpus because their notes are likely to include functioning information. Finally, existing information extraction tools were used to obtain annotation counts in four areas of functioning, which provided three further characteristics: a note’s annotation count, annotation density (annotation count divided by text length), and domain count (number of domains with at least 1 annotation).
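The three annotation-derived characteristics defined above can be computed as in the sketch below. It assumes the per-domain annotation counts are already available from an upstream extraction tool; the names `NoteMetrics` and `note_metrics` and the domain keys in the usage example are hypothetical and do not come from the dataset.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class NoteMetrics:
    annotation_count: int      # total annotations across all domains
    annotation_density: float  # annotation count divided by text length in characters
    domain_count: int          # number of domains with at least 1 annotation


def note_metrics(text: str, annotations_per_domain: Dict[str, int]) -> NoteMetrics:
    """Derive the sampling characteristics for one note.

    `annotations_per_domain` maps a domain name to the number of annotations
    an (assumed) upstream extraction tool produced for that domain.
    """
    count = sum(annotations_per_domain.values())
    # Guard against empty notes so the density is always defined.
    density = count / len(text) if text else 0.0
    domains = sum(1 for n in annotations_per_domain.values() if n >= 1)
    return NoteMetrics(count, density, domains)
```

For example, a 1,000-character note with 12, 3, 0, and 5 annotations in the four domains has an annotation count of 20, an annotation density of 0.02, and a domain count of 3.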
We used stratified sampling across the 6 discipline groups to ensure discipline diversity in the corpus. Because of low availability, 50 notes were sampled from SLP with relaxed criteria, and 90 notes each from the 5 other discipline groups with stricter criteria. Sampled SLP notes were those with the highest annotation density that had an annotation count of at least 5 and a domain count of at least 2. Other notes were sampled by highest annotation count and lowest text length, with a minimum annotation count of 15 and minimum domain count of 3.
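The stratified selection described above might look like the following sketch. The quotas, thresholds, and the 10,000-character cap are taken from the text; the note dictionary layout (`group`, `length`, `count`, `domains`), the function names, and the exact tie-breaking order for non-SLP notes (annotation count descending, then text length ascending) are assumptions, since the text does not specify how the two criteria were combined.

```python
from typing import Dict, List

MAX_CHARS = 10_000  # notes over this length are excluded from the corpus

# Relaxed rule for SLP, stricter rule for the five other discipline groups.
SLP_RULE = {"quota": 50, "min_count": 5, "min_domains": 2}
DEFAULT_RULE = {"quota": 90, "min_count": 15, "min_domains": 3}
GROUPS = ("OT/VOC", "PT", "RT", "SLP", "SW", "MISC")


def sample_group(notes: List[dict], group: str) -> List[dict]:
    """Select the notes for one discipline group."""
    rule = SLP_RULE if group == "SLP" else DEFAULT_RULE
    eligible = [
        n for n in notes
        if n["group"] == group
        and 0 < n["length"] <= MAX_CHARS
        and n["count"] >= rule["min_count"]
        and n["domains"] >= rule["min_domains"]
    ]
    if group == "SLP":
        # Highest annotation density first.
        eligible.sort(key=lambda n: n["count"] / n["length"], reverse=True)
    else:
        # Highest annotation count first, shorter notes breaking ties (assumed).
        eligible.sort(key=lambda n: (-n["count"], n["length"]))
    return eligible[: rule["quota"]]


def sample_corpus(notes: List[dict]) -> Dict[str, List[dict]]:
    """Apply the per-group rule to every discipline group."""
    return {group: sample_group(notes, group) for group in GROUPS}
```

With full availability in every group, this yields 50 + 5 × 90 = 500 notes, matching the target size stated earlier.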
The notes in the resulting sample included certain types of PHI and PII. To prepare for public dissemination, all sensitive or potentially identifying information was manually annotated in the notes and replaced with substituted content to ensure readability and enough context needed for machine learning without exposing any sensitive information. This de-identification effort was manually reviewed to ensure no PII or PHI exposure and to correct any resulting readability issues. Notes about pediatric patients were excluded. No attempt was made to sample multiple notes from the same patient. No metadata is provided to group notes other than by note type, discipline, or discipline group. The dataset is not organized beyond the provided metadata, but publications about models trained on this dataset should include information on the train/test splits used.
All notes were sentence-segmented and tokenized using the spaCy en_core_web_lg model with additional rules for sentence segmentation customized to the dataset. Notes are stored in an XML format readable by the GATE annotation software (https://gate.ac.uk/family/developer.html), which stores annotations separately in annotation sets.
As the clinical notes were extracted directly from the EMR in text format, the capture quality was determined to be high. The clinical notes did not have to be converted from other data formats, which means this dataset is free from noise introduced by conversion processes such as optical character recognition.
Because of the effort required to manually deidentify and annotate notes, this corpus is limited in terms of size and representation. The curation decisions skewed note selection towards specific disciplines and note types to increase the likelihood of encountering information on functioning. Some subtypes of functioning occur infrequently in the data, or not at all. The deidentification of notes was done in a manner to preserve natural language as it would occur in the notes, but some information is lost, e.g. on rare diseases.
Information on the manual annotation process is provided in the annotation guidelines for each of the four domains:
- Communication & Cognition (https://zenodo.org/records/13910167)
- Mobility (https://zenodo.org/records/11074838)
- Self-Care & Domestic Life (SCDL) (https://zenodo.org/records/11210183)
- Interpersonal Interactions & Relationships (IPIR) (https://zenodo.org/records/13774684)
Inter-annotator agreement was established on development datasets described in the annotation guidelines prior to the annotation of this gold standard corpus.
The gold standard corpus consists of 484 documents, which include 35,147 sentences in total. The distribution of annotated information is provided in the table below.
Domain | Number of Annotated Sentences | % of All Sentences | Mean Number of Annotated Sentences per Document
Communication & Cognition | 6033 | 17.2% |
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
This dataset measures food availability and access for 76 low- and middle-income countries. The dataset includes annual country-level data on area, yield, production, nonfood use, trade, and consumption for grains and root and tuber crops (combined as R&T in the documentation tables), food aid, total value of imports and exports, gross domestic product, and population compiled from a variety of sources. This dataset is the basis for the International Food Security Assessment 2015-2025 released in June 2015. This annual ERS report projects food availability and access for 76 low- and middle-income countries over a 10-year period.
Countries (Spatial Description, continued): Democratic Republic of the Congo, Ecuador, Egypt, El Salvador, Eritrea, Ethiopia, Gambia, Georgia, Ghana, Guatemala, Guinea, Guinea-Bissau, Haiti, Honduras, India, Indonesia, Jamaica, Kenya, Kyrgyzstan, Laos, Lesotho, Liberia, Madagascar, Malawi, Mali, Mauritania, Moldova, Mongolia, Morocco, Mozambique, Namibia, Nepal, Nicaragua, Niger, Nigeria, North Korea, Pakistan, Peru, Philippines, Rwanda, Senegal, Sierra Leone, Somalia, Sri Lanka, Sudan, Swaziland, Tajikistan, Tanzania, Togo, Tunisia, Turkmenistan, Uganda, Uzbekistan, Vietnam, Yemen, Zambia, and Zimbabwe.
Resources in this dataset:
Resource Title: CSV File for all years and all countries. File Name: gfa25.csv
Resource Title: International Food Security country data. File Name: GrainDemandProduction.xlsx
Resource Description: Excel files of individual country data. Please note that these files provide the data in a different layout from the CSV file. This version of the data files was updated 9-2-2021.
More up-to-date files may be found at: https://www.ers.usda.gov/data-products/international-food-security.aspx
This dataset was created in order to facilitate transboundary conservation work and research projects, by integrating land cover maps into a single dataset from Cape Caution, BC, to Yakutat Bay, AK. It includes three levels of land classification, site index, elevation, hydric soils (yes/no), karst (yes/no), primary and secondary species, size class, and volume class. It also includes a number of other important attributes from individual datasets, which were not crosswalked between the different areas. This file represents karst formations in the study area.
https://dataful.in/terms-and-conditions
Remittances are transfers of money by a person working in a foreign location to a person or family back home as household income. According to the IMF, remittances are typically transfers from a well-meaning individual or family member to another individual or household. They are targeted to meet specific needs of the recipients, and this tends to reduce poverty. This dataset contains year- and country-wise remittance inflows. It also includes data for low- and middle-income countries.
Note: 1) All numbers are in current (nominal) US dollars. 2) Venezuela is unclassified due to the unavailability of data and is therefore not included in the aggregate sum.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median household incomes for various household sizes in Middle Point, OH, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.
Key observations
Chart: Middle Point, OH median household income, by household size (in 2022 inflation-adjusted dollars): https://i.neilsberg.com/ch/middle-point-oh-median-household-income-by-household-size.jpeg
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Household Sizes:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Middle Point median household income. You can refer to the same here.