https://www.icpsr.umich.edu/web/ICPSR/studies/33321/terms
The University of Washington - Beyond High School (UW-BHS) project surveyed students in Washington State to examine factors impacting educational attainment and the transition to adulthood among high school seniors. The project began in 1999 in an effort to assess the impact of I-200 (the referendum that ended Affirmative Action) on minority enrollment in higher education in Washington. The research objectives of the project were: (1) to describe and explain differences in the transition from high school to college by race and ethnicity, socioeconomic origins, and other characteristics, (2) to evaluate the impact of the Washington State Achievers Program, and (3) to explore the implications of multiple race and ethnic identities. Following a successful pilot survey in the spring of 2000, the project eventually included baseline and one-year follow-up surveys (conducted in 2002, 2003, 2004, and 2005) of almost 10,000 high school seniors in five cohorts across several Washington school districts. The high school senior surveys included questions that explored students' educational aspirations and future career plans, as well as questions on family background, home life, perceptions of school and home environments, self-esteem, and participation in school related and non-school related activities. To supplement the 2000, 2002, and 2003 student surveys, parents of high school seniors were also queried to determine their expectations and aspirations for their child's education, as well as their own educational backgrounds and fields of employment. Parents were also asked to report any financial measures undertaken to prepare for their child's continued education, and whether the household received any form of financial assistance. In 2010, a ten-year follow-up with the 2000 senior cohort was conducted to assess educational, career, and familial outcomes. 
The ten-year follow-up surveys collected information on educational attainment, early employment experiences, family and partnership, civic engagement, and health status. The baseline, parent, and follow-up surveys also collected detailed demographic information, including age, sex, ethnicity, language, religion, education level, employment, income, marital status, and parental status.
This study was conducted under the auspices of the Center for Studies in Demography and Ecology at the University of Washington. It is a nationally representative sample of the population of the United States in 1900, drawn from the manuscript returns of individuals enumerated in the 1900 United States Census. Household variables include region, state and county of household, size of household, and type and ownership of dwelling. Individual variables for each household member include relationship to head of household, race, sex, age, marital status, number of children, and birthplace. Immigration variables include parents' birthplace, year of immigration and number of years in the United States. Occupation variables include occupation, coded by both the 1900 and 1950 systems, and number of months unemployed. Education variables include number of months in school, whether respondents could read or write a language, and whether they spoke English. (Source: downloaded from ICPSR 7/13/10)
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR at https://doi.org/10.3886/ICPSR07825.v1. We highly recommend using the ICPSR version as they may make this dataset available in multiple data formats in the future.
This map displays data from the Selected Economic Indicators (DP03) dataset from the 2010 American Community Survey 5-Yr Estimates, U.S. Census Bureau. Data is shown at the level of Census Tract, County, and Small Area (aggregation of Census Tracts developed by the New Mexico Department of Health). Measuring poverty is a topic of much current discussion. See the following links: A Different Way to Measure Poverty - http://www.sanders.senate.gov/imo/media/image/census.jpg "Few topics in American society have more myths and stereotypes surrounding them than poverty, misconceptions that distort both our politics and our domestic policy making." "They include the notion that poverty affects a relatively small number of Americans, that the poor are impoverished for years at a time, that most of those in poverty live in inner cities, that too much welfare assistance is provided and that poverty is ultimately a result of not working hard enough. Although pervasive, each assumption is flat-out wrong." - Mark Rank, Professor of Social Welfare at Washington University: http://opinionator.blogs.nytimes.com/2013/11/02/poverty-in-america-is-mainstream/
The China County-Level Data on Population (Census) and Agriculture, Keyed To 1:1M GIS Map consists of census, agricultural economic, and boundary data for the administrative regions of China for 1990. The census data includes urban and rural residency, age and sex distribution, educational attainment, illiteracy, marital status, childbirth, mortality, immigration (since 1985), industrial/economic activity, occupation, and ethnicity. The agricultural economic data encompasses rural population, labor force, forestry, livestock and fishery, commodities, equipment, utilities, irrigation, and output value. The boundary data are at a scale of one to one million (1:1M) at the county level. This data set is produced in collaboration with the University of Washington as part of the China in Time and Space (CITAS) project, University of California-Davis China in Time and Space (CITAS) project, and the Center for International Earth Science Information Network (CIESIN).
https://spdx.org/licenses/CC0-1.0.html
The influence of genetic drift on population dynamics during Pleistocene glacial cycles is well understood, but the role of selection in shaping patterns of genomic variation during these events is less explored. We used resequenced whole genomes to investigate how demography and natural selection interact to generate the genomic landscapes of Downy and Hairy Woodpeckers, species co-distributed in previously glaciated North America. First, we explored the spatial and temporal patterns of genomic diversity produced by neutral evolution. Next, we tested (1) whether levels of nucleotide diversity along the genome are correlated with intrinsic genomic properties, such as recombination rate and gene density, and (2) whether different demographic trajectories impacted the efficacy of selection. Our results revealed cycles of bottleneck and expansion and genetic structure associated with glacial refugia. Nucleotide diversity varied widely along the genome, but this variation was highly correlated between the species, suggesting the presence of conserved genomic features. In both taxa, nucleotide diversity was positively correlated with recombination rate and negatively correlated with gene density, suggesting that linked selection played a role in reducing diversity. Despite strong fluctuations in effective population size, the maintenance of relatively large populations during glaciations may have facilitated selection. Under these conditions, we found evidence that the individual demographic trajectory of populations modulated linked selection, with purifying selection being more efficient in removing deleterious alleles in large populations. These results highlight that while genome-wide variation reflects the expected signature of demographic change during climatic perturbations, the interaction of multiple processes produces a predictable and highly heterogeneous genomic landscape. 
Methods: Whole genome sequencing of 140 samples of Downy (Dryobates pubescens) and Hairy (D. villosus) Woodpecker from seven (n per population = 10) North American populations.
To examine the cognitive processes of remembering and imagining and their traces in language, we introduce Hippocorpus, a dataset of 6,854 English diary-like short stories about recalled and imagined events. Using a crowdsourcing framework, we first collect recalled stories and summaries from workers, then provide these summaries to other workers who write imagined stories. Finally, months later, we collect a retold version of the recalled stories from a subset of recalled authors. Our dataset comes paired with author demographics (age, gender, race), their openness to experience, as well as some variables regarding the author's relationship to the event (e.g., how personal the event is, how often they tell its story, etc.).
https://www.icpsr.umich.edu/web/ICPSR/studies/28701/terms
The objective of the Seattle Neighborhoods and Crime Survey (SNCS) was to test multilevel theories of neighborhood social organization and criminal violence. It was funded by the National Science Foundation (SES-0004324), and the National Consortium on Violence Research (SBR-9513040). Using the concept of differential neighborhood organization, the investigators posited that neighborhood crime is a function of informal social control against crime and informal organization in favor of crime. Informal neighborhood control against crime consists of neighborhood attachment, social capital, and collective efficacy. The study tested the hypothesis that individual social ties are explained by a rational choice model, which in turn produces neighborhood social capital that can be used to achieve collective goals. It also tested the hypothesis that neighborhoods rich in social capital had greater collective efficacy, which in turn, helped produce safe neighborhoods. Organization in favor of crime consists of violent codes of the street. The study tested the hypothesis that residents from disadvantaged neighborhoods tend to distrust police and other agents of conventional institutions, and consequently are more likely to participate in street culture, in which violence is a way of obtaining street credibility and status, as well as resolving disputes. The project has also examined dimensions of neighboring, and the causes and consequences of fear of crime. The study used a telephone survey of households within all 123 census tracts in the city of Seattle, WA, conducted in 2002-2003. 
The sampling frame was designed by investigators at the University of Washington, with three objectives in mind: (a) to gain a random sample of households within each of 123 census tracts; (b) to obtain a disproportionate number of racial and ethnic minorities using an ethnic oversample; and (c) to obtain a replication sample of Terrance Miethe's 1990 victimization survey in 100 Seattle neighborhoods [Testing Theories of Criminality and Victimization in Seattle, 1960-1990]. Specific samples were drawn by Genesys, a sampling firm in Philadelphia, PA, using a constantly-updated compilation of white pages. Telephone interviews were conducted by the Social and Behavioral Research Institute at California State University, San Marcos, using computer-assisted telephone interviewing (CATI) technology. Respondents were asked about household demographics, such as race, gender, residential mobility, age distribution of the household, and income, their perceptions and assessments of their neighborhoods (including safety, disorder, and crime), neighbors, and relations with police. A variety of questions about neighboring were asked, including social capital (intergenerational closure, reciprocated exchange, and participation in neighborhood associations), attachment to their neighborhood, and collective efficacy (child-centered social control). Respondents were asked about routine activities including taking steps to protect their homes, spending time in bars and nightclubs, and leaving their home unattended. Questions about fear of crime included personal fear as well as altruistic fear for other members of the household, and questions about racial attitudes included residential preferences by race composition of the neighborhood. A victimization inventory modeled after the National Crime Victimization Survey was used for burglary, vandalism, stolen property, violence, and robbery. 
Demographic information includes age, race, sex, education, marital status, household income, whether respondent was a student, employment status, religious affiliation, approximate value of home, monthly rent including utilities, residence history in the last five years, whether respondent was born in the United States, and number of people currently living in the respondent's household.
https://spdx.org/licenses/CC0-1.0.html
Objective: To perform a longitudinal analysis of clinical features associated with Neurofibromatosis Type 1 (NF1) based on demographic and clinical characteristics, and to apply a machine learning strategy to determine the feasibility of developing exploratory predictive models of optic pathway glioma (OPG) and attention-deficit/hyperactivity disorder (ADHD) in a pediatric NF1 cohort.
Methods: Using NF1 as a model system, we perform retrospective data analyses utilizing a manually-curated NF1 clinical registry and electronic health record (EHR) information, and develop machine-learning models. Data for 798 individuals were available, with 578 comprising the pediatric cohort used for analysis.
Results: Males and females were evenly represented in the cohort. White children were more likely to develop OPG (OR: 2.11, 95%CI: 1.11-4.00, p=0.02) relative to their non-white peers. Median age at diagnosis of OPG was 6.5 years (1.7-17.0), irrespective of sex. Males were more likely than females to have a diagnosis of ADHD (OR: 1.90, 95%CI: 1.33-2.70, p<0.001), and earlier diagnosis in males relative to females was observed. The gradient boosting classification model predicted diagnosis of ADHD with an AUROC of 0.74, and predicted diagnosis of OPG with an AUROC of 0.82.
Conclusions: Using readily available clinical and EHR data, we successfully recapitulated several important and clinically-relevant patterns in NF1 semiology specifically based on demographic and clinical characteristics. Naïve machine learning techniques can be potentially used to develop and validate predictive phenotype complexes applicable to risk stratification and disease management in NF1.
Methods
Patients and Data Description
This study was performed using retrospective clinical data extracted from two sources within the Washington University Neurofibromatosis (NF) Center. First, data were extracted from an existing longitudinal clinical registry that was manually curated using clinical data obtained from patients followed in the Washington University NF Clinical Program at St. Louis Children’s Hospital. All individuals included in this database had a clinical diagnosis of NF1 based on current National Institutes of Health Consensus Development Conference diagnostic criteria,9 and had been assessed over multiple visits from 2002 to 2016 for the presence of clinical features associated with NF1. Data points in this registry included demographic information, such as age, race, and sex, in addition to NF1-related clinical features and associated conditions, such as café-au-lait macules, skinfold freckling, cutaneous neurofibromas, Lisch nodules, OPG, hypertension, ADHD, and cognitive impairment. These data were maintained in a semi-structured format containing textual and binary fields, capturing each individual’s data over multiple clinical visits. From these data, clinical features and phenotypes were extracted using data manipulation, imputation, and text mining techniques. Data obtained from this NF1 clinical registry were converted to data tables, which captured each patient visit and the presence/absence of specific clinical features at each visit. Clinical features which were once marked as present were assumed to be present for all future visits, and missing data were assumed absent for that specific visit. Categorical variables are reported as frequencies and proportions, and compared using odds ratios (ORs). Continuously distributed traits, adhering to both conventional normality assumptions and homogeneity of variances, are reported as mean and standard deviations, and compared using analysis of variance methods. 
Non-parametric equivalents were used for data with non-normal distributions.
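As a rough illustration of the odds-ratio comparison described above, the sketch below computes an OR and a 95% confidence interval from a 2x2 table. The source does not state which interval method was used; the Wald (Woolf logit) interval here is an assumption, and the counts are hypothetical, not taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald (Woolf logit) confidence interval.

    a: group 1 with the outcome      b: group 1 without the outcome
    c: group 2 with the outcome      d: group 2 without the outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the logit approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not data from the registry).
or_value, ci_low, ci_high = odds_ratio_ci(30, 70, 15, 85)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1 corresponds to a difference in odds between the two groups at roughly the 5% level under this approximation.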
Clinical Feature Extraction from Clinical Registry and EHR
The NF1 Clinical Registry comprised string-based clinical feature values, such as ADHD, OPG, and asthma. From these data, we extracted 27 unique clinical features in addition to longitudinal data on the development of NF1-related clinical features and associated diagnoses. For each clinical feature, age at initial presentation and/or diagnosis was computed, and median age of occurrence was calculated for each sex. The exact age of presentation and/or diagnosis could not be definitively ascertained for any feature that was present at a child’s initial clinic visit. As such, we computed the age of diagnosis only for those clinical features for which we have at least one visit documenting feature absence prior to the manifestation of that feature.
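The carry-forward rule and the age-at-diagnosis restriction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the function names, and the patient data are hypothetical, and it assumes that a missing value at a visit counts as absence for the purpose of "documented absence before onset".

```python
from statistics import median


def carry_forward(visits):
    """Apply the registry rule to one feature for one patient.

    visits: list of (age_years, present) sorted by age, where present is
    True, False, or None (missing). Missing is treated as absent at that
    visit; once a feature is present it stays present at later visits.
    """
    seen = False
    filled = []
    for age, present in visits:
        seen = seen or bool(present)
        filled.append((age, seen))
    return filled


def age_at_diagnosis(visits):
    """Age at first presence, or None if onset cannot be dated.

    Onset is dated only when at least one earlier visit shows the feature
    absent, so a feature present at the first visit yields None.
    """
    filled = carry_forward(visits)
    if not filled or filled[0][1]:
        return None
    for age, present in filled:
        if present:
            return age
    return None


# Hypothetical patients: id -> (sex, visits for a single feature).
patients = {
    "p1": ("F", [(2.0, None), (4.5, True), (6.0, None)]),
    "p2": ("F", [(3.0, True), (5.0, True)]),  # present at first visit
    "p3": ("M", [(1.5, False), (7.0, True)]),
    "p4": ("M", [(2.0, False), (3.0, False), (5.0, True)]),
}

ages_by_sex = {}
for sex, visits in patients.values():
    age = age_at_diagnosis(visits)
    if age is not None:
        ages_by_sex.setdefault(sex, []).append(age)

median_by_sex = {sex: median(ages) for sex, ages in ages_by_sex.items()}
print(median_by_sex)
```

Patient "p2" is excluded from the median because the feature was already present at the first recorded visit, which mirrors the restriction stated in the text.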
Diagnosis codes from the EHR-derived data set were also extracted. Diagnosis codes were recorded as 15,890 unique ICD 9/10 codes. Given the large number of ICD 9/10 codes, a consistent, concept-level “roll up” of relevant codes to a single phenotype description was created by mapping the extracted ICD 9/10 values to phenome-wide association (PheWAS) codes called Phecodes, which have been demonstrated to better align with clinical disease compared to individual ICD codes.
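The roll-up step can be pictured as a dictionary lookup from each ICD code to a phenotype label, followed by de-duplication per patient. The sketch below assumes a small hand-written mapping whose target labels (`phecode_ADHD`, `phecode_NF1`) are placeholders; it is not the official Phecode map, and the patient records are hypothetical.

```python
# Placeholder mapping for illustration only; a real analysis would load the
# published ICD-to-Phecode map rather than hard-code entries like these.
ICD_TO_PHECODE = {
    "314.01": "phecode_ADHD",  # ICD-9 code
    "F90.0": "phecode_ADHD",   # ICD-10 code
    "F90.2": "phecode_ADHD",   # ICD-10 code
    "237.71": "phecode_NF1",   # ICD-9 code
    "Q85.01": "phecode_NF1",   # ICD-10 code
}


def roll_up(patient_codes, mapping):
    """Map each patient's ICD codes to a set of phenotype labels.

    Returns (features, unmapped): features maps patient id to the set of
    labels found, and unmapped collects codes with no entry in the mapping
    so they can be reviewed instead of being dropped silently.
    """
    features = {}
    unmapped = set()
    for patient_id, codes in patient_codes.items():
        labels = set()
        for code in codes:
            label = mapping.get(code)
            if label is None:
                unmapped.add(code)
            else:
                labels.add(label)
        features[patient_id] = labels
    return features, unmapped


# Hypothetical EHR extracts: patient id -> recorded ICD-9/10 codes.
ehr_codes = {
    "p1": ["237.71", "314.01", "F90.2"],
    "p2": ["Q85.01", "Z00.129"],
}

features, unmapped = roll_up(ehr_codes, ICD_TO_PHECODE)
print(features)
print(unmapped)
```

Several ICD codes collapse to one label here, which is the reduction in feature count that motivates the roll-up.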
Machine Learning Analyses
Using a combination of clinical features obtained from the NF1 Clinical Registry and EHR-derived data sets, we developed prediction models using a gradient boosting platform for identifying patients with specific NF1-related diagnoses to establish the usefulness of clinical history and documentation of clinical findings in predicting phenotypic variability of NF1. Initial analyses used gradient boosting, a state-of-the-art classification algorithm that produces a predictive model from an ensemble of weak tree-based predictive models. The gradient boosting model was selected because it supports identifying the importance of the features used in the final prediction model. Subsequent analyses trained a separate model on each of three feature sets: (1) demographic features for all patients, including race, sex, and family history of NF1 [5 features]; (2) clinical features associated with NF1 [27 features] extracted from the NF1 Clinical Registry; and (3) diagnosis codes extracted from the EHR data, which were reduced to 50 Phecodes. Four-fold cross validation was then applied to the three models, and their prediction accuracies were compared. Positive predictive value (PPV), F1 score, and the area under the receiver operating characteristic (AUROC) curve were used as evaluation metrics. scikit-learn, a machine learning library in Python, was used to implement all analyses.
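To make the three evaluation metrics concrete, the sketch below computes PPV, F1, and AUROC by hand for a toy set of labels and scores. The labels, scores, and the 0.5 decision threshold are hypothetical, and the functions are written in plain Python for clarity; with scikit-learn, `precision_score`, `f1_score`, and `roc_auc_score` serve the same purpose.

```python
def ppv_and_f1(y_true, y_pred):
    """Positive predictive value (precision) and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * recall / (ppv + recall) if ppv + recall else 0.0
    return ppv, f1


def auroc(y_true, scores):
    """AUROC as the probability that a random positive case outranks a
    random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities for one held-out fold.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

ppv, f1 = ppv_and_f1(y_true, y_pred)
auc = auroc(y_true, scores)
print(f"PPV = {ppv:.3f}, F1 = {f1:.3f}, AUROC = {auc:.3f}")
```

PPV and F1 depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason to report them together.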
Standard Protocol Approvals, Registrations, and Patient Consents
The NF1 Clinical Registry is an existing longitudinal clinical registry that was manually curated using clinical data obtained from patients followed in the Washington University NF Clinical Program at St. Louis Children’s Hospital. All individuals included in this database have a clinical diagnosis of NF1 based on current National Institutes of Health criteria and have provided informed consent for participation in the clinical registry. All data collection, usage and analysis for this study were approved by the Institutional Review Board (IRB) at the Washington University School of Medicine.