Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A shift in scientific publishing from paper-based to knowledge-based practices promotes reproducibility, machine actionability and knowledge discovery. This is important for disciplines like social science, as study indicators are often social constructs such as race or education; hypothesis tests are challenging to compare in demographic research due to their limited temporal and spatial coverage; and natural language in research papers is often imprecise and ambiguous. Therefore, we present the MIRA-KG, consisting of: (1) an ontology for capturing social demography research, which links hypotheses and findings to evidence, (2) annotations of papers on health inequality in terms of the ontology, gathered by (i) prompting a Large Language Model to annotate paper abstracts using the ontology, (ii) mapping concepts to terms from NCBO BioPortal ontologies and GeoNames, and (iii) refining the final graph by a set of SHACL constraints, developed according to data quality criteria. The utility of the resource lies in its use for formally representing social demography research hypotheses, discovering research biases, discovery of knowledge, and the derivation of novel questions.
This dataset was generated using the code available on GitHub at https://w3id.org/mira/ at version v1.0. It uses the following ontology: https://w3id.org/mira/ontology/. A dump of the requirement stories and other resources used to generate the resource can be found on the drive: https://drive.google.com/drive/folders/1QKAOVV0TXfF4vYQ7b5dkHkXQjBqnh75W?usp=sharing.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project includes the data used for the article ‘Demography: Fast and Slow’, accepted for publication in Population and Development Review. See the paper as well as the Appendix: Supplemental Materials for details. Here is a brief description of the files containing the estimates of PTR and MST used for the paper. The data (in .csv format) can be downloaded from figshare (doi:10.6084/m9.figshare.16751869).
· The file Data_labels.csv contains the country labels used in Figures 2 and 3 in the main paper.
· The file Data_paper.csv contains birth, death, immigration and emigration rates, and the derived estimates of country-level PTR and MST used in the paper (five-year intervals between 1990-95 and 2015-20), using Abel-Cohen estimates based on the “Demographic Account Pseudo Bayesian Closed” method.
· The file Data_robust.csv is equivalent to Data_paper.csv but it is based on Abel-Cohen “Demographic Account Minimisation Closed” estimates of migratory flows.
· The files Data_Italy.csv and Data_Germany.csv contain respectively the data on birth, death, immigration and emigration rates, and the estimates of annual PTR and MST for Italy and Germany.
Annual Resident Population Estimates by Age Group, Sex, Race, and Hispanic Origin: April 1, 2010 to July 1, 2018 // Source: U.S. Census Bureau, Population Division // The contents of this file are released on a rolling basis from December through June. // Note: 'In combination' means in combination with one or more other races. The sum of the five race-in-combination groups adds to more than the total population because individuals may report more than one race. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. For more information, see https://www2.census.gov/programs-surveys/popest/technical-documentation/methodology/modified-race-summary-file-method/mrsf2010.pdf. // The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. // For detailed information about the methods used to create the population estimates, see https://www.census.gov/programs-surveys/popest/technical-documentation/methodology.html. // Each year, the Census Bureau's Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2017) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. 
As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: https://www.census.gov/programs-surveys/popest.html.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a repository of global and regional human population data collected from: the databases of scenarios assessed by the Intergovernmental Panel on Climate Change (Sixth Assessment Report, Special Report on 1.5 C; Fifth Assessment Report), multi-national databases of population projections (World Bank, International Database, United Nation population projections), and other very long-term population projections (Resources for the Future).
More specifically, it contains:
- in the other_pop_data folder, files from World Bank, the International Database from the US Census, and from IHME
- in the SSP folder, the Shared Socioeconomic Pathways, as in version 2.0 downloaded from IIASA and as in version 3.0 downloaded from the IIASA workspace
- in the UN folder, the demographic projections from UN
- IAMstat.xlsx, an overview file of the metadata accompanying the scenarios present in the IPCC databases
- RFF.csv, an overview file containing the population projections obtained by Resources For the Future
- the remaining .csv files, with names AR6#, AR5#, IAMC15#, contain the IPCC scenarios assessed by the IPCC for preparing the IPCC assessment reports. They can be downloaded from AR5, SR 1.5, and AR6
This data is intended to be downloaded for use together with the package downloadable here.
The dataset was used as supporting material for the paper "Underestimating demographic uncertainties in the synthesis process of the IPCC", accepted in npj Climate Action (DOI: 10.1038/s44168-024-00152-y).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets, conda environments and software for the course "Population Genomics" of Prof Kasper Munch. This course material is maintained by the health data science sandbox. This webpage shows the latest version of the course material.
The data is connected to the following repository: https://github.com/hds-sandbox/Popgen_course_aarhus. The original course material from Prof Kasper Munch is at https://github.com/kaspermunch/PopulationGenomicsCourse.
Description
After the course, the participants will have detailed knowledge of the methods and applications required to perform a typical population genomic study.
The participants must at the end of the course be able to:
The course introduces key concepts in population genomics, from generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background. Here topics include analysis of demography, population structure, recombination and selection. The last part of the course focuses on applications of population genetic data sets for association studies in relation to human health.
Curriculum
The curriculum for each week is listed below. "Coop" refers to a set of lecture notes by Graham Coop that we will use throughout the course.
Course plan
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an Excel spreadsheet with all the data, calculations and necessary comments for my paper published in Demographic Review in 2017.
According to a study conducted in 2024 using the most recently available data, the average poverty rate in news deserts in the United States (counties without access to or with very limited access to local news) was around five percent higher than the country average, at ** percent. Citizens living in counties without newspapers were also earning a lower median annual income than the general population average, with the figure estimated at less than ****** U.S. dollars compared to more than **** thousand U.S. dollars for the U.S. as a whole.
Pursuant to Local Laws 126, 127, and 128 of 2016, certain demographic data is collected voluntarily and anonymously from persons voluntarily seeking social services. This data can be used by agencies and the public to better understand the demographic makeup of client populations and to better understand and serve residents of all backgrounds and identities. The data presented here has been collected through either electronic form or paper surveys offered at the point of application for services. These surveys are anonymous. Each record represents an anonymized demographic profile of an individual applicant for social services, disaggregated by response option, agency, and program. Response options include information regarding ancestry, race, primary and secondary languages, English proficiency, gender identity, and sexual orientation.
Idiosyncrasies or Limitations: Note that while the dataset contains the total number of individuals who have identified their ancestry or languages spoken, because such data is collected anonymously, there may be instances of a single individual completing multiple voluntary surveys. Additionally, the survey being both voluntary and anonymous has advantages as well as disadvantages: it increases the likelihood of full and honest answers, but since it is not connected to the individual case, it does not directly inform delivery of services to the applicant. The paper and online versions of the survey ask the same questions but free-form text is handled differently. Free-form text fields are expected to be entered in English although the form is available in several languages. Surveys are presented in 11 languages.
Paper Surveys
1. Are optional
2. Survey taker is expected to specify agency that provides service
3. Survey taker can skip or elect not to answer questions
4. Invalid/unreadable data may be entered for survey date or date may be skipped
5. OCRing of free-form text fields may fail
6. Analytical value of free-form text answers is unclear
Online Survey
1. Are optional
2. Agency is defaulted based on the URL
3. Some questions must be answered
4. Date of survey is automated
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The files included contain the development data used in the modeling of the demographic transition. The WDI indicators dataset is publicly available at the World Bank data catalog. We use the 2010 dataset in our analysis. The Barro-Lee dataset provides information on educational attainment. We use average years of schooling for the female population as our education indicator in the paper.
When and where do states coercively alter their internal demography? This paper builds a theory that predicts under what conditions states alter the demographic “facts on the ground” by resettling and expelling ethno-national populations. We predict that, under particular scope conditions, states will employ demographic engineering to shore up control over (i) non-natural frontiers, and (ii) areas populated by ethnic minorities who are coethnics with elites in a hostile power. We then substantiate our predictions using new subnational data from both China and the USSR. Causally identifying the spatially differential effect of international conflict on demographic engineering via a difference-in-differences design, we find that the Sino-Soviet split (1959-1982) led to a disproportionate increase in the expulsion of ethnic Russians and resettlement of ethnic Han in Chinese border areas lacking a natural border with the USSR, and that resettlement was targeted at areas populated by ethnic Russians. On the Soviet side, we similarly find that the Sino-Soviet split led to a significant increase in expulsion of Chinese and the resettlement of Russians in border areas, and that resettlement was targeted at areas populated by more Chinese. This paper thereby develops the nascent field of political demography by advancing our theoretical and empirical understanding of when, where and to whom states would seek to effect demographic change. Moreover, by demonstrating that both ethnic group concentration and dispersion across borders are endogenous to international conflict, our results complicate a large and influential literature linking ethnic demography to conflict.
Annual Resident Population Estimates by Age Group, Sex, Race, and Hispanic Origin: April 1, 2010 to July 1, 2016 // Source: U.S. Census Bureau, Population Division // The contents of this file are released on a rolling basis from December through June. // Note: 'In combination' means in combination with one or more other races. The sum of the five race-in-combination groups adds to more than the total population because individuals may report more than one race. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. For more information, see https://www2.census.gov/programs-surveys/popest/technical-documentation/methodology/modified-race-summary-file-method/mrsf2010.pdf. // The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. // For detailed information about the methods used to create the population estimates, see https://www.census.gov/programs-surveys/popest/technical-documentation/methodology.html. // Each year, the Census Bureau's Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2015) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. 
As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: https://www.census.gov/programs-surveys/popest.html.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For years, we have relied on population surveys to keep track of regional public health statistics, including the prevalence of non-communicable diseases. Because of the cost and limitations of such surveys, we often do not have the up-to-date data on health outcomes of a region. In this paper, we examined the feasibility of inferring regional health outcomes from socio-demographic data that are widely available and timely updated through national censuses and community surveys. Using data for 50 American states (excluding Washington DC) from 2007 to 2012, we constructed a machine-learning model to predict the prevalence of six non-communicable disease (NCD) outcomes (four NCDs and two major clinical risk factors), based on population socio-demographic characteristics from the American Community Survey. We found that regional prevalence estimates for non-communicable diseases can be reasonably predicted. The predictions were highly correlated with the observed data, in both the states included in the derivation model (median correlation 0.88) and those excluded from the development for use as a completely separated validation sample (median correlation 0.85), demonstrating that the model had sufficient external validity to make good predictions, based on demographics alone, for areas not included in the model development. This highlights both the utility of this sophisticated approach to model development, and the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease.
Males segment of a prairie vole population in Illinois
The data collection was led by Lowell L. Getz and consisted of monthly captures of prairie voles over the period 1972-1997. For more details please refer to http://www.life.illinois.edu/getz/index.html. The data contained in data.zip is the part of the overall dataset that was used in the associated paper. Please refer to the ReadMe for an explanation of all columns and files.
https://spdx.org/licenses/CC0-1.0.html
We used this dataset to assess the strength of isolation due to geographic and macroclimatic distance across island and mainland systems, comparing published measurements of phenotypic traits and neutral genetic diversity for populations of plants and animals worldwide. The dataset includes 112 studies of 108 species (72 animals and 36 plants) in 868 island populations and 760 mainland populations, with population-level taxonomic and biogeographic information, totalling 7438 records.
Methods
Description of methods used for collection/generation of data: We searched the ISI Web of Science in March 2017 for comparative studies that included data on phenotypic traits and/or neutral genetic diversity of populations on true islands and on mainland sites in any taxonomic group. Search terms were 'island' and ('mainland' or 'continental') and 'population*' and ('demograph*' or 'fitness' or 'survival' or 'growth' or 'reproduc*' or 'density' or 'abundance' or 'size' or 'genetic diversity' or 'genetic structure' or 'population genetics') and ('plant*' or 'tree*' or 'shrub*' or 'animal*' or 'bird*' or 'amphibian*' or 'mammal*' or 'reptile*' or 'lizard*' or 'snake*' or 'fish'), subsequently refined to the Web of Science categories 'Ecology' or 'Evolutionary Biology' or 'Zoology' or 'Genetics Heredity' or 'Biodiversity Conservation' or 'Marine Freshwater Biology' or 'Plant Sciences' or 'Geography Physical' or 'Ornithology' or 'Biochemistry Molecular Biology' or 'Multidisciplinary Sciences' or 'Environmental Sciences' or 'Fisheries' or 'Oceanography' or 'Biology' or 'Forestry' or 'Reproductive Biology' or 'Behavioral Sciences'. The search included the whole text including abstract and title, but only abstracts and titles were searchable for older papers depending on the journal. The search returned 1237 papers which were distributed among coauthors for further scrutiny.
First paper filter To be useful, the papers must have met the following criteria: Overall study design criteria: Include at least two separate islands and two mainland populations; Eliminate studies comparing populations on several islands where there were no clear mainland vs. island comparisons; Present primary research data (e.g., meta-analyses were discarded); Include a field study (e.g., experimental studies and ex situ populations were discarded); Can include data from sub-populations pooled within an island or within a mainland population (but not between islands or between mainland sites); Island criteria: Island populations situated on separate islands (papers where all information on island populations originated from a single island were discarded); Can include multiple populations recorded on the same island, if there is more than one island in the study; While we accepted the authors' judgement about island vs. mainland status, in 19 papers we made our own judgement based on the relative size of the island or position relative to the mainland (e.g. 
Honshu Island of Japan, sized 227 960 km² was interpreted as mainland relative to islands less than 91 km²); Include islands surrounded by sea water but not islands in a lake or big river; Include islands regardless of origin (continental shelf, volcanic); Taxonomic criteria: Include any taxonomic group; The paper must compare populations within a single species; Do not include marine species (including coastline organisms); Databases used to check species delimitation: Handbook of Birds of the World (www.hbw.com/); International Plant Names Index (https://www.ipni.org/); Plants of the World Online(https://powo.science.kew.org/); Handbook of the Mammals of the World; Global Biodiversity Information Facility (https://www.gbif.org/); Biogeographic criteria: Include all continents, as well as studies on multiple continents; Do not include papers regarding migratory species; Only include old / historical invasions to islands (>50 yrs); do not include recent invasions; Response criteria: Do not include studies which report community-level responses such as species richness; Include genetic diversity measures and/or individual and population-level phenotypic trait responses; The first paper filter resulted in 235 papers which were randomly reassigned for a second round of filtering. Second paper filter In the second filter, we excluded papers that did not provide population geographic coordinates and population-level quantitative data, unless data were provided upon contacting the authors or could be obtained from figures using DataThief (Tummers 2006). We visually inspected maps plotted for each study separately and we made minor adjustments to the GPS coordinates when the coordinates placed the focal population off the island or mainland. 
For this study, we included only responses measured at the individual level, therefore we removed papers referring to demographic performance and traits such as immunity, behaviour and diet that are heavily reliant on ecosystem context. We extracted data on population-level mean for two broad categories of response: i) broad phenotypic measures, which included traits (size, weight and morphology of entire body or body parts), metabolism products, physiology, vital rates (growth, survival, reproduction) and mean age of sampled mature individuals; and ii) genetic diversity, which included heterozygosity, allelic richness, number of alleles per locus etc. The final dataset includes 112 studies and 108 species. Methods for processing the data: We made minor adjustments to the GPS location of some populations upon visual inspection on Google Maps of the correct overlay of the data point with the indicated island body or mainland. For each population we extracted four climate variables reflecting mean and variation in temperature and precipitation available in CliMond V1.2 (Kriticos et al. 2012) at 10 minutes resolution: mean annual temperature (Bio1), annual precipitation (Bio12), temperature seasonality (CV) (Bio4) and precipitation seasonality (CV) (Bio15) using the "prcomp function" in the stats package in R. For populations where climate variables were not available on the global climate maps mostly due to small island size not captured in CliMond, we extracted data from the geographically closest grid cell with available climate values, which was available within 3.5 km away from the focal grid cell for all localities. We normalised the four climate variables using the "normalizer" package in R (Vilela 2020), and we performed a Principal Component Analysis (PCA) using the psych package in R (Revelle 2018). We saved the loadings of the axes for further analyses. References:
Kriticos, D.J., Webber, B.L., Leriche, A., Ota, N., Macadam, I., Bathols, J., et al. (2012). CliMond: global high-resolution historical and future scenario climate surfaces for bioclimatic modelling. Methods Ecol. Evol., 3, 53-64.
Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston, Illinois, USA. R package version 1.8.12. https://CRAN.R-project.org/package=psych
Tummers, B. (2006). DataThief III. https://datathief.org/
Vilela, B. (2020). normalizer: Making data normal again. R package version 0.1.0.
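The normalise-then-PCA step described in the methods above can be sketched in Python. This is only an illustration under stated assumptions: the original analysis used the R packages "normalizer" and "psych", whereas this sketch swaps in plain z-score standardisation and an SVD-based PCA with NumPy, so the numbers it produces are not expected to match the published loadings. The function names and the climate values are hypothetical, invented for the example, and are not taken from the dataset.

```python
import numpy as np


def standardize(x):
    """Column-wise z-scores (mean 0, sample standard deviation 1).

    Assumption: a simple stand-in for the 'normalizer' R package.
    """
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)


def pca(x):
    """PCA via SVD of the standardized data matrix.

    Returns (scores, components, explained_variance_ratio), where the
    columns of `components` are the unit-length principal axes.
    """
    z = standardize(x)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s                      # coordinates of each population on each axis
    components = vt.T                   # one principal axis per column
    explained = s**2 / np.sum(s**2)     # share of variance per axis
    return scores, components, explained


# Made-up values for five hypothetical populations. Columns follow the four
# variables named in the text: Bio1, Bio12, Bio4, Bio15.
climate = np.array([
    [24.1, 1800.0, 0.9, 45.0],
    [12.3, 650.0, 5.2, 20.0],
    [18.7, 1100.0, 3.1, 60.0],
    [8.9, 900.0, 7.4, 15.0],
    [26.5, 2400.0, 0.6, 70.0],
])

scores, components, explained = pca(climate)
print(np.round(explained, 3))
```

The `scores` array gives one row per population, so its first columns could serve as a compact macroclimatic position for each site when computing climatic distances.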
Annual Resident Population Estimates, Estimated Components of Resident Population Change, and Rates of the Components of Resident Population Change for States and Counties // Source: U.S. Census Bureau, Population Division // Note: Total population change includes a residual. This residual represents the change in population that cannot be attributed to any specific demographic component. See Population Estimates Terms and Definitions at http://www.census.gov/popest/about/terms.html. // Net international migration in the United States includes the international migration of both native and foreign-born populations. Specifically, it includes: (a) the net international migration of the foreign born, (b) the net migration between the United States and Puerto Rico, (c) the net migration of natives to and from the United States, and (d) the net movement of the Armed Forces population between the United States and overseas. // The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. See Geographic Terms and Definitions at http://www.census.gov/popest/about/geo/terms.html for a list of the states that are included in each region and division. // For detailed information about the methods used to create the population estimates, see http://www.census.gov/popest/methodology/index.html. // Each year, the Census Bureau's Population Estimates Program (PEP) utilizes current data on births, deaths, and migration to calculate population change since the most recent decennial census, and produces a time series of estimates of population. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year. The vintage year (e.g., V2014) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. 
With each new issue of estimates, the Census Bureau revises estimates for years back to the last census. As each vintage of estimates includes all years since the most recent decennial census, the latest vintage of data available supersedes all previously produced estimates for those dates. The Population Estimates Program provides additional information including historical and intercensal estimates, evaluation estimates, demographic analysis, and research papers on its website: http://www.census.gov/popest/index.html.
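The relationship between total population change, its components, and the residual mentioned in the note can be illustrated with a short Python sketch. It assumes the usual demographic balancing identity (change = births - deaths + net migration + residual); the function names and all figures are hypothetical and are not taken from the Census Bureau files.

```python
def natural_increase(births, deaths):
    """Births minus deaths over the period."""
    return births - deaths


def residual_change(pop_start, pop_end, births, deaths, net_migration):
    """Part of the total change not attributed to any component.

    Assumes: total change = natural increase + net migration + residual.
    """
    total_change = pop_end - pop_start
    explained = natural_increase(births, deaths) + net_migration
    return total_change - explained


# Made-up county figures for one estimate year.
example = residual_change(
    pop_start=100_000,
    pop_end=101_250,
    births=1_500,
    deaths=900,
    net_migration=600,
)
print(example)  # 1250 total change, 1200 explained by components -> 50
```

A residual of zero would mean the published components fully account for the change between the two population estimates.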
https://spdx.org/licenses/CC0-1.0.html
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data belong to a paper that empirically examines the correlation between population growth and real interest rates. Although this correlation is well founded in macroeconomic theory, the corresponding empirical results have been rather tenuous. Demographic interest rate theories are typically based on long-term relationships across generations. Accordingly, key population trends often appear only across decades, if not centuries, worth of data. To capture these trends, a distinction is made between population growth resulting from a birth surplus and population growth resulting from net migration. Within a panel covering 12 countries and the years since 1820, the paper finds robust evidence that the birth surplus is significantly correlated with the real interest rate.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 3 rows and is filtered where the book is Demographic trends in Scotland : context information paper. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
Habitat fragmentation remains a major focus of research by ecologists decades after being put forward as a threat to the integrity of ecosystems. While studies have documented myriad biotic changes in fragmented landscapes, including the local extinction of species from fragments, the demographic mechanisms underlying these extinctions are rarely known. However, many of them – especially in lowland tropical forests – are thought to be driven by one of two mechanisms: (1) reduced recruitment in fragments resulting from changes in the diversity or abundance of pollinators and seed dispersers or (2) increased rates of individual mortality in fragments due to dramatically altered abiotic conditions, especially near fragment edges. Unfortunately, there have been few tests of these potential mechanisms due to the paucity of long-term and comprehensive demographic data collected in both forest fragments and continuous forest sites. Here we report 11 years (1998-2009) of demographic data from p..., The plants in each plot were censused annually, at which time we recorded, identified, marked, and measured new seedlings, identified any previously marked plants that died, and recorded the size of surviving individuals. Each plot was also surveyed 4-5 times during the flowering season to identify reproductive plants and record the number of inflorescences each produced., The files are in .csv files and no special programs or software are required to open them. , # HDP_survey.csv and HDP_plots.csv
The complete metadata for these data sets, including detailed descriptions of why and how the data were collected and validated, are in the following Data Paper:
Bruna, E.M., M. Uriarte, M. Rosa Darrigo, P. Rubim, C.F. Jurinitz, E.R. Scott, O. Ferreira da Silva, & W. John Kress. 2023. Demography of the understory herb Heliconia acuminata (Heliconiaceae) in an experimentally fragmented tropical landscape. Ecology.
This file comprises 11 years (1998-2009) of demographic data from populations of the Amazonian understory herb Heliconia acuminata (LC Rich.) found at Brazil's Biological Dynamics of Forest Fragments Project (BDFFP). The dataset comprises >66,000 plant x year records of 8586 plants, including 3464 seedlings established after the first census. Seven populations were in experimentally isolated fragments (one in each of four 1-ha fragments and one in each of three 10-ha fragments), with the re...
https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf
The free-ranging elephants in Amboseli have been monitored continuously since 1972, and records of over 3000 individually known elephants are maintained in the Amboseli Elephant Research Project’s (AERP) database (Moss 2001). Births and mortalities of elephants between 1972 and 1975 are known with a precision of 3-6 months, and from 1976 with a precision of 2 weeks to 3 months (Lee et al. 2013). We used data collected until 2012. We categorized elephants in age classes as young calves (0-12 months), older calves (13-24 months), immatures (2-8 years), young adults (9-24 years), prime reproductive adults (25-49) and old adults (50+).
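The age classes above can be expressed as a small lookup function. The following Python sketch is hypothetical and not part of the AERP database or its tooling. It assumes that age is supplied in completed months, that an animal of exactly 24 months is still an older calf, and that the year-based classes use completed years; the source does not state how these boundaries are handled.

```python
def age_class(age_months):
    """Map an age in completed months to the age classes listed in the text.

    Boundary handling (e.g. 24 months vs. 2 years) is an assumption.
    """
    if age_months < 0:
        raise ValueError("age_months must be non-negative")
    if age_months <= 12:
        return "young calf"                  # 0-12 months
    if age_months <= 24:
        return "older calf"                  # 13-24 months
    years = age_months // 12                 # completed years
    if years <= 8:
        return "immature"                    # 2-8 years
    if years <= 24:
        return "young adult"                 # 9-24 years
    if years <= 49:
        return "prime reproductive adult"    # 25-49 years
    return "old adult"                       # 50+ years


for months in (6, 18, 60, 200, 400, 700):
    print(months, age_class(months))
```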