The world population surpassed eight billion people in 2022, having doubled from its level less than 50 years earlier. Looking forward, the world population is projected to reach nine billion in 2038 and 10 billion in 2060, before peaking at around 10.3 billion in the 2080s and then going into decline.
Regional variations
The global population has grown rapidly since the early 1800s, owing to advances in areas such as food production, healthcare, water safety, education, and infrastructure. However, these changes did not occur at a uniform time or pace across the world. Broadly speaking, the first regions to undergo their demographic transitions were Europe, North America, and Oceania, followed by Latin America and Asia (although Asia's development saw the greatest variation due to its size), while Africa was the last continent to undergo this transformation. Because of these differences, many so-called "advanced" countries, particularly in Europe and East Asia, are now experiencing population decline, while the fastest population growth rates are found in Sub-Saharan Africa. In fact, the roughly two-billion increase in population between now and the peak in the 2080s will occur in Sub-Saharan Africa, whose population will rise from 1.2 billion to 3.2 billion over this period (although populations on other continents will also fluctuate).
Changing projections
The United Nations releases its World Population Prospects report every one to two years, and it is widely considered the foremost demographic dataset in the world. However, recent editions have notably lowered their projections of both when the global population will peak and at what level. Reports in the 2010s had suggested a peak of over 11 billion people, with population growth continuing into the 2100s; an earlier and lower peak is now projected. Reasons for this include a more rapid population decline in East Asia and Europe, particularly in China, as well as a prolonged development arc in Sub-Saharan Africa.
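The doubling claim implies a minimum average growth rate. A minimal sketch, assuming constant compound growth (a simplification, since actual annual growth rates varied over the period): a population that doubles in n years grows at r = 2^(1/n) - 1 per year.

```python
import math


def rate_to_double(years: float) -> float:
    """Constant compound annual growth rate that doubles a population in `years` years."""
    return 2 ** (1 / years) - 1


def doubling_time(rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log1p(rate)


r = rate_to_double(50)
print(f"{r:.2%}")  # 1.40%
```

Because the doubling took less than 50 years, the implied average rate over that period was somewhat above 1.4% per year.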
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The total population of the United States was estimated at 341.2 million people in 2024, according to the latest census figures and projections from Trading Economics. This dataset provides actual values, historical data, forecasts, charts, statistics, an economic calendar, and news for the United States population.
<ul style='margin-top:20px;'>
<li>Total population for the world in 2024 was <strong>8,118,835,999</strong>, a <strong>0.71% increase</strong> from 2023.</li>
<li>Total population for the world in 2023 was <strong>8,061,876,001</strong>, a <strong>0.9% increase</strong> from 2022.</li>
<li>Total population for the world in 2022 was <strong>7,989,981,520</strong>, a <strong>0.87% increase</strong> from 2021.</li>
</ul>Total population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. The values shown are midyear estimates.
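The percentage changes in the list can be checked directly from the midyear totals. A short sketch; the dictionary simply restates the figures above (2021 is not listed, so the 2022 change cannot be recomputed here):

```python
# Midyear world population estimates restated from the list above.
world_population = {
    2022: 7_989_981_520,
    2023: 8_061_876_001,
    2024: 8_118_835_999,
}


def percent_change(new: int, old: int) -> float:
    """Year-over-year change as a percentage of the earlier value."""
    return (new - old) / old * 100


for year in sorted(world_population)[1:]:
    change = percent_change(world_population[year], world_population[year - 1])
    print(f"{year}: {change:.2f}% increase from {year - 1}")
```

This prints 0.90% for 2023 and 0.71% for 2024, consistent with the rounded figures above.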
Context
The dataset tabulates the United States population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total United States population. The dataset can be used to understand the age distribution of the United States population; for example, it can identify the largest age group in the United States.
Key observations
The largest age group in the United States was 25-29 years, with a population of 22,854,328 (6.93%), according to the 2021 American Community Survey. The smallest age group was 80-84 years, with a population of 5,932,196 (1.80%). Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
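Identifying the largest and smallest age groups amounts to a max/min over the group counts, with each share computed against the sum of all groups. A sketch with hypothetical toy counts (not ACS figures); the dictionary layout is an assumption, not the dataset's actual column structure:

```python
def largest_and_smallest(counts: dict[str, int]) -> tuple[tuple[str, float], tuple[str, float]]:
    """Return (group, percent of total) for the largest and smallest groups.

    Assumes `counts` covers every age group, so the groups sum to the total population.
    """
    total = sum(counts.values())
    largest = max(counts, key=counts.get)
    smallest = min(counts, key=counts.get)
    return (largest, 100 * counts[largest] / total), (smallest, 100 * counts[smallest] / total)


# Hypothetical toy counts for illustration only.
toy_counts = {"Under 5 years": 120, "5-9 years": 130, "25-29 years": 150, "80-84 years": 40}
(big, big_pct), (small, small_pct) = largest_and_smallest(toy_counts)
print(f"Largest: {big} ({big_pct:.2f}%)")       # Largest: 25-29 years (34.09%)
print(f"Smallest: {small} ({small_pct:.2f}%)")  # Smallest: 80-84 years (9.09%)
```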
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
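ACS margins of error (MOEs) are published at the 90 percent confidence level, so they can be converted to standard errors by dividing by 1.645. A minimal sketch of common reliability checks, assuming any MOEs you work with follow that ACS convention; the estimate and MOE values below are hypothetical:

```python
import math

Z_90 = 1.645  # ACS margins of error correspond to a 90% confidence level


def standard_error(moe: float) -> float:
    return moe / Z_90


def coefficient_of_variation(estimate: float, moe: float) -> float:
    """CV in percent; higher values signal a less reliable estimate."""
    return 100 * standard_error(moe) / estimate


def interval_90(estimate: float, moe: float) -> tuple[float, float]:
    """90% confidence interval implied by a published MOE."""
    return estimate - moe, estimate + moe


def moe_of_sum(moes: list[float]) -> float:
    """Approximate MOE of a sum of estimates (square root of summed squares)."""
    return math.sqrt(sum(m * m for m in moes))


# Hypothetical estimate of 10,000 people with a published MOE of 1,645.
print(interval_90(10_000, 1_645))                          # (8355, 11645)
print(round(coefficient_of_variation(10_000, 1_645), 1))   # 10.0
```

The root-sum-of-squares rule is an approximation that is useful when combining age groups, for example when merging two five-year bands into a ten-year band.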
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for United States Population by Age.
Because sexual orientation concealment can exact deep mental and physical health costs and dampen the public visibility necessary for advancing equal rights, estimating the proportion of the global sexual minority population that conceals its sexual orientation is a matter of public health and policy concern. Yet a historic lack of cross-national datasets of sexual minorities has precluded accurate estimates of the size of the global closet. We extrapolated the size of the global closet (i.e., the proportion of the global sexual minority population that conceals its sexual orientation) using a large sample of sexual minorities collected across 28 countries and an objective index of structural stigma (i.e., discriminatory national laws and policies affecting sexual minorities) across 197 countries. We estimate that the majority (83.0%) of sexual minorities around the world conceal their sexual orientation from all or most people and that country-level structural stigma can serve as a useful predictor of the size of each country’s closeted sexual minority population. Our analysis also predicts that eliminating structural stigma would drastically reduce the size of the global closet. Given its costs to individual health and social equality, the closet represents a considerable burden on the global sexual minority population. The present projection suggests that the surest route to improving the wellbeing of sexual minorities worldwide is through reducing structural forms of inequality. Another route to alleviating the personal and societal toll of the closet is to develop public health interventions that sensitively reach the closeted sexual minority population in high-stigma contexts worldwide.
An important goal of this projection, which relies on data from Europe, is to spur future research from non-Western countries capable of refining the estimate of the association between structural stigma and sexual orientation concealment using local experiences of both.
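To make the aggregation step concrete: once a concealment proportion has been predicted for each country (in the study, from its structural stigma index), a single global figure can be formed as a population-weighted average. The sketch below assumes exactly that weighting, which the abstract does not spell out; it is not the authors' model, and the country names and numbers are hypothetical placeholders, not the study's data.

```python
def global_concealment_share(countries: dict[str, tuple[int, float]]) -> float:
    """Population-weighted proportion of sexual minorities who conceal their orientation.

    `countries` maps a country to (estimated sexual-minority population,
    predicted proportion concealing, between 0 and 1).
    """
    total = 0
    concealed = 0.0
    for name, (population, proportion) in countries.items():
        if not 0.0 <= proportion <= 1.0:
            raise ValueError(f"{name}: proportion must be between 0 and 1")
        total += population
        concealed += population * proportion
    if total == 0:
        raise ValueError("total population must be positive")
    return concealed / total


# Hypothetical inputs: a lower-stigma and a higher-stigma country.
example = {"Country A": (1_000_000, 0.5), "Country B": (3_000_000, 0.9)}
print(f"{global_concealment_share(example):.1%}")  # 80.0%
```

Under this weighting, populous high-stigma countries dominate the global figure, which is one reason reducing structural stigma in such countries would shrink the estimate most.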
Global patterns of current and future road infrastructure - Supplementary spatial data
Authors: Johan Meijer, Mark Huijbregts, Kees Schotten, Aafke Schipper
Research paper summary: Georeferenced information on road infrastructure is essential for spatial planning, socio-economic assessments and environmental impact analyses. Yet current global road maps are typically outdated or characterized by spatial bias in coverage. In the Global Roads Inventory Project we gathered, harmonized and integrated nearly 60 geospatial datasets on road infrastructure into a global roads dataset. The resulting dataset covers 222 countries and includes over 21 million km of roads, which is two to three times the total length in the currently best available country-based global roads datasets. We then related total road length per country to country area, population density, GDP and OECD membership, resulting in a regression model with an adjusted R² of 0.90, and found that the highest road densities are associated with densely populated and wealthier countries. Applying our regression model to future population densities and GDP estimates from the Shared Socioeconomic Pathway (SSP) scenarios, we obtained a tentative estimate of 3.0–4.7 million km of additional road length for the year 2050. Large increases in road length were projected for developing nations in some of the world's last remaining wilderness areas, such as the Amazon, the Congo basin and New Guinea. This highlights the need for accurate spatial road datasets to underpin strategic spatial planning in order to reduce the impacts of roads in remaining pristine ecosystems.
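The adjusted R² penalizes the ordinary R² for the number of predictors: with n observations and p predictors (intercept not counted), adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). A sketch of both quantities; the inputs are hypothetical, since the model's exact specification (transformations, number of terms, number of countries in the fit) is not given here, so n and p are assumptions.

```python
def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical: R^2 = 0.91 from 100 countries and 4 predictors
# (area, population density, GDP, OECD membership).
print(round(adjusted_r_squared(0.91, n=100, p=4), 4))  # 0.9062
```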
Contents: The GRIP dataset consists of global and regional vector datasets in ESRI file geodatabase and shapefile format, and global raster datasets of road density at a 5 arcminute resolution (~8 × 8 km). The GRIP dataset is mainly intended to provide a roads dataset that is easily usable for scientific global environmental and biodiversity modelling projects. The dataset is not suitable for navigation. GRIP4 is based on many different sources (including OpenStreetMap), and to the best of our ability we have verified their public availability, as a criterion in our research. The UNSDI-Transportation data model was applied to harmonize the individual source datasets. GRIP4 is provided under a Creative Commons License (CC-0) and is free to use. The GRIP database and future global road infrastructure scenario projections following the Shared Socioeconomic Pathways (SSPs) are described in the paper by Meijer et al. (2018). Due to shapefile file size limitations, the global file is only available in ESRI file geodatabase format.
Regional coding of the other vector datasets in shapefile and ESRI fgdb format:
Road density raster data:
Keywords: global, data, roads, infrastructure, network, global roads inventory project (GRIP), SSP scenarios
The Marshall Project, the nonprofit investigative newsroom dedicated to the U.S. criminal justice system, has partnered with The Associated Press to compile data on the prevalence of COVID-19 infection in prisons across the country. The Associated Press is sharing this data as the most comprehensive current national source of COVID-19 outbreaks in state and federal prisons.
Lawyers, criminal justice reform advocates and families of the incarcerated have worried about what was happening in prisons across the nation as coronavirus began to take hold in the communities outside. Data collected by The Marshall Project and AP shows that hundreds of thousands of prisoners, workers, correctional officers and staff have caught the illness as prisons became the center of some of the country’s largest outbreaks. And thousands of people — most of them incarcerated — have died.
In December, as COVID-19 cases spiked across the U.S., the news organizations also shared cumulative rates of infection among prison populations, to better gauge the total effects of the pandemic on prison populations. The analysis found that by mid-December, one in five state and federal prisoners in the United States had tested positive for the coronavirus, a rate more than four times higher than that of the general population.
This data, which is updated weekly, is an effort to track how those people have been affected and where the crisis has hit the hardest.
The data tracks the number of COVID-19 tests administered to people incarcerated in all state and federal prisons, as well as the staff in those facilities. It is collected on a weekly basis by Marshall Project and AP reporters who contact each prison agency directly and verify published figures with officials.
Each week, the reporters ask every prison agency for the total number of coronavirus tests administered to its staff members and prisoners, the cumulative number who tested positive among staff and prisoners, and the numbers of deaths for each group.
The time series data is aggregated to the system level; there is one record for each prison agency on each date of collection. Not all departments could provide data for the exact date requested, and the data indicates the date for the figures.
To estimate the rate of infection among prisoners, we collected population data for each prison system before the pandemic, roughly in mid-March, in April, June, July, August, September and October. Beginning the week of July 28, we updated all prisoner population numbers, reflecting the number of incarcerated adults in state or federal prisons. Prior to that, population figures may have included additional populations, such as prisoners housed in other facilities, which were not captured in our COVID-19 data. In states with unified prison and jail systems, we include both detainees awaiting trial and sentenced prisoners.
To estimate the rate of infection among prison employees, we collected staffing numbers for each system. Where current data was not publicly available, we obtained other numbers through our reporting, including by calling agencies or consulting state budget documents. In six states, we were unable to find recent staffing figures: Alaska, Hawaii, Kentucky, Maryland, Montana and Utah.
To calculate the cumulative COVID-19 impact on prisoner and prison worker populations, we aggregated prisoner and staff COVID case and death data up through Dec. 15. Because population snapshots do not account for movement in and out of prisons since March, and because many systems have significantly slowed the number of new people being sent to prison, it’s difficult to estimate the total number of people who have been held in a state system since March. To be conservative, we calculated our rates of infection using the largest prisoner population snapshots we had during this time period.
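The conservative rate calculation described above can be sketched as follows. The system, its snapshots, and the general-population rate are hypothetical; only the rule of dividing by the largest snapshot comes from the text.

```python
def cumulative_rate_per_100k(cumulative_cases: int, population_snapshots: list[int]) -> float:
    """Cumulative cases per 100,000 people, using the largest population snapshot.

    A larger denominator gives a lower rate, so choosing the maximum snapshot
    errs on the side of understating infection rates.
    """
    if not population_snapshots:
        raise ValueError("at least one population snapshot is required")
    return cumulative_cases * 100_000 / max(population_snapshots)


def rate_ratio(group_rate: float, reference_rate: float) -> float:
    """How many times higher one rate is than another."""
    return group_rate / reference_rate


# Hypothetical prison system with snapshots taken in March, June and October.
snapshots = [20_000, 18_500, 17_200]
prison_rate = cumulative_rate_per_100k(4_000, snapshots)
print(prison_rate)  # 20000.0, i.e. one in five prisoners

# Hypothetical general-population rate of 4,500 cases per 100,000.
print(round(rate_ratio(prison_rate, 4_500), 2))  # 4.44
```

Using the smallest snapshot (17,200) instead would raise the same system's rate to about 23,256 per 100,000, which illustrates why the largest snapshot is the cautious choice.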
As with all COVID-19 data, our understanding of the spread and impact of the virus is limited by the availability of testing. Epidemiology and public health experts say that aside from a few states that have recently begun aggressively testing in prisons, it is likely that there are more cases of COVID-19 circulating undetected in facilities. Sixteen prison systems, including the Federal Bureau of Prisons, would not release information about how many prisoners they are testing.
Corrections departments in Indiana, Kansas, Montana, North Dakota and Wisconsin report coronavirus testing and case data for juvenile facilities; West Virginia reports figures for juvenile facilities and jails. For consistency of comparison with other state prison systems, we removed those facilities from our data that had been included prior to July 28. For these states we have also removed staff data. Similarly, Pennsylvania’s coronavirus data includes testing and cases for those who have been released on parole. We removed these tests and cases for prisoners from the data prior to July 28. The staff cases remain.
There are four tables in this data:
covid_prison_cases.csv
contains weekly time series data on tests, infections and deaths in prisons. The first dates in the table are on March 26. Any questions that a prison agency could not or would not answer are left blank.
prison_populations.csv
contains snapshots of the population of people incarcerated in each of these prison systems for whom data on COVID testing and cases are available. This varies by state and may not always be the entire number of people incarcerated in each system. In some states, it may include other populations, such as those on parole or held in state-run jails. This data is primarily for use in calculating rates of testing and infection, and we would not recommend using these numbers to compare the change in how many people are being held in each prison system.
staff_populations.csv
contains a one-time, recent snapshot of the headcount of workers for each prison agency, collected as close to April 15 as possible.
covid_prison_rates.csv
contains the rates of cases and deaths for prisoners. There is one row for every state and federal prison system and an additional row with the national totals.
The Associated Press and The Marshall Project have created several queries to help you use this data:
Get your state's prison COVID data: provides each week's data for your state only and calculates cases per 100,000 prisoners, deaths per 100,000 prisoners, cases per 100,000 workers and deaths per 100,000 workers.
Rank all systems' most recent data by cases per 100,000 prisoners.
Find what percentage of your state's total cases and deaths (as reported by Johns Hopkins University) occurred within the prison system.
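The ranking query can be approximated in a few lines: keep the most recent record per system (the time series has one row per agency per collection date), convert counts to rates per 100,000, and sort. The field names (`system`, `as_of`, `prisoner_cases`) and values are hypothetical stand-ins, not the actual columns of covid_prison_cases.csv, and this is not the AP/Marshall Project query itself.

```python
from datetime import date


def latest_by_system(records):
    """Keep only the most recent record for each prison system."""
    latest = {}
    for record in records:
        current = latest.get(record["system"])
        if current is None or record["as_of"] > current["as_of"]:
            latest[record["system"]] = record
    return latest


def per_100k(count, population):
    """Rate per 100,000; blank (None) counts or missing populations stay blank."""
    if count is None or not population:
        return None
    return count * 100_000 / population


def rank_by_case_rate(records, populations):
    """Rank systems' most recent figures by prisoner cases per 100,000 prisoners."""
    rows = []
    for system, record in latest_by_system(records).items():
        rate = per_100k(record["prisoner_cases"], populations.get(system))
        if rate is not None:
            rows.append((system, rate))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical weekly records and population snapshots.
weekly = [
    {"system": "State A", "as_of": date(2020, 12, 8), "prisoner_cases": 900},
    {"system": "State A", "as_of": date(2020, 12, 15), "prisoner_cases": 1_200},
    {"system": "State B", "as_of": date(2020, 12, 15), "prisoner_cases": 3_000},
    {"system": "State C", "as_of": date(2020, 12, 15), "prisoner_cases": None},
]
populations = {"State A": 10_000, "State B": 40_000, "State C": 5_000}
print(rank_by_case_rate(weekly, populations))
# [('State A', 12000.0), ('State B', 7500.0)]
```

Blank answers are kept as None rather than treated as zero, so a system that declined to report (State C here) is left out of the ranking instead of appearing to have no cases.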
In stories, attribute this data to: “According to an analysis of state prison cases by The Marshall Project, a nonprofit investigative newsroom dedicated to the U.S. criminal justice system, and The Associated Press.”
Many reporters and editors at The Marshall Project and The Associated Press contributed to this data, including: Katie Park, Tom Meagher, Weihua Li, Gabe Isman, Cary Aspinwall, Keri Blakinger, Jake Bleiberg, Andrew R. Calderón, Maurice Chammah, Andrew DeMillo, Eli Hager, Jamiles Lartey, Claudia Lauer, Nicole Lewis, Humera Lodhi, Colleen Long, Joseph Neff, Michelle Pitcher, Alysia Santo, Beth Schwartzapfel, Damini Sharma, Colleen Slevin, Christie Thompson, Abbie VanSickle, Adria Watson, Andrew Welsh-Huggins.
If you have questions about the data, please email The Marshall Project at info+covidtracker@themarshallproject.org or file a GitHub issue.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, email kromano@ap.org.
The PRIMAP-hist Socio-Eco dataset combines several published datasets to create a comprehensive set of population and gross domestic product (GDP) pathways for every country, covering the years 1850 to 2017. It covers all UNFCCC (United Nations Framework Convention on Climate Change) member states as well as most non-UNFCCC territories. The data has no sector resolution.
List of datasets included in this data publication:
(1) PMHSOCIOECO21_GDP_26-Jul-2019.csv: contains the GDP data for all countries
(2) PMHSOCIOECO21_Population_26-Jul-2019.csv: contains the population data for all countries
(3) PRIMAP-hist_SocioEco_data_description.pdf: data description, including the CHANGELOG
(All files are also included in the .zip folder.)
When using this dataset or one of its updates, please cite the DOI of the precise version of the dataset. Please consider also citing the relevant original sources when using the PRIMAP-hist Socio-Eco dataset. See the full citations in the References section further below. A data description article is in preparation. Until it is published, we refer to the description article of the PRIMAP-hist emissions time series for the methodology used.
SOURCES:
- UN World Population Prospects 2019 (UN2019)
- World Bank World Development Indicators 2019 (July) (WDI2019B). We use the NY.GDP.MKTP.PP.KD variable for GDP.
- Penn World Table version 9.1 (PWT91). We use the cgdpe variable for GDP (Robert and Feenstra, 2019; Feenstra et al., 2015)
- Maddison Project Database 2018 (MPD2018). We use the cgdppc variable for GDP (Bolt et al., 2018)
- Anthropogenic land use estimates for the Holocene – HYDE 3.2 (HYDE32) (Klein Goldewijk, 2017)
- Continuous national gross domestic product (GDP) time series for 195 countries: past observations (1850–2005) harmonized with future projections according to the Shared Socio-economic Pathways (2006–2100) (Geiger2018; Geiger and Frieler, 2018)
Full references are available in the data description document.
Censuses are the principal means of collecting the basic population and housing statistics required for social and economic development, policy interventions, and their implementation and evaluation. The census plays an essential role in public administration. The results are used for:
• ensuring equity in the distribution of government services
• distributing and allocating government funds among various regions and districts for education and health services
• delineating electoral districts at national and local levels, and
• measuring the impact of industrial development, to name a few.
The census also provides the benchmark for all surveys conducted by the national statistical office. Without the sampling frame derived from the census, the national statistical system would face difficulties in providing reliable official statistics for use by government and the public. The census also provides information on small areas and population groups with minimal sampling error. This is important, for example, in planning the location of a school or clinic. Census information is also invaluable in the private sector for activities such as business planning and market analysis. The information is used as a benchmark in research and analysis.
Census 2011 was the third democratic census to be conducted in South Africa. Its specific objectives included:
- To provide statistics on population, demographic, social, economic and housing characteristics;
- To provide a base for the selection of a new sampling frame;
- To provide data at the lowest geographical level; and
- To provide a primary base for the mid-year projections.
National
Households, Individuals
Census/enumeration data [cen]
Face-to-face [f2f]
About the Questionnaire: Much emphasis has been placed on the need for a population census to help government direct its development programmes, but less has been written about how the census questionnaire is compiled. The main focus of a population and housing census is to take stock and produce a total count of the population without omission or duplication. Another major focus is to provide accurate demographic and socio-economic characteristics for each individual enumerated. Apart from individuals, the focus is on collecting accurate data on housing characteristics and services. A population and housing census provides the data needed to facilitate informed decision-making on policy formulation and implementation, and to monitor and evaluate programmes at the smallest possible area level. It is therefore important that Statistics South Africa collects statistical data that comply with United Nations recommendations and other relevant stakeholder needs.
The United Nations underscores the following factors in determining the selection of topics to be investigated in population censuses: a) The needs of a broad range of data users in the country; b) Achievement of the maximum degree of international comparability, both within regions and on a worldwide basis; c) The probable willingness and ability of the public to give adequate information on the topics; and d) The total national resources available for conducting a census.
In addition, the UN stipulates that census-takers should not collect information that is no longer required simply because it was traditionally collected, but should instead focus on key demographic, social and socio-economic variables. It therefore becomes necessary, in consultation with a broad range of census data users, to periodically review the topics traditionally investigated and to re-evaluate the need for the series to which they contribute, particularly in light of new data needs and of alternative data sources that may have become available for topics formerly covered in the population census. It was against this background that Statistics South Africa conducted user consultations in 2008 after the release of some of the Community Survey products. Some groundwork has also been done on core questions recommended by all countries in Africa. In line with users' meetings, the crucial demands of the Millennium Development Goals (MDGs) should also be met. It is also imperative that Stats SA meet the demands of users who require small-area data.
Accuracy of data depends on a well-designed questionnaire that is short and to the point. The interview to complete the questionnaire should not take longer than 18 minutes per household. Accuracy also depends on the diligence of the enumerator and the honesty of the respondent. On the other hand, disadvantaged populations, owing to their small numbers, are best covered in the census rather than in household sample surveys. Variables such as employment/unemployment, religion, income, and language are more accurately covered in household surveys than in censuses. Input from users and stakeholders during the planning phase of the census is crucial to its success; however, the information provided should be within the scope of the census.
Individual particulars
Section A: Demographics
Section B: Migration
Section C: General Health and Functioning
Section D: Parental Survival and Income
Section E: Education
Section F: Employment
Section G: Fertility (Women 12-50 Years Listed)
Section H: Housing, Household Goods and Services and Agricultural Activities
Section I: Mortality in the Last 12 Months
The Household Questionnaire is available in Afrikaans, English, isiZulu, isiNdebele, Sepedi, Sesotho, siSwati, Tshivenda and Xitsonga.
The Transient and Tourist Hotel Questionnaire (English) is divided into the following sections:
Name, Age, Gender, Date of Birth, Marital Status, Population Group, Country of birth, Citizenship, Province.
The Questionnaire for Institutions (English) is divided into the following sections:
Particulars of the institution
Availability of piped water for the institution
Main source of water for domestic use
Main type of toilet facility
Type of energy/fuel used for cooking, heating and lighting at the institution
Disposal of refuse or rubbish
Asset ownership (TV, Radio, Landline telephone, Refrigerator, Internet facilities)
List of persons in the institution on census night (name, date of birth, sex, population group, marital status, barcode number)
The Post Enumeration Survey Questionnaire (English)
These questionnaires are provided as external resources.
Data editing and validation system
The execution of each phase of Census operations introduces some form of error into Census data. Despite the quality assurance methodologies embedded in all phases, namely data collection, data capturing (both manual and automated), coding, and editing, a number of errors creep in and distort the collected information. To promote consistency and improve data quality, editing is a paramount phase for identifying and minimising errors such as invalid values, inconsistent entries, or unknown/missing values. The editing process for Census 2011 was based on defined rules (specifications).
The editing of Census 2011 data involved a number of sequential processes: selection of members of the editing team, review of Census 2001 and 2007 Community Survey editing specifications, development of editing specifications for the Census 2011 pre-tests (2009 pilot and 2010 Dress Rehearsal), development of firewall editing specifications and finalisation of specifications for the main Census.
Editing team
The Census 2011 editing team was drawn from various divisions of the organisation based on skills and experience in data editing. The team was thus composed of subject matter specialists (demographers and programmers), managers, and data processors.
The Census 2011 questionnaire was very complex, characterised by many sections, interlinked questions and skip instructions. Editing such complex, interlinked data items required a combination of editing techniques. Errors relating to structure were resolved using Structured Query Language (SQL) in an Oracle database. CSPro software was used to resolve content-related errors. The strategy used for Census 2011 data editing was automated error detection and correction with minimal changes. A combination of logical and dynamic imputation/editing was used. Logical imputations were preferred, and in many cases substantial effort was made to deduce a consistent value from the rest of the household's information. To profile the extent of changes in the dataset and assess the effects of imputation, a set of imputation flags is included in the edited dataset. Imputation flag values are as follows:
0 No imputation was performed; raw data were preserved
1 Logical editing was performed; raw data were blank
2 Logical editing was performed; raw data were not blank
3 Hot-deck imputation was performed; raw data were blank
4 Hot-deck imputation was performed; raw data were not blank
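The source does not publish its CSPro edit specifications, but the hot-deck part of the flag scheme can be illustrated with a generic sequential hot deck: each blank or invalid value is replaced by the most recent valid value seen in the same imputation class, and the flag records what happened. This is a sketch only; the field names, imputation classes and valid codes below are hypothetical, and logical edits (flags 1 and 2) depend on rule sets not described here.

```python
from collections import Counter

# Flag values as listed in the specification; flags 1 and 2 are not modelled here.
NO_IMPUTATION = 0
HOT_DECK_RAW_BLANK = 3
HOT_DECK_RAW_NOT_BLANK = 4


def sequential_hot_deck(records, field, class_key, is_valid):
    """Impute `field` in place from the last valid donor in the same imputation class.

    Returns one imputation flag per record.
    """
    donors = {}  # imputation class -> most recent valid value
    flags = []
    for record in records:
        key = class_key(record)
        raw = record.get(field)
        if raw is not None and is_valid(raw):
            donors[key] = raw  # valid values are kept and refresh the donor
            flags.append(NO_IMPUTATION)
        elif key in donors:
            record[field] = donors[key]
            flags.append(HOT_DECK_RAW_BLANK if raw is None else HOT_DECK_RAW_NOT_BLANK)
        else:
            # No donor seen yet for this class: the value is left as-is. A real
            # system would need a fallback (e.g. initial donor values) here.
            flags.append(NO_IMPUTATION)
    return flags


# Hypothetical person records; valid marital-status codes are assumed to be 1-5.
people = [
    {"sex": "F", "age_band": "30-34", "marital_status": 2},
    {"sex": "F", "age_band": "30-34", "marital_status": None},
    {"sex": "M", "age_band": "30-34", "marital_status": 9},
    {"sex": "M", "age_band": "30-34", "marital_status": 1},
    {"sex": "M", "age_band": "30-34", "marital_status": 9},
]
flags = sequential_hot_deck(
    people,
    field="marital_status",
    class_key=lambda p: (p["sex"], p["age_band"]),
    is_valid=lambda code: 1 <= code <= 5,
)
print(flags)           # [0, 3, 0, 0, 4]
print(Counter(flags))  # profile of how many values each method touched
```

Tallying the flags, as in the last line, is the kind of profile the flags are meant to support: it shows how much of a variable was imputed and whether the raw values were blank or merely invalid.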
Independent monitoring and evaluation of Census field activities
Independent monitoring of the Census 2011 field activities was carried out by a team of 31 professionals and 381 Monitoring
The documented dataset covers Enterprise Survey (ES) panel data collected in Liberia in 2009 and 2017, as part of the Enterprise Survey initiative of the World Bank. An Indicator Survey is similar to an Enterprise Survey; it is implemented for smaller economies where the sampling strategies inherent in an Enterprise Survey are often not applicable due to the limited universe of firms.
The objective of the 2009-2017 Enterprise Survey is to obtain feedback from enterprises in client countries on the state of the private sector as well as to build a panel of enterprise data that will make it possible to track changes in the business environment over time and allow, for example, impact assessments of reforms. Through interviews with firms in the manufacturing and services sectors, the Indicator Survey data provides information on the constraints to private sector growth and is used to create statistically significant business environment indicators that are comparable across countries.
As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.
National
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or the universe, covered in the Enterprise Surveys is the non-agricultural economy. It comprises: all manufacturing sectors according to the ISIC Revision 3.1 group classification (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this population definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities sectors.
Sample survey data [ssd]
The sample for the 2009-2017 Liberia Enterprise Survey (ES) was selected using stratified random sampling, following the methodology explained in the Sampling Note. Stratified random sampling was preferred over simple random sampling for several reasons:
- To obtain unbiased estimates for different subdivisions of the population with some known level of precision.
- To obtain unbiased estimates for the whole population. The whole population, or universe of the study, is the non-agricultural economy. It comprises: all manufacturing sectors according to the group classification of ISIC Revision 3.1 (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except subsector 72, IT, which was added to the population under study), and all public or utilities sectors.
- The cost per observation in the survey may be reduced by stratification of the population elements into convenient groupings.
Three levels of stratification were used in this country: industry, establishment size, and region. Industry stratification was designed as follows: the universe was stratified into manufacturing industries (ISIC Rev. 3.1 codes 15-37) and services industries (ISIC codes 45, 50-52, 55, 60-64, and 72).
For the Liberia ES, size stratification was defined as follows: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).
Regional stratification for the Liberia ES was done across three regions: Montserrado, Margibi, and Nimba.
Face-to-face [f2f]
The current survey instruments are available: - Services and Manufacturing Questionnaire - Screener Questionnaire.
The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs/labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures. Over 90% of the questions objectively ascertain characteristics of a country's business environment. The remaining questions assess the survey respondents' opinions on what the obstacles to firm growth and performance are.
Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.
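The batch checks described above (duplicate entries, out-of-range values, skip patterns) can be sketched in Python as follows. This is a minimal illustration, not the World Bank's validation code: the field names (`id`, `employees`, `exports`, `export_share`), the ranges, and the skip rule are hypothetical, and the sketch assumes the refusal (-8) and "don't know" (-9) codes used in the Enterprise Surveys should be exempt from range checks.

```python
# Hypothetical sketch of batch consistency checks; field names and rules are illustrative.
SPECIAL_CODES = {-8, -9}  # refusal and "don't know" codes, exempt from range checks


def check_batch(records, ranges, skip_rules):
    """Return (record id, problem) pairs for one delivered batch.

    ranges:     {field: (low, high)} plausible values for numeric fields
    skip_rules: [(gate_field, gate_value, dependent_field)]; the dependent
                question should only be answered when gate_field == gate_value
    """
    issues = []
    seen_ids = set()
    for rec in records:
        rid = rec["id"]
        # Duplicate entries: the same establishment id delivered twice
        if rid in seen_ids:
            issues.append((rid, "duplicate entry"))
        seen_ids.add(rid)
        # Out-of-range values, ignoring the special non-response codes
        for field, (low, high) in ranges.items():
            value = rec.get(field)
            if value is not None and value not in SPECIAL_CODES and not low <= value <= high:
                issues.append((rid, f"{field} out of range"))
        # Skip patterns: an answer recorded where the questionnaire skips it
        for gate, gate_value, dependent in skip_rules:
            if rec.get(gate) != gate_value and rec.get(dependent) is not None:
                issues.append((rid, f"{dependent} should be skipped"))
    return issues


batch = [
    {"id": 1, "employees": 12, "exports": 0, "export_share": None},
    {"id": 1, "employees": 12, "exports": 0, "export_share": None},
    {"id": 2, "employees": -5, "exports": 1, "export_share": 30},
    {"id": 3, "employees": -9, "exports": 0, "export_share": 40},
]
problems = check_batch(batch, {"employees": (0, 100_000)}, [("exports", 1, "export_share")])
for rid, problem in problems:
    print(rid, problem)
```

Flagged records would then go back to the contractor for callbacks or revisits, as the text describes.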
The response rate was high, largely because of positive attitudes towards the international community, which was collaborating with the government on reconstruction efforts after a period of civil strife. There was also a very positive attitude towards World Bank initiatives.
The 2017-18 Pakistan Demographic and Health Survey (PDHS) was the fourth of its kind in Pakistan, following the 1990-91, 2006-07, and 2012-13 PDHS surveys.
The primary objective of the 2017-18 PDHS is to provide up-to-date estimates of basic demographic and health indicators. The PDHS provides a comprehensive overview of population, maternal, and child health issues in Pakistan. Specifically, the 2017-18 PDHS collected information on:
The information collected through the 2017-18 PDHS is intended to assist policymakers and program managers at the federal and provincial government levels, in the private sector, and at international organisations in evaluating and designing programs and strategies for improving the health of the country’s population. The data also provides information on indicators relevant to the Sustainable Development Goals.
National coverage
The survey covered all de jure household members (usual residents), children age 0-5 years, women age 15-49 years and men age 15-49 years resident in the household.
Sample survey data [ssd]
The sampling frame used for the 2017-18 PDHS is a complete list of enumeration blocks (EBs) created for the Pakistan Population and Housing Census 2017, which was conducted from March to May 2017. The Pakistan Bureau of Statistics (PBS) supported the sample design of the survey and worked in close coordination with NIPS. The 2017-18 PDHS represents the population of Pakistan including Azad Jammu and Kashmir (AJK) and the former Federally Administered Tribal Areas (FATA), which were not included in the 2012-13 PDHS. The results of the 2017-18 PDHS are representative at the national level and for urban and rural areas separately. The survey estimates are also representative for the four provinces of Punjab, Sindh, Khyber Pakhtunkhwa, and Balochistan; for two regions, AJK and Gilgit Baltistan (GB); for Islamabad Capital Territory (ICT); and for FATA. In total, there are 13 second-level survey domains.
The 2017-18 PDHS followed a stratified two-stage sample design. The stratification was achieved by separating each of the eight regions into urban and rural areas. In total, 16 sampling strata were created. Samples were selected independently in every stratum through a two-stage selection process. Implicit stratification and proportional allocation were achieved at each of the lower administrative levels by sorting the sampling frame within each sampling stratum before sample selection, according to administrative units at different levels, and by using a probability-proportional-to-size selection at the first stage of sampling.
The first stage involved selecting sample points (clusters) consisting of EBs. EBs were drawn with a probability proportional to their size, which is the number of households residing in the EB at the time of the census. A total of 580 clusters were selected.
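Selection with probability proportional to size is often implemented systematically over a cumulated list of unit sizes. The sketch below assumes that approach and uses made-up household counts; the function name is hypothetical, and the exact procedure used for the PDHS is described in Appendix A of the final report, not here.

```python
import random


def pps_systematic(sizes, n, start=None, rng=None):
    """Select n units with probability proportional to size (systematic PPS).

    sizes: measure of size per unit (here, census household counts per EB).
    Returns selected unit indices; a unit larger than the sampling interval
    can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n
    if start is None:
        rng = rng or random.Random()
        start = rng.random() * interval  # random start in [0, interval)
    points = [start + k * interval for k in range(n)]
    selected, cumulative, next_point = [], 0, 0
    for index, size in enumerate(sizes):
        upper = cumulative + size
        # every selection point falling in [cumulative, upper) picks this unit
        while next_point < n and points[next_point] < upper:
            selected.append(index)
            next_point += 1
        cumulative = upper
    # guard against floating-point round-off leaving the last point unassigned
    while len(selected) < n:
        selected.append(len(sizes) - 1)
    return selected


# Four hypothetical EBs with 10, 20, 30 and 40 households; draw 2 clusters.
print(pps_systematic([10, 20, 30, 40], n=2, start=25))  # [1, 3]
```

For an EB smaller than the sampling interval, the first-stage inclusion probability under this scheme is n × size / total, which is one input to the design weights.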
The second stage involved systematic sampling of households. A household listing operation was undertaken in all of the selected clusters, and a fixed number of 28 households per cluster was selected with an equal probability systematic selection process, for a total sample size of approximately 16,240 households. The household selection was carried out centrally at the NIPS data processing office. The survey teams only interviewed the pre-selected households. To prevent bias, no replacements and no changes to the pre-selected households were allowed at the implementing stages.
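The second-stage selection of a fixed number of households from each cluster's listing can be sketched as equal-probability systematic sampling with a fractional interval. This is an illustrative implementation under that assumption, not the NIPS processing code; the function name and the listing are placeholders.

```python
import math
import random


def systematic_sample(listing, k, start=None, rng=None):
    """Equal-probability systematic sample of k items from a household listing."""
    m = len(listing)
    if not 0 < k <= m:
        raise ValueError("need 0 < k <= number of listed households")
    interval = m / k  # fractional interval keeps each household's probability at k/m
    if start is None:
        rng = rng or random.Random()
        start = rng.random() * interval
    # min() guards against floating-point round-off at the end of the list
    return [listing[min(math.floor(start + j * interval), m - 1)] for j in range(k)]


listing = list(range(56))  # 56 listed households in a hypothetical cluster
chosen = systematic_sample(listing, k=28, start=0.5)
print(chosen[:5], len(chosen))  # [0, 2, 4, 6, 8] 28
```

With 28 households taken in each of 580 clusters, the planned sample is 580 × 28 = 16,240 households; within a cluster that listed M households, each household's selection probability is 28 / M.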
For further details on sample design, see Appendix A of the final report.
Face-to-face [f2f]
Six questionnaires were used in the 2017-18 PDHS: Household Questionnaire, Woman’s Questionnaire, Man’s Questionnaire, Biomarker Questionnaire, Fieldworker Questionnaire, and the Community Questionnaire. The first five questionnaires, based on The DHS Program’s standard Demographic and Health Survey (DHS-7) questionnaires, were adapted to reflect the population and health issues relevant to Pakistan. The Community Questionnaire was based on the instrument used in the previous rounds of the Pakistan DHS. Comments were solicited from various stakeholders representing government ministries and agencies, nongovernmental organisations, and international donors. The survey protocol was reviewed and approved by the National Bioethics Committee, Pakistan Health Research Council, and ICF Institutional Review Board. After the questionnaires were finalised in English, they were translated into Urdu and Sindhi. The 2017-18 PDHS used paper-based questionnaires for data collection, while computer-assisted field editing (CAFE) was used to edit the questionnaires in the field.
The processing of the 2017-18 PDHS data began simultaneously with the fieldwork. As soon as data collection was completed in each cluster, all electronic data files were transferred via IFSS to the NIPS central office in Islamabad. These data files were registered and checked for inconsistencies, incompleteness, and outliers. The field teams were alerted to any inconsistencies and errors. Secondary editing was carried out in the central office, which involved resolving inconsistencies and coding the open-ended questions. The NIPS data processing manager coordinated the exercise at the central office. The PDHS core team members assisted with the secondary editing. Data entry and editing were carried out using the CSPro software package. The concurrent processing of the data offered a distinct advantage, as it maximised the likelihood of the data being error-free and accurate. The secondary editing of the data was completed in the first week of May 2018. The final cleaning of the data set was carried out by The DHS Program data processing specialist and completed on 25 May 2018.
A total of 15,671 households were selected for the survey, of which 15,051 were occupied. The response rates are presented separately for Pakistan, Azad Jammu and Kashmir, and Gilgit Baltistan. Of the 12,338 occupied households in Pakistan, 11,869 households were successfully interviewed, yielding a response rate of 96%. Similarly, the household response rates were 98% in Azad Jammu and Kashmir and 99% in Gilgit Baltistan.
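As used above, the household response rate is the number of successfully interviewed households divided by the number of occupied households. A minimal check of the Pakistan figure (the function name is illustrative):

```python
def response_rate(completed, eligible):
    """Percentage of eligible (occupied) households successfully interviewed."""
    return 100 * completed / eligible


rate = response_rate(11_869, 12_338)
print(f"{rate:.1f}%")  # 96.2%
```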
In the interviewed households, 94% of ever-married women age 15-49 in Pakistan, 97% in Azad Jammu and Kashmir, and 94% in Gilgit Baltistan were interviewed. In the subsample of households selected for the male survey, 87% of ever-married men age 15-49 in Pakistan, 94% in Azad Jammu and Kashmir, and 84% in Gilgit Baltistan were successfully interviewed.
Overall, the response rates were lower in urban than in rural areas. The difference is slightly less pronounced for Azad Jammu and Kashmir and Gilgit Baltistan. The response rates for men are lower than those for women, as men are often away from their households for work.
The estimates from a sample survey are affected by two types of errors: nonsampling errors and sampling errors. Nonsampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2017-18 Pakistan Demographic and Health Survey (2017-18 PDHS) to minimise this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2017-18 PDHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability among all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
Sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that
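For a percentage estimated as a ratio from a clustered sample, one standard way to obtain the standard error is Taylor linearization over cluster totals; a confidence interval then follows as the estimate plus or minus a multiple of the standard error. The sketch below assumes that approach, ignores sampling weights, and uses toy cluster totals; the function names are hypothetical, and it is an illustration rather than the computation behind the PDHS tables.

```python
import math


def ratio_standard_error(strata, sampling_fraction=0.0):
    """Taylor-linearization standard error of r = y / x for a stratified cluster sample.

    strata: {stratum: [(y_cluster_total, x_cluster_total), ...]}
    e.g. y = women using a method, x = women interviewed, per cluster (unweighted).
    """
    y = sum(yc for clusters in strata.values() for yc, _ in clusters)
    x = sum(xc for clusters in strata.values() for _, xc in clusters)
    r = y / x
    total = 0.0
    for clusters in strata.values():
        m = len(clusters)
        if m < 2:
            continue  # a single-cluster stratum contributes no estimable variance
        z = [yc - r * xc for yc, xc in clusters]  # linearized residuals per cluster
        z_sum = sum(z)
        total += m / (m - 1) * (sum(zi * zi for zi in z) - z_sum * z_sum / m)
    variance = (1 - sampling_fraction) / (x * x) * total
    return r, math.sqrt(variance)


def confidence_interval(estimate, se, z=1.96):
    """Normal-approximation interval; z = 1.96 gives roughly 95% coverage."""
    return estimate - z * se, estimate + z * se


r, se = ratio_standard_error({"urban": [(5, 10), (3, 10)]})
print(r, round(se, 3), confidence_interval(r, se))
```

Because clustering usually inflates variance relative to simple random sampling, a formula like this, rather than the simple binomial standard error, is the appropriate starting point for a two-stage design.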
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Faroese have accurate statistics of whale catches dating back to 1584. These are most probably the longest continuous statistics for the use of wildlife anywhere in the world. The data collection may enable a better understanding of pilot whale population dynamics and population status, and potentially inform management decisions on pilot whale (e.g. quotas).
MIT License https://opensource.org/licenses/MIT
Medically Validated, Age-Accurate, and Balanced
Samples: 35,000 | Features: 16 | Targets: 2 (Binary + Regression)
This dataset is designed for predicting stroke risk using symptoms, demographics, and medical literature-inspired risk modeling. Version 2 significantly improves upon Version 1 by incorporating age-dependent symptom probabilities, gender-specific risk modifiers, and medically validated feature engineering.
Age-Accurate Risk Modeling:
Gender-Specific Risk:
Balanced and Expanded Data:
Column | Type | Description |
---|---|---|
age | Integer | Age (18–90) |
gender | String | Male/Female |
chest_pain | Binary | 1 = Present, 0 = Absent |
shortness_of_breath | Binary | 1 = Present, 0 = Absent |
irregular_heartbeat | Binary | 1 = Present, 0 = Absent |
fatigue_weakness | Binary | 1 = Present, 0 = Absent |
dizziness | Binary | 1 = Present, 0 = Absent |
swelling_edema | Binary | 1 = Present, 0 = Absent |
neck_jaw_pain | Binary | 1 = Present, 0 = Absent |
excessive_sweating | Binary | 1 = Present, 0 = Absent |
persistent_cough | Binary | 1 = Present, 0 = Absent |
nausea_vomiting | Binary | 1 = Present, 0 = Absent |
high_blood_pressure | Binary | 1 = Present, 0 = Absent |
chest_discomfort | Binary | 1 = Present, 0 = Absent |
cold_hands_feet | Binary | 1 = Present, 0 = Absent |
snoring_sleep_apnea | Binary | 1 = Present, 0 = Absent |
anxiety_doom | Binary | 1 = Present, 0 = Absent |
at_risk | Binary | Target for classification (1 = At Risk, 0 = Not At Risk) |
stroke_risk_percentage | Float | Target for regression (0–100%) |
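A quick way to check a downloaded copy against the column table above is to validate each row's types and ranges. The sketch below uses only the standard library; the checks mirror the table (age 18 to 90, Male/Female, 0/1 flags, risk percentage 0 to 100), while the function name and the inline CSV rows are made-up illustrations.

```python
import csv
import io

SYMPTOM_COLUMNS = [
    "chest_pain", "shortness_of_breath", "irregular_heartbeat", "fatigue_weakness",
    "dizziness", "swelling_edema", "neck_jaw_pain", "excessive_sweating",
    "persistent_cough", "nausea_vomiting", "high_blood_pressure", "chest_discomfort",
    "cold_hands_feet", "snoring_sleep_apnea", "anxiety_doom",
]


def validate_row(row):
    """Return a list of schema violations for one CSV row (dict of strings)."""
    errors = []
    if not 18 <= int(row["age"]) <= 90:
        errors.append("age outside 18-90")
    if row["gender"] not in {"Male", "Female"}:
        errors.append("gender not Male/Female")
    for column in SYMPTOM_COLUMNS + ["at_risk"]:
        if row[column] not in {"0", "1"}:
            errors.append(f"{column} not binary")
    if not 0.0 <= float(row["stroke_risk_percentage"]) <= 100.0:
        errors.append("stroke_risk_percentage outside 0-100")
    return errors


header = ["age", "gender"] + SYMPTOM_COLUMNS + ["at_risk", "stroke_risk_percentage"]
good = ["54", "Female"] + ["0"] * 15 + ["1", "62.5"]
bad = ["15", "Male"] + ["2"] + ["0"] * 14 + ["0", "120"]
sample = io.StringIO("\n".join(",".join(r) for r in (header, good, bad)))
for row in csv.DictReader(sample):
    print(validate_row(row))
```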
Age distribution in Version 2 vs. Version 1
This dataset is grounded in peer-reviewed medical literature, with symptom probabilities, risk weights, and demographic relationships directly derived from clinical guidelines and epidemiological studies. Below is a detailed breakdown of how medical knowledge was translated into dataset parameters:
The prevalence of symptoms increases with age, reflecting real-world clinical observations. Probabilities are calibrated using population-level data from medical literature:
The documented dataset covers Enterprise Survey (ES) panel data collected in Sierra Leone in 2009 and 2017, as part of the Enterprise Survey initiative of the World Bank. An Indicator Survey is similar to an Enterprise Survey; it is implemented for smaller economies where the sampling strategies inherent in an Enterprise Survey are often not applicable due to the limited universe of firms.
The objective of the 2009-2017 survey is to obtain feedback from enterprises in client countries on the state of the private sector as well as to build a panel of enterprise data that will make it possible to track changes in the business environment over time and allow, for example, impact assessments of reforms. Through interviews with firms in the manufacturing and services sectors, the Indicator Survey data provides information on the constraints to private sector growth and is used to create statistically significant business environment indicators that are comparable across countries. As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort to collect enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project for collecting both objective data based on firms' experiences and enterprises' perceptions of the environment in which they operate.
Questionnaire topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs/labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, land and permits, taxation, business-government relations, performance measures, AIDS and sickness. The mode of data collection is face-to-face interviews.
National
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or the universe, covered in the Enterprise Surveys is the non-agricultural economy. It comprises: all manufacturing sectors according to the ISIC Revision 3.1 group classification (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this population definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities sectors.
Sample survey data [ssd]
The sample for registered establishments in Sierra Leone was selected using stratified random sampling, following the methodology explained in the Sampling Note.
Stratified random sampling was preferred over simple random sampling for several reasons:
a. To obtain unbiased estimates for different subdivisions of the population with some known level of precision.
b. To obtain unbiased estimates for the whole population. The whole population, or universe of the study, is the non-agricultural economy. It comprises: all manufacturing sectors according to the group classification of ISIC Revision 3.1 (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities sectors.
c. To make sure that the final total sample includes establishments from all different sectors and that it is not concentrated in one or two industries, sizes, or regions.
d. To exploit the benefits of stratified sampling, where population estimates will in most cases be more precise than with simple random sampling (i.e., lower standard errors, other things being equal).
e. Stratification may produce a smaller bound on the error of estimation than would be produced by a simple random sample of the same size. This is particularly true if measurements within strata are homogeneous.
f. The cost per observation in the survey may be reduced by stratification of the population elements into convenient groupings.
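The precision argument (reasons d and e) can be illustrated by comparing the textbook variance of the sample mean under simple random sampling without replacement with that under stratified sampling. The toy population below is invented and deliberately homogeneous within strata; it says nothing about Sierra Leone's actual firms, and the function names are illustrative.

```python
from statistics import variance


def srs_variance_of_mean(population, n):
    """Var of the sample mean under simple random sampling without replacement."""
    N = len(population)
    return (1 - n / N) * variance(population) / n


def stratified_variance_of_mean(strata, allocation):
    """Var of the stratified mean: sum of W_h^2 (1 - n_h/N_h) S_h^2 / n_h."""
    N = sum(len(stratum) for stratum in strata)
    total = 0.0
    for stratum, n_h in zip(strata, allocation):
        N_h = len(stratum)
        W_h = N_h / N  # stratum weight
        total += W_h ** 2 * (1 - n_h / N_h) * variance(stratum) / n_h
    return total


# Hypothetical employment counts: small firms vs. medium firms (homogeneous strata).
small = [6, 8, 10, 7, 9, 8]
medium = [40, 45, 50, 42, 48, 45]
srs = srs_variance_of_mean(small + medium, n=4)
stratified = stratified_variance_of_mean([small, medium], allocation=[2, 2])
print(f"SRS: {srs:.2f}  stratified: {stratified:.2f}")
```

Almost all of the population variance here lies between the two strata, so sampling within each stratum removes it from the error of the estimate; with heterogeneous strata the gain would shrink.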
Three levels of stratification were used in the Sierra Leone sample: firm sector, firm size, and geographic region.
Industry stratification was designed as follows: the universe was stratified into one manufacturing industry and one services industry (retail).
Size stratification was defined following the standardized definition used for the Indicator Surveys: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers.
Regional stratification was defined in terms of the geographic regions with the largest commercial presence in the country: Kenema and W/A Urban. In 2017, regional stratification was done across four regions: Bo, Western Urban, Kenema, and Bombali.
Given the stratified design, sample frames containing a complete and updated list of establishments as well as information on all stratification variables (number of employees, industry, and region) are required to draw the sample. Great efforts were made to obtain the best source for these listings.
The sample frame consisted of listings of firms from two sources: for panel firms, the list of 150 firms from the Sierra Leone 2009 ES was used; for fresh firms (i.e., firms not covered in 2009), firm data from the 2016 Business Establishment Census and the Dun & Bradstreet Global database (June 2017) were used.
Necessary measures were taken to ensure the quality of the frame; however, the sample frame was not immune to the typical problems found in establishment surveys: positive rates of non-eligibility, repetition, non-existent units, etc.
Given the impact that non-eligible units included in the sample universe may have on the results, adjustments may be needed when computing the appropriate weights for individual observations. The percentage of confirmed non-eligible units as a proportion of the total number of sampled establishments contacted for the survey was 8.9% (18 out of 202 establishments).
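One simple way to reflect non-eligibility in weights is to shrink each stratum's frame count by the observed ineligibility rate before dividing by completed interviews. This sketch assumes that frame units are ineligible at the same rate as the units contacted; it is not the Enterprise Surveys' documented weighting procedure, and the stratum counts and function names are hypothetical. Only the 18-of-202 figure comes from the text.

```python
def ineligibility_rate(ineligible, contacted):
    """Share of contacted sample units confirmed as non-eligible."""
    return ineligible / contacted


def adjusted_weight(frame_count, completed, rate):
    """Base weight for a stratum, shrinking the frame count by the ineligibility rate.

    Assumes units in the frame are ineligible at the same rate as those contacted.
    """
    estimated_eligible = frame_count * (1 - rate)
    return estimated_eligible / completed


rate = ineligibility_rate(18, 202)
print(f"{100 * rate:.1f}%")  # 8.9%
# Hypothetical stratum: 400 firms listed in the frame, 40 completed interviews.
print(round(adjusted_weight(400, 40, rate), 2))
```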
Face-to-face [f2f]
The current survey instruments are available: - Services and Manufacturing Questionnaire - Screener Questionnaire.
The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs/labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures. Over 90% of the questions objectively ascertain characteristics of a country's business environment. The remaining questions assess the survey respondents' opinions on what the obstacles to firm growth and performance are.
Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.
The response rate was high, largely because of positive attitudes towards the international community, which was collaborating with the government on reconstruction efforts after a period of civil strife. It was also a period in which a great deal of statistical data was being collected by Sierra Leone Statistics for reconstruction, so most respondents were aware of the benefits of research.
The documented dataset covers Enterprise Survey (ES) panel data collected in Argentina in 2006, 2010 and 2017, as part of the Enterprise Survey initiative of the World Bank. An Indicator Survey is similar to an Enterprise Survey; it is implemented for smaller economies where the sampling strategies inherent in an Enterprise Survey are often not applicable due to the limited universe of firms.
The objective of the 2006-2017 Enterprise Survey is to obtain feedback from enterprises in client countries on the state of the private sector as well as to build a panel of enterprise data that will make it possible to track changes in the business environment over time and allow, for example, impact assessments of reforms. Through interviews with firms in the manufacturing and services sectors, the Indicator Survey data provides information on the constraints to private sector growth and is used to create statistically significant business environment indicators that are comparable across countries.
As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort to collect enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project for collecting both objective data based on firms' experiences and enterprises' perceptions of the environment in which they operate.
National
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or the universe, covered in the Enterprise Surveys is the non-agricultural economy. It comprises: all manufacturing sectors according to the ISIC Revision 3.1 group classification (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this population definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities sectors.
Sample survey data [ssd]
The sample for the 2006-2017 Argentina Enterprise Survey (ES) was selected using stratified random sampling, following the methodology explained in the Sampling Manual. Stratified random sampling was preferred over simple random sampling for several reasons:
- To obtain unbiased estimates for different subdivisions of the population with some known level of precision.
- To obtain unbiased estimates for the whole population. The whole population, or universe of the study, is the non-agricultural economy. It comprises: all manufacturing sectors (group D), construction (group F), services (groups G and H), and transport, storage, and communications (group I). Groups are defined following ISIC Revision 3.1. Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, excluding sub-sector 72, IT, which was added to the population under study), and all public or utilities sectors.
- To make sure that the final total sample includes establishments from all different sectors and that it is not concentrated in one or two industries, sizes, or regions.
- To exploit the benefits of stratified sampling, where population estimates will in most cases be more precise than with simple random sampling (i.e., lower standard errors, other things being equal).
Three levels of stratification were used in every country: industry, establishment size, and region.
Industry stratification was designed in the following way: in small economies, the population was stratified into three manufacturing industries, one services industry (retail), and one residual sector, as defined in the Sampling Manual. Each industry had a target of 120 interviews. In middle-size economies, the population was stratified into four manufacturing industries, two services industries (retail and IT), and one residual sector. For the manufacturing industries, sample sizes were inflated by 25% to account for potential non-response in the financing data.
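The allocation rule for small economies can be written out as a small calculation: a target of 120 interviews per industry, with the manufacturing strata inflated by 25% for expected non-response in the financing data. The industry labels are placeholders, and two points are assumptions rather than statements from the text: that the inflated figure is the number of establishments to target, and that the residual sector also receives the 120 target.

```python
import math

BASE_TARGET = 120               # interviews per industry in small economies
MANUFACTURING_INFLATION = 0.25  # buffer for non-response in the financing data


def industry_targets(manufacturing, other):
    """Per-industry targets; manufacturing strata get the 25% inflation."""
    targets = {name: math.ceil(BASE_TARGET * (1 + MANUFACTURING_INFLATION))
               for name in manufacturing}
    targets.update({name: BASE_TARGET for name in other})
    return targets


# Hypothetical labels: three manufacturing industries, retail, and a residual sector.
targets = industry_targets(["manuf_a", "manuf_b", "manuf_c"], ["retail", "residual"])
print(targets, sum(targets.values()))
```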
For the Argentina ES, size stratification was defined following the standardized definition for the rollout: small (5 to 19 employees), medium (20 to 99 employees), and large (more than 99 employees). For stratification purposes, the number of employees was defined on the basis of reported permanent full-time workers. This resulted in some difficulties in certain countries where seasonal, casual, or part-time labor is common.
Face-to-face [f2f]
The current survey instruments are available:
- Core Questionnaire + Manufacturing Module [ISIC Rev.3.1: 15-37]
- Core Questionnaire + Retail Module [ISIC Rev.3.1: 52]
- Core Questionnaire [ISIC Rev.3.1: 45, 50, 51, 55, 60-64, 72]
- Screener Questionnaire
The "Core Questionnaire" is the heart of the Enterprise Survey and contains the survey questions asked of all firms across the world. There are also two other survey instruments - the "Core Questionnaire + Manufacturing Module" and the "Core Questionnaire + Retail Module." The survey is fielded via three instruments in order to not ask questions that are irrelevant to specific types of firms, e.g. a question that relates to production and nonproduction workers should not be asked of a retail firm. In addition to questions that are asked across countries, all surveys are customized and contain country-specific questions. An example of customization would be including tourism-related questions that are asked in certain countries when tourism is an existing or potential sector of economic growth.
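The mapping from an establishment's two-digit ISIC Rev. 3.1 code to the instrument it receives, as listed for the three instruments, is simple enough to express as a lookup. This is an illustrative helper; the function name and the error for codes outside the listed ranges are hypothetical choices.

```python
def instrument_for(isic_division):
    """Return the Enterprise Survey instrument for a two-digit ISIC Rev. 3.1 code."""
    if 15 <= isic_division <= 37:
        return "Core Questionnaire + Manufacturing Module"
    if isic_division == 52:
        return "Core Questionnaire + Retail Module"
    if isic_division in {45, 50, 51, 55, 72} or 60 <= isic_division <= 64:
        return "Core Questionnaire"
    raise ValueError(f"ISIC division {isic_division} is outside the survey universe")


for code in (15, 52, 63):
    print(code, instrument_for(code))
```

Routing on the code in this way keeps, for example, production-worker questions away from retail firms, which is the stated purpose of fielding three instruments.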
The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs/labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures.
Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.
Item non-response was addressed by two strategies:
a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond (-8) as a different option from don't know (-9).
b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response. The following graph shows non-response rates for the sales variable, d2, by sector. Please note that for this specific question, refusals were not separately identified from "Don't know" responses.
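When analysing the data, the two item non-response codes are usually kept apart from valid answers and from each other. A minimal sketch, assuming the -8 (refusal) and -9 (don't know) convention described above; the function names and sales values are hypothetical. For the sales variable d2, refusals were not separately identified, so a -9 there may also include refusals.

```python
REFUSED = -8
DONT_KNOW = -9


def item_nonresponse_summary(values):
    """Count refusals, "don't know" answers, and valid responses for one question."""
    summary = {"refused": 0, "dont_know": 0, "valid": 0}
    for value in values:
        if value == REFUSED:
            summary["refused"] += 1
        elif value == DONT_KNOW:
            summary["dont_know"] += 1
        else:
            summary["valid"] += 1
    return summary


def valid_responses(values):
    """Drop the non-response codes before computing statistics."""
    return [value for value in values if value not in (REFUSED, DONT_KNOW)]


sales = [120_000, -8, -9, -9, 450_000]  # hypothetical d2 answers
print(item_nonresponse_summary(sales))
print(valid_responses(sales))
```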
Survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals; whenever this was done, strict rules were followed to ensure replacements were randomly selected within the same stratum. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.
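The replacement rule, a substitute drawn at random from the same stratum, can be sketched as follows. The frame structure, stratum labels, and function name are hypothetical; the sketch shows only the within-stratum random draw, not the repeated contact attempts that precede it.

```python
import random


def draw_replacement(frame, stratum, already_used, rng):
    """Randomly pick an unused establishment from the same stratum, or None.

    frame: {establishment_id: stratum_label}
    """
    # sorted() makes the candidate order, and thus a seeded draw, reproducible
    candidates = sorted(
        est for est, label in frame.items()
        if label == stratum and est not in already_used
    )
    return rng.choice(candidates) if candidates else None


frame = {
    "E1": ("manufacturing", "small"), "E2": ("manufacturing", "small"),
    "E3": ("manufacturing", "small"), "E4": ("retail", "medium"),
}
used = {"E1"}  # E1 was selected but could not be interviewed
print(draw_replacement(frame, ("manufacturing", "small"), used, random.Random(7)))
```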
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The South Shetland Antarctic fur seal pup census dataset is part of long-term monitoring efforts in the South Shetland Islands archipelago (SSI), based at Cape Shirreff, Livingston Island. These efforts, which include conducting annual synoptic census counts of South Shetland Antarctic fur seals (SSAFS) throughout the region, have been carried out primarily by the Chilean Antarctic Institute (INACH) and the National Oceanic and Atmospheric Administration (NOAA) United States Antarctic Marine Living Resources Program (U.S. AMLR). These census data will continue to be collected by the U.S. AMLR Program and updated yearly. Recent studies have demonstrated that Antarctic fur seals (Arctocephalus gazella) are composed of at least four distinct subpopulations (Bonin et al. 2013, Paijmans et al. 2020), including one breeding throughout the SSI. These SSAFS are the highest-latitude population of otariids in the world. As such, this subpopulation faces a unique array of environmental and ecological challenges, harbors a disproportionately large reservoir of genetic diversity for the species, and experienced a catastrophic population decline between 2008 and 2023 (Krause et al. 2023 and references therein). Therefore, ensuring access to accurate and updated population data for SSAFS is particularly important for managers and decision makers. Because foraging females are regularly absent throughout the breeding season, and males and subadults haul out irregularly, the most informative measure of fur seal population size is an annual count of pups (Payne, 1979; Bengtson et al., 1990). This dataset consists of all known total synoptic Antarctic fur seal pup counts (i.e., live and dead pups) from the SSI during the austral summers since 1959. Counts from the subset of breeding colonies at Cape Shirreff (CS, reported with standard deviation (±SD) where available) and the San Telmo Islets (STI) are also included. Data were collected by the U.S. AMLR Program, unless otherwise indicated.
Most of these annual census counts were conducted during the optimal biological window (late December and early January), when the vast majority of pups have been born but have not yet been subject to substantial mortality (Krause et al. 2022). The authors are confident that all counts included in this dataset are comparable and representative of South Shetland Antarctic fur seal population trends. However, census dates, or at least best estimates of the census date, are included for all records for any parties wishing to apply correction factors. The data are published as a standardized Darwin Core Archive, which contains count data for SSAFS pups from the specified locations during the specified seasons. This dataset is published under the CC0 license. Please follow the guidelines from the SCAR Data Policy (SCAR, 2023) when using the data. If you have any questions regarding this dataset, please contact us via the contact information provided in the metadata or via data-biodiversity-aq@naturalsciences.be. Issues with the dataset can be reported at https://github.com/us-amlr/ssafs-pup-census. This dataset is maintained by the U.S. Antarctic Marine Living Resources Program, funded by NOAA.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Accurate estimation of the finite population mean is a fundamental challenge in survey sampling, especially for large or complex populations, for which traditional methods such as simple random sampling may not provide reliable or efficient estimates. Motivated by this, the current study explores complex sampling techniques to improve the precision and accuracy of mean estimators. Specifically, we employ two-stage and three-stage cluster sampling to develop unbiased estimators of the finite population mean. Building on these, the next phase of the study formulates unbiased mean estimators under stratified two- and three-stage cluster sampling. To further improve the precision of these estimators, a ranked-set sampling strategy is applied at the second and third sampling stages. Unbiased variance estimators corresponding to the proposed mean estimators are also derived. Real-world datasets are used to demonstrate the application of these complex survey sampling methods, and the results show that mean estimates obtained with ranked-set sampling are more accurate than those obtained with simple random sampling.
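The ranked-set sampling idea can be illustrated, under simplifying assumptions, with a single-stage simulation. This sketch is not the authors' stratified two- or three-stage estimator. It assumes balanced ranked-set sampling with perfect ranking (units are ranked by their true values), a synthetic normal population, and hypothetical helper names (`rss_sample`, `rss_mean`, `compare`). It compares the empirical variance of the ranked-set mean with that of a simple-random-sample mean that measures the same number of units.

```python
import random
import statistics


def rss_sample(population, set_size, cycles, rng):
    """Draw a balanced ranked-set sample, assuming perfect ranking.

    For each cycle and each rank i, a fresh random set of `set_size` units is
    drawn and sorted by value (perfect ranking is an assumption), and only the
    i-th smallest unit is "measured". Returns set_size * cycles measured units.
    """
    measured = []
    for _ in range(cycles):
        for i in range(set_size):
            ranked_set = sorted(rng.sample(population, set_size))
            measured.append(ranked_set[i])
    return measured


def rss_mean(measured):
    # In a balanced design every rank is measured equally often, so the plain
    # average of the measured units is unbiased for the population mean.
    return statistics.fmean(measured)


def compare(population, set_size=3, cycles=4, reps=2000, seed=1):
    """Empirical variance of RSS and SRS mean estimates with equal sample sizes."""
    rng = random.Random(seed)
    n = set_size * cycles  # both designs measure the same number of units
    rss_estimates = [
        rss_mean(rss_sample(population, set_size, cycles, rng)) for _ in range(reps)
    ]
    srs_estimates = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.variance(rss_estimates), statistics.variance(srs_estimates)


if __name__ == "__main__":
    gen = random.Random(0)
    # Hypothetical synthetic population; not data from the study.
    population = [gen.gauss(50, 10) for _ in range(10_000)]
    var_rss, var_srs = compare(population)
    print(f"variance of mean estimate: RSS {var_rss:.3f}, SRS {var_srs:.3f}")
```

With perfect ranking the ranked-set estimate typically shows a clearly smaller variance; with imperfect (e.g. judgment-based) ranking the gain generally shrinks.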
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
Update dates: April 9, 2020; April 20, 2020; April 29, 2020; September 1, 2020; February 12, 2021 (new_deaths column); February 16, 2021.
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
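The per-100,000 rates follow the usual definition of a rate per 100,000 residents. The AP does not spell out its exact calculation here, so the following helper (the name `rate_per_100k` is hypothetical) is a sketch under that assumption:

```python
def rate_per_100k(count: int, population: int) -> float:
    """Return `count` (cases or deaths) per 100,000 residents.

    Assumes the conventional definition count / population * 100,000.
    Multiplying the integers first leaves the division as the only rounding step.
    """
    if population <= 0:
        raise ValueError("population must be a positive number of residents")
    return count * 100_000 / population


# Hypothetical county: 250 confirmed cases among 40,000 residents.
print(rate_per_100k(250, 40_000))  # 625.0
```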
This data is from the Hopkins dashboard, which is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up its feed, so there may be brief moments when data does not appear correctly. Hopkins also publishes its daily data reports and a clean version of its feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join it to other datasets we've made available to help cover the coronavirus pandemic:
Filter cases by state here
Rank states by their status as current hotspots. This query calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query that calculates the 7-day rolling average of new cases per capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
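The hotspot queries compute a 7-day rolling average of new cases per capita. The AP's own queries are not reproduced here; this Python sketch only illustrates the same idea under stated assumptions: each daily new-case series is a plain list ordered by date, "per capita" is expressed per 100,000 residents, and the function names, county names, and figures are hypothetical. The example also shows why a very small county can rise to the top of a per-capita ranking.

```python
from collections import deque


def rolling_mean(values, window=7):
    """Trailing rolling mean; None until `window` values are available."""
    out, buf, total = [], deque(), 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the value that left the window
        out.append(total / window if len(buf) == window else None)
    return out


def rank_hotspots(daily_new_cases, populations, window=7):
    """Rank areas by their latest rolling average of new cases per 100,000."""
    scores = {}
    for area, series in daily_new_cases.items():
        averages = rolling_mean(series, window)
        if averages and averages[-1] is not None:
            scores[area] = averages[-1] * 100_000 / populations[area]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counties: far fewer raw cases in the tiny county, higher rate.
cases = {"Big County": [100] * 7, "Tiny County": [2] * 7}
pops = {"Big County": 1_000_000, "Tiny County": 1_000}
print(rank_hotspots(cases, pops))  # [('Tiny County', 200.0), ('Big County', 10.0)]
```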
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins: https://datawrapper.dwcdn.net/nRyaf/15/
Johns Hopkins timeseries data:
- Johns Hopkins pulls data regularly to update its dashboard. Once a day, at around 8 p.m. EDT, Johns Hopkins adds the counts for all areas it covers to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source revises its historical data for accuracy, either increasing or decreasing the latest cumulative count.
- Johns Hopkins periodically edits its historical timeseries data for accuracy. It provides a file documenting all the errors in its timeseries files that it has identified and fixed.
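Because the timeseries file stores cumulative snapshots, daily new counts are usually derived by differencing consecutive days, and a downward revision by a source shows up as a negative daily value. A minimal Python sketch follows; the function name is hypothetical, and flagging (rather than clipping) negative values is one possible design choice, not a documented Johns Hopkins or AP method.

```python
def daily_new_from_cumulative(cumulative):
    """Convert cumulative daily snapshots into daily new counts.

    Returns (new_counts, revision_days). The first day has no earlier snapshot,
    so its new count is None. A negative difference means the cumulative total
    went down, typically because a source revised its historical data; those
    day indexes are reported in revision_days instead of being clipped to zero.
    """
    if not cumulative:
        return [], []
    new_counts = [None]
    revision_days = []
    for day in range(1, len(cumulative)):
        diff = cumulative[day] - cumulative[day - 1]
        if diff < 0:
            revision_days.append(day)
        new_counts.append(diff)
    return new_counts, revision_days


# Hypothetical county snapshot series with a downward revision on day 3.
counts, revisions = daily_new_from_cumulative([10, 15, 15, 12, 20])
print(counts)     # [None, 5, 0, -3, 8]
print(revisions)  # [3]
```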
This data should be credited to the Johns Hopkins University COVID-19 tracking project.
Census of the architectural, urban, and landscape heritage of local interest across the four intercommunalities of Côte-d'Or (Grand Dijon, the community of communes of Gevrey-Chambertin, the community of communes of Nuits-Saint-Georges, and the Beaune-Côte Sud agglomeration community). This census was established as part of Burgundy's World Heritage candidacy, as a genuine tool for knowledge and management of the heritage of the wine coast. The objective was to define precisely the heritage property proposed for inscription on the list, both quantitatively (identification of cabrots, meurgers, dwellings, and wine-growing holdings) and qualitatively (compliance with the criteria of authenticity and integrity clearly defined by the international body: the authenticity criterion, i.e. the "exact" character of the property in terms of its distinctive design, environment, character, or components; and the integrity criterion, i.e. the "intact" character of the natural and/or cultural heritage and its attributes). Ultimately, the identification of heritage of local interest provides a solid scientific basis for many regulatory, cultural, scientific, and educational applications. The scope of this census is not limited to wine heritage; it is intended to reflect a broad understanding of heritage in its multiple aspects (religious architecture, domestic architecture, urban sequences, local heritage, civil architecture, etc.). This layer is not definitive: some elements need to be located more precisely, added, or deleted, and others should be attached to a surface object.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The World Health Organization has reported 6,932,591 coronavirus deaths since the epidemic began. In addition, countries have reported 766,440,796 coronavirus cases. This dataset provides World Coronavirus Deaths: actual values, historical data, forecasts, charts, statistics, an economic calendar, and news.
The world population surpassed eight billion people in 2022, having doubled from its level less than 50 years earlier. Looking forward, the world population is projected to reach nine billion in 2038 and 10 billion in 2060, and to peak at around 10.3 billion in the 2080s before going into decline.
Regional variations
The global population has grown rapidly since the early 1800s, owing to advances in areas such as food production, healthcare, water safety, education, and infrastructure; however, these changes did not occur at a uniform time or pace across the world. Broadly speaking, the first regions to undergo their demographic transitions were Europe, North America, and Oceania, followed by Latin America and Asia (although development within Asia varied the most, owing to its size), while Africa was the last continent to undergo this transformation. Because of these differences, many so-called "advanced" countries, particularly in Europe and East Asia, are now experiencing population decline, while the fastest population growth rates are found in Sub-Saharan Africa. In fact, the roughly two-billion-person increase between now and the 2080s peak will occur in Sub-Saharan Africa, whose population is projected to rise from 1.2 billion to 3.2 billion over this period (although populations on other continents will also fluctuate).
Changing projections
The United Nations releases its World Population Prospects report every one to two years, and it is widely considered the foremost demographic dataset in the world. However, recent editions have notably lowered projections of both when the global population will peak and at what level. Reports in the 2010s had suggested a peak of over 11 billion people, with growth continuing into the 2100s; an earlier and lower peak is now projected. Reasons for this include a more rapid population decline in East Asia and Europe, particularly China, as well as a prolonged development arc in Sub-Saharan Africa.