Official statistics are produced impartially and free from political influence.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Census Tree is the largest-ever database of record links among the historical U.S. censuses, with over 700 million links for people living in the United States between 1850 and 1940. These links allow researchers to construct a longitudinal dataset that is highly representative of the population, and that includes women, Black Americans, and other under-represented populations at unprecedented rates. Each .csv file consists of a crosswalk between the two years indicated in the filename, using the IPUMS histids. For more information, consult the included Read Me file, and visit https://censustree.org.
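As a rough illustration of how pairwise crosswalks can be combined into a longitudinal dataset, the sketch below chains two crosswalks so that a person linked from 1900 to 1910 and from 1910 to 1920 ends up with a 1900 to 1920 link. The column names (`histid_1900` and so on) and the sample rows are assumptions made for this example; the actual layout is described in the Read Me file.

```python
import csv
import io

# Hypothetical crosswalk contents; real Census Tree files may use
# different column names and will contain far more rows.
CROSSWALK_1900_1910 = """histid_1900,histid_1910
A1,B1
A2,B2
A3,B3
"""
CROSSWALK_1910_1920 = """histid_1910,histid_1920
B1,C1
B3,C3
B9,C9
"""


def load_crosswalk(text, from_col, to_col):
    """Read a two-column crosswalk into a dict mapping from_col -> to_col."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[from_col]: row[to_col] for row in reader}


def chain(first, second):
    """Follow links through two crosswalks, keeping only people found in both."""
    return {start: second[mid] for start, mid in first.items() if mid in second}


links_00_10 = load_crosswalk(CROSSWALK_1900_1910, "histid_1900", "histid_1910")
links_10_20 = load_crosswalk(CROSSWALK_1910_1920, "histid_1910", "histid_1920")
links_00_20 = chain(links_00_10, links_10_20)
print(links_00_20)
```

People who appear in only one of the two crosswalks drop out of the chained result, so a panel built this way shrinks with every additional census year.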
Starting in mid-July 2020, despite many delays due to Covid-19, census takers began interviewing households that had not yet responded online or by mail to the 2020 U.S. Census. The federal census, required by the United States Constitution, happens once every 10 years, and each time there are new variations in enumeration (counting) techniques and in what statistical data to collect. There are processes around “how” to count and also “what” to count; the data collected need to be useful for governance and allocation, yet also respect privacy and remain fair and impartial for the entire U.S. population. This 24th federal census count began in January 2020 in remote portions of Alaska, where the territory was still frozen and traversable. In 2019 and 2020, hundreds of thousands of temporary workers from local communities were hired to go out into the field as census takers, to staff offices, and to provide supervision. These workers are just one aspect of how the census is truly a community event. Let’s dive into the history of the U.S. Census and learn why this count is so important.
In the past four centuries, the population of the United States has grown from a recorded 350 people around the Jamestown colony of Virginia in 1610 to an estimated 331 million people in 2020. The pre-colonization populations of the indigenous peoples of the Americas have proven difficult for historians to estimate, as their numbers decreased rapidly following the introduction of European diseases (namely smallpox, plague and influenza). Native Americans were also omitted from most censuses conducted before the twentieth century; the actual population of what we now know as the United States would therefore have been much higher than the official census data from before 1800 suggest, but it is unclear by how much. Population growth in the colonies throughout the eighteenth century has primarily been attributed to migration from the British Isles and the transatlantic slave trade; however, it is also difficult to determine the ethnic makeup of the population in these years, as accurate migration records were not kept until after the 1820s, by which point the importation of slaves had also been outlawed.
Nineteenth century
In the year 1800, it is estimated that the population across the present-day United States was around six million people, with the population in the 16 admitted states numbering 5.3 million. Migration to the United States began to happen on a large scale in the mid-nineteenth century, with the first major waves coming from Ireland, Britain and Germany. In some respects, this wave of mass migration balanced out the demographic impacts of the American Civil War, which was the deadliest war in U.S. history, with approximately 620 thousand fatalities between 1861 and 1865.
The Civil War also resulted in the emancipation of around four million slaves across the South, many of whose descendants would take part in the Great Northern Migration in the early 1900s, which saw around six million Black Americans migrate away from the South in one of the largest demographic shifts in U.S. history. By the end of the nineteenth century, improvements in transport technology and increasing economic opportunities saw migration to the United States increase further, particularly from Southern and Eastern Europe, and in the first decade of the 1900s the number of migrants to the U.S. exceeded one million people in some years.
Twentieth and twenty-first century
The U.S. population has grown steadily throughout the past 120 years, reaching one hundred million in the 1910s, two hundred million in the 1960s, and three hundred million in 2007. In the past century, the U.S. established itself as a global superpower, with the world's largest economy (by nominal GDP) and most powerful military. Involvement in foreign wars has resulted in over 620,000 further U.S. fatalities since the Civil War, and migration fell drastically during the World Wars and the Great Depression; however, the population grew continuously in these years, as the total fertility rate remained above two births per woman and life expectancy increased (except during the Spanish Flu pandemic of 1918).
Since the Second World War, Latin America has replaced Europe as the most common point of origin for migrants, with Hispanic populations growing rapidly across the southern and border states. Because of this, the proportion of non-Hispanic whites, which has been the largest ethnic group in the U.S. since records began, has dropped more rapidly in recent decades. Ethnic minorities also have a much higher birth rate than non-Hispanic whites, further contributing to this decline, and the share of non-Hispanic whites is expected to fall below fifty percent of the U.S. population by the mid-2040s. As of 2020, the United States has the third-largest population in the world (after China and India), and the population is expected to reach four hundred million in the 2050s.
Historical population as enumerated and corrected from 1790 through 2020. North Carolina was one of the 13 original States and by the time of the 1790 census had essentially its current boundaries. The Census is mandated by the United States Constitution and was first completed for 1790. The population has been counted every ten years since, with some limitations. In 1790 census coverage included most of the State, except for areas in the west, parts of which were not enumerated until 1840. The population for 1810 includes Walton County, enumerated as part of Georgia although actually within North Carolina. Historical populations shown here reflect the population of the respective named county and not necessarily the population of the area of the county as it was defined for a particular census. County boundaries shown in maps reflect boundaries as defined in 2020. Historic boundaries for some counties may include additional geographic areas or may be smaller than the current geographic boundaries. Notes below list the county or counties with which the population of a currently defined county was enumerated historically (Current County: Population counted in). The current 100 counties have been in place since the 1920 Census, although some modifications to the county boundaries have occurred since that time. For historical county boundaries see: Atlas of Historical County Boundaries Project (newberry.org).
County Notes:
Note 1: Total for 1810 includes population (1,026) of Walton County, reported as a Georgia county but later determined to be situated in western North Carolina. Total for 1890 includes 2 Indians in prison, not reported by county.
Note 2: Alexander: *Iredell, Burke, Wilkes.
Note 3: Avery: *Caldwell, Mitchell, Watauga.
Note 4: Buncombe: *Burke, Rutherford; see also note 22.
Note 5: Caldwell: *Burke, Wilkes, Yancey.
Note 6: Cleveland: *Rutherford, Lincoln.
Note 7: Columbus: *Bladen, Brunswick.
Note 8: Dare: *Tyrrell, Currituck, Hyde.
Note 9: Hoke: *Cumberland, Robeson.
Note 10: Jackson: *Macon, Haywood.
Note 11: Lee: *Moore, Chatham.
Note 12: Lenoir: *Dobbs (Greene); Craven.
Note 13: McDowell: *Burke, Rutherford.
Note 14: Madison: *Buncombe, Yancey.
Note 15: Mitchell: *Yancey, Watauga.
Note 16: Pamlico: *Craven, Beaufort.
Note 17: Polk: *Rutherford, Henderson.
Note 18: Swain: *Jackson, Macon.
Note 19: Transylvania: *Henderson, Jackson.
Note 20: Union: *Mecklenburg, Anson.
Note 21: Vance: *Granville, Warren, Franklin.
Note 22: Walton: Created in 1803 as a Georgia county and reported in 1810 as part of Georgia; abolished after a review of the State boundary determined that its area was located in North Carolina. By 1820 it was part of Buncombe County.
Note 23: Watauga: *Ashe, Yancey, Wilkes; Burke.
Note 24: Wilson: *Edgecombe, Nash, Wayne, Johnston.
Note 25: Yancey: *Burke, Buncombe.
Note 26: Alleghany: *Ashe.
Note 27: Haywood: *Buncombe.
Note 28: Henderson: *Buncombe.
Note 29: Person: Caswell.
Note 30: Clay: Cherokee.
Note 31: Graham: Cherokee.
Note 32: Harnett: Cumberland.
Note 33: Macon: Haywood.
Note 34: Catawba: Lincoln.
Note 35: Gaston: Lincoln.
Note 36: Cabarrus: Mecklenburg.
Note 37: Stanly: Montgomery.
Note 38: Pender: New Hanover.
Note 39: Alamance: Orange.
Note 40: Durham: Orange, Wake.
Note 41: Scotland: Richmond.
Note 42: Davidson: Rowan.
Note 43: Davie: Rowan.
Note 44: Forsyth: Stokes.
Note 45: Yadkin: Surry.
Note 46: Washington: Tyrrell.
Note 47: Ashe: Wilkes.
Part III. Population of Counties, Earliest Census to 1990
The 1840 population of Person County, NC should be 9,790. The 1840 population of Perquimans County, NC should be 7,346.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Historical population counts for municipalities in the State of Vermont (1791-2020), compiled by the Vermont Historical Society (years 1791-2010) and then appended with 2020 Census counts. An attempt was made to convert counts to current town names to allow for analyses of population change of an area over time. The Historical Society notes, “For example, the census numbers from Kellyvale are counted as the town of Lowell because the name was changed in 1831. Cabot is included in Washington County records, even though it was in Caledonia County through the 1850 census.” This does create some issues where there are changes in geography such as boundary changes, annexations, and new incorporations (such as Rutland City splitting off from Rutland Town). The Historical Society collected the data from a variety of sources. The 1791-2010 data were extracted from PDFs by VCGI Open Data Fellow Kendal Fortney in 2017.
This data collection contains detailed county and state-level ecological and descriptive data for the United States for the years 1790 to 2002. Parts 1-43 are an update to HISTORICAL, DEMOGRAPHIC, ECONOMIC, AND SOCIAL DATA: THE UNITED STATES, 1790-1970 (ICPSR 0003). Parts 1-41 contain data from the 1790-1970 censuses. They include extensive information about the social and political character of the United States, including a breakdown of population by state, race, nationality, number of families, size of the family, births, deaths, marriages, occupation, religion, and general economic condition. Parts 42 and 43 contain data from the 1840 and 1870 Censuses of Manufacturing, respectively. These files include information about the number of persons employed in various industries and the quantities of different types of manufactured products. Parts 44-50 provide county-level data from the United States Census of Agriculture for 1840 to 1900. They also include the state and national totals for the variables. The files provide data about the number, types, and prices of various agricultural products. Parts 51-57 contain data on religious bodies and church membership for 1906, 1916, 1926, 1936, and 1952, respectively. Parts 58-69 consist of data from the CITY DATA BOOKS for 1944, 1948, 1952, 1956, 1962, 1967, 1972, 1977, 1983, 1988, 1994, and 2000, respectively. These files contain information about population, climate, housing units, hotels, birth and death rates, school enrollment and education expenditures, employment in various industries, and city government finances. Parts 70-81 consist of data from the COUNTY DATA BOOKS for 1947, 1949, 1952, 1956, 1962, 1967, 1972, 1977, 1983, 1988, 1994, and 2000, respectively. These files include information about population, employment, housing, agriculture, manufacturing, retail, services, trade, banking, Social Security, local governments, school enrollment, hospitals, crime, and income. 
Parts 82-84 contain data from USA COUNTIES 1998. Due to the large number of variables from this source, the data were divided into three separate data files. Data include information on population, vital statistics, school enrollment, educational attainment, Social Security, labor force, personal income, poverty, housing, trade, farms, ancestry, commercial banks, and transfer payments. Parts 85-106 provide data from the United States Census of Agriculture for 1910 to 2002. They provide data about the amount, types, and prices of various agricultural products. These datasets also contain extensive information on the amount, expenses, sales, values, and production of farms and machinery. (Source: downloaded from ICPSR 7/13/10)
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR -- https://doi.org/10.3886/ICPSR02896.v3. We highly recommend using the ICPSR version, as they made this dataset available in multiple data formats and updated the data through 2002.
Abstract copyright UK Data Service and data collection copyright owner.
The Great Britain Historical Database has been assembled as part of the ongoing Great Britain Historical GIS Project. The project aims to trace the emergence of the north-south divide in Britain and to provide a synoptic view of the human geography of Britain at sub-county scales. Further information about the project is available on A Vision of Britain webpages, where users can browse the database's documentation system online.
These data were originally collected by the Censuses of Population for England and Wales, and for Scotland. They were computerised by the Great Britain Historical GIS Project.
The 19th century censuses gathered data only on "occupations", meaning individuals' roles in the workplace, but the changing nature of work created a need for separate counts by "employer's business". The first such industry statistics resulted from the 1911 census, but the first data included here are from 1931. The 1931 data, unlike the later data, are tabulated by place of residence, as data on journeys to work were not gathered by that census.
Numbers of workers in each industry, usually cross-classified by gender. The industrial classifications used change substantially over time, and by modern standards generally go into great detail about the manufacturing sector. For 1931 and 1951, one set of tables provides a detailed classification for counties and large towns and another provides a simplified classification for small towns and rural districts.
Abstract copyright UK Data Service and data collection copyright owner.
The Ashwell History Data Project is a collaborative project between local and family historians, the computer centre at Hatfield Polytechnic and the staff of the Ashwell Field Studies Centre to create computer files of historical documents for Ashwell and the surrounding area. The data in this study cover 19th century census enumerators' books.
Annual Resident Population Estimates by Age Group, Sex, Race, and Hispanic Origin; for the United States, States, Counties; and for Puerto Rico and its Municipios: April 1, 2010 to July 1, 2019 // Source: U.S. Census Bureau, Population Division // The contents of this file are released on a rolling basis from December through June. // Note: 'In combination' means in combination with one or more other races. The sum of the five race-in-combination groups adds to more than the total population because individuals may report more than one race. Hispanic origin is considered an ethnicity, not a race. Hispanics may be of any race. Responses of 'Some Other Race' from the 2010 Census are modified. This results in differences between the population for specific race categories shown for the 2010 Census population in this file versus those in the original 2010 Census data. The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. // Current data on births, deaths, and migration are used to calculate population change since the 2010 Census. An annual time series of estimates is produced, beginning with the census and extending to the vintage year. The vintage year (e.g., Vintage 2019) refers to the final year of the time series. The reference date for all estimates is July 1, unless otherwise specified. With each new issue of estimates, the entire estimates series is revised.
Additional information, including historical and intercensal estimates, evaluation estimates, demographic analysis, research papers, and methodology, is available on the website: https://www.census.gov/programs-surveys/popest.html.
An appreciation of historical land use and its effects is crucial when interpreting the structure, composition, and spatial characteristics of modern forests. The Harvard Forest has compiled many different historical data sources in an ongoing effort to understand how anthropogenic disturbances have shaped our modern landscapes. Estimates of town land use and land cover were gathered from a variety of sources, including tax valuations (1801-1860) and state agricultural census records (1865-1905). Data prior to 1801 rarely cover the entire state and are excluded from these datasets. Data on forest structure are available for several time periods, including 1885 and 1895 (Agricultural Censuses) and 1916-1920s (State Forester’s reports).
https://www.icpsr.umich.edu/web/ICPSR/studies/3/terms
Detailed county and state-level ecological or descriptive data for the United States for the years 1790 to 1970 are contained in this collection. These data files contain extensive information about the social and political character of the United States, including a breakdown of population by state, race, nationality, number of families, size of the family, births, deaths, marriages, occupation, religion, and general economic conditions. Though not complete over the full time span of this study, statistics are available on such diverse subjects as total numbers of newspapers and periodicals, total capital invested in manufacturing, total numbers of educational institutions, total number of churches, taxation by state, and land surface area in square miles.
https://borealisdata.ca/api/datasets/:persistentId/versions/3.1/customlicense?persistentId=doi:10.5683/SP3/ZORCSD
Data tables on the social and economic conditions in Pre-Confederation Canada from the first census in 1665 to Confederation in 1867. This dataset is one of three that cover the history of the censuses in Quebec. These tables cover Lower Canada 1825-1861. For census data for the years 1765-1790, see the Province of Quebec dataset; for census data for the years 1676-1754, see the New France dataset. The tables were transcribed from the fourth volume of the 1871 Census of Canada: Reprint of the Censuses of Canada, 1665-1871, available online from Statistics Canada, Canadiana, Government of Canada Publications, and the Internet Archive. Note on terminology: Due to the nature of some of the data sources, terminology may include language that is problematic and/or offensive to researchers. Certain vocabulary used to refer to racial, ethnic, religious and cultural groups is specific to the time period when the data were collected. When exploring or using these data do so in the context of historical thinking concepts – analyzing not only the content but asking questions of who shaped the content and why.
The 1940 Census Public Use Microdata Sample Project was assembled through a collaborative effort between the United States Bureau of the Census and the Center for Demography and Ecology at the University of Wisconsin. The collection contains a stratified 1-percent sample of households, with separate records for each household, for each "sample line" respondent, and for each person in the household. These records were encoded from microfilm copies of original handwritten enumeration schedules from the 1940 Census of Population. Geographic identification of the location of the sampled households includes Census regions and divisions, states (except Alaska and Hawaii), standard metropolitan areas (SMAs), and state economic areas (SEAs). Accompanying the data collection is a codebook that includes an abstract, descriptions of sample design, processing procedures and file structure, a data dictionary (record layout), category code lists, and a glossary. Also included is a procedural history of the 1940 Census. Each of the 20 subsamples contains three record types: household, sample line, and person. Household variables describe the location and condition of the household. The sample line records contain variables describing demographic characteristics such as nativity, marital status, number of children, veteran status, wage deductions for Social Security, and occupation. Person records also contain variables describing demographic characteristics including nativity, marital status, family membership, education, employment status, income, and occupation. (Source: downloaded from ICPSR 7/13/10)
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR at https://doi.org/10.3886/ICPSR08236.v1. We highly recommend using the ICPSR version as they may make this dataset available in multiple data formats in the future.
This dataset contains data on population by sex and age, based on the results of the census of Estonia carried out on 28 December 1922. The dataset "Estonian Population by Sex and Age in 1922 Census Data" was published as part of the project "Historical Sociology of Modern Restorations: a Cross-Time Comparative Study of Post-Communist Transformation in the Baltic States", which ran from 2018 to 2022. The project leader is Prof. Zenonas Norkus. The project is funded by the European Social Fund under the activity "Improvement of researchers' qualification by implementing world-class R&D projects" of Measure No. 09.3.3-LMT-K-712.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context. This historical dataset stems from the project of automatic extraction of 72 census records of Lausanne, Switzerland. The complete dataset covers a century of historical demography in Lausanne (1805-1898), which corresponds to 18,831 pages, and nearly 6 million cells.
Content. The data published in this repository correspond to a first release, i.e. a diachronic slice of one register every 8 to 9 years. Unfortunately, the remaining data are currently under embargo. Their publication will take place as soon as possible, and at the latest by the end of 2023. In the meantime, the data presented here correspond to a large subset of 2,844 pages, which already allows most research hypotheses to be investigated.
Description. The population censuses, digitized by the Archives of the city of Lausanne, continuously cover the evolution of the population in Lausanne throughout the 19th century, starting in 1805, with only one long interruption from 1814 to 1831. Highly detailed, they are an invaluable source for studying migration, economic and social history, and traces of cultural exchanges not only with Bern, but also with France and Italy. Indeed, the system of tracing family origin, specific to Switzerland, makes it possible to follow the migratory movements of families long before the censuses appeared. The bourgeoisie is also an essential economic tracer. In addition, censuses extensively describe the organization of the social fabric into family nuclei, around which gravitate various boarders, workers, servants or apprentices, often living in the same apartment with the family.
Production. The structure and richness of censuses have also provided an opportunity to develop automatic methods for processing structured documents. The processing of censuses includes several steps, from the identification of text segments to the restructuring of information as digital tabular data, through Handwritten Text Recognition and the automatic segmentation of the structure using neural networks. Please note that the detailed extraction methodology, as well as the complete evaluation of performance and reliability, is published in: Petitpierre R., Rappo L., Kramer M. (2023). An end-to-end pipeline for historical censuses processing. International Journal on Document Analysis and Recognition (IJDAR). doi: 10.1007/s10032-023-00428-9
Data structure. The data are structured in rows and columns, with each row corresponding to a household. Multiple entries in the same column for a single household are separated by vertical bars 〈|〉. The center point 〈·〉 indicates an empty entry. For some columns (e.g., street name, house number, owner name), an empty entry indicates that the last non-empty value should be carried over. The page number is in the last column.
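The conventions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (`street`, `house_number`, `owner`, `names`) and the sample rows are hypothetical, and the set of carry-over columns would need to be checked against the actual files.

```python
EMPTY = "\u00b7"   # center point marking an empty entry
SEPARATOR = "|"    # vertical bar separating multiple entries in one cell

# Hypothetical column names; assumed here to be the carry-over columns.
CARRY_OVER = {"street", "house_number", "owner"}


def parse_rows(rows):
    """Turn raw household rows (dicts of strings) into cleaned records."""
    last_seen = {}
    cleaned = []
    for row in rows:
        record = {}
        for column, cell in row.items():
            cell = cell.strip()
            if column in CARRY_OVER:
                if cell == EMPTY:
                    # Empty carry-over cell: reuse the last non-empty value.
                    record[column] = last_seen.get(column)
                else:
                    last_seen[column] = cell
                    record[column] = cell
            else:
                # Split multi-entry cells; empty entries become None.
                parts = [part.strip() for part in cell.split(SEPARATOR)]
                record[column] = [None if part == EMPTY else part for part in parts]
        cleaned.append(record)
    return cleaned


# Illustrative rows, not taken from the dataset.
sample = [
    {"street": "Rue de Bourg", "house_number": "12", "names": "Jean|Marie"},
    {"street": EMPTY, "house_number": EMPTY, "names": "Louis|" + EMPTY},
]
households = parse_rows(sample)
print(households[1])
```

Keeping empty entries as `None` inside the split lists preserves the position of each person within the household, which matters when several columns have to be aligned entry by entry.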
Liability. The data presented here are not curated nor verified. They are the raw results of the extraction, the reliability of which was thoroughly assessed in the above-mentioned publication. We insist on the fact that for any reuse of this data for research purposes, the implementation of an appropriate methodology is necessary. This may typically include string distance heuristics, or statistical methodologies to deal with noise and uncertainty.
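One possible form of such a string distance heuristic is sketched below using Python's standard `difflib` module, which scores similarity with a sequence-matching ratio rather than an edit distance. The reference list of street names and the 0.8 cutoff are assumptions chosen for illustration, not values recommended by the dataset authors.

```python
import difflib

# Illustrative reference list; a real one would come from a gazetteer
# or from manually verified entries.
REFERENCE_STREETS = ["Rue de Bourg", "Place de la Palud", "Rue du Pré"]


def normalize(raw, reference, cutoff=0.8):
    """Return the closest reference spelling, or None if nothing is close enough."""
    matches = difflib.get_close_matches(raw, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(normalize("Rue de Bourq", REFERENCE_STREETS))  # a likely recognition error
print(normalize("Chemin Neuf", REFERENCE_STREETS))   # no plausible match
```

Returning `None` instead of forcing a match keeps uncertain entries visible, so they can be reviewed or handled statistically rather than silently corrected.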
This data collection supplies standard monthly labor force data as well as supplemental data on work experience, income, noncash benefits, and migration. Comprehensive information is given on the employment status, occupation, and industry of persons 15 years old and older. Additional data are available concerning weeks worked and hours per week worked, reason not working full-time, total income and income components, and residence on March 1, 2000. This file also contains data covering noncash income sources such as food stamps, school lunch programs, employer-provided group health insurance plans, employer-provided pension plans, personal health insurance, Medicaid, Medicare, CHAMPUS or military health care, and energy assistance. Information on demographic characteristics, such as age, sex, race, household relationships, and Hispanic origin, is available for each person in the household enumerated.
https://creativecommons.org/publicdomain/zero/1.0/
The United States Census is a decennial census mandated by Article I, Section 2 of the United States Constitution, which states: "Representatives and direct Taxes shall be apportioned among the several States ... according to their respective Numbers."
Source: https://en.wikipedia.org/wiki/United_States_Census
The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is presented as summarized data and graphs. The raw data is often difficult to obtain, is typically divided by region, and must be processed and combined to provide information about the nation as a whole.
The United States census dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age and location using ZIP Code Tabulation Areas (ZCTAs) and GEOIDs. ZCTAs are generalized representations of zip codes, and often, though not always, are the same as the zip code for an area. GEOIDs are numeric codes that uniquely identify all administrative, legal, and statistical geographic areas for which the Census Bureau tabulates data. GEOIDs are useful for correlating census data with other censuses and surveys.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_usa
https://cloud.google.com/bigquery/public-data/us-census
Dataset Source: United States Census Bureau
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
What are the ten most populous zip codes in the US in the 2010 census?
What are the top 10 zip codes that experienced the greatest change in population between the 2000 and 2010 censuses?
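Both questions reduce to a sort over per-ZCTA totals. The sketch below works on small in-memory dictionaries instead of the hosted tables; the zip codes and counts are made up, and it assumes the gender and age breakdowns have already been summed to one total per ZCTA.

```python
# Hypothetical ZCTA-level totals; real values come from the census tables.
pop_2000 = {"00001": 1000, "00002": 5000, "00003": 2500}
pop_2010 = {"00001": 4000, "00002": 5200, "00003": 1500, "00004": 800}


def most_populous(pop, n=10):
    """Return the n (zip code, population) pairs with the largest population."""
    return sorted(pop.items(), key=lambda item: item[1], reverse=True)[:n]


def biggest_changes(before, after, n=10):
    """Return the n zip codes with the largest absolute population change.

    Only zip codes present in both censuses are compared.
    """
    shared = before.keys() & after.keys()
    changes = {zip_code: after[zip_code] - before[zip_code] for zip_code in shared}
    return sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)[:n]


print(most_populous(pop_2010, n=2))
print(biggest_changes(pop_2000, pop_2010))
```

Ranking by absolute change treats growth and decline alike; ranking by the signed difference or by percentage change would answer a slightly different question.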
(1) This hierarchical file contains 202,112 records. There are approximately 157 variables and two record types: family and person. Family records contain approximately 58 variables, and person records contain approximately 99 variables.
(2) Each family and person record contains a weight, which must be used in any analysis.
(3) This data file was obtained from the Data Program and Library Service (DPLS), University of Wisconsin. Some data management operations intended to store the data more efficiently were performed by DPLS. That organization also revised the original Census Bureau documentation.
(4) The codebook is provided by ICPSR as a Portable Document Format (PDF) file. The PDF file format was developed by Adobe Systems Incorporated and can be accessed using PDF reader software, such as the Adobe Acrobat Reader. Information on how to obtain a copy of the Acrobat Reader is provided on the ICPSR Web site.
This data collection supplies standard monthly labor force data as well as supplemental data on work experience, income, and migration. Comprehensive information is given on the employment status, occupation, and industry of persons 14 years old and older. Additional data are available concerning weeks worked and hours per week worked, reason not working full-time, total income and income components, and residence. Information on demographic characteristics, such as age, sex, race, educational attainment, marital status, veteran status, household relationship, and Hispanic origin, is available for each person in the household enumerated.
Persons in the civilian noninstitutional population of the United States living in households and members of the armed forces living in civilian housing units in 1969.
Datasets: DS1: Current Population Survey: Annual Demographic File, 1969
A national probability sample was used in selecting housing units.
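To show why the record weights noted in (2) matter, here is a minimal sketch comparing an unweighted and a weighted share. The field names (`employed`, `weight`) and the values are hypothetical and do not reflect the file's actual record layout, which is documented in the codebook.

```python
# Hypothetical person records; field names and values are illustrative only.
persons = [
    {"employed": True, "weight": 1500.0},
    {"employed": False, "weight": 500.0},
    {"employed": True, "weight": 1000.0},
]


def unweighted_share(records, flag):
    """Share of sample records for which the flag is true."""
    return sum(1 for record in records if record[flag]) / len(records)


def weighted_share(records, flag, weight_key="weight"):
    """Share of the represented population for which the flag is true."""
    total = sum(record[weight_key] for record in records)
    flagged = sum(record[weight_key] for record in records if record[flag])
    return flagged / total


print(unweighted_share(persons, "employed"))
print(weighted_share(persons, "employed"))
```

The two figures differ because each record stands in for a different number of people; only the weighted figure estimates the share in the population the sample represents.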
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This training dataset includes a total of 34,913 manually transcribed text segments. It is dedicated to the handwritten text recognition (HTR) of historical sources, typically tabular records, such as censuses. This dataset is based on a sample of 83 pages from the 19th century (1805-1898) censuses of Lausanne, Switzerland. The primary language of the documents is French, although many Germanic names and toponyms are also found.
The training data are formatted and provided on the model of the Bentham dataset. The format thus simply consists of a list of JPEG images, one per text segment, and their corresponding transcriptions, stored in a txt file. The file naming convention is 'yyyy-ppp-n', where 'y' stands for the year of publication of the census, and 'p' for the page number.
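A file name following this convention can be split into its parts with a short regular expression, as in the sketch below. The description does not define 'n'; treating it as a numeric index of the segment within the page is an assumption made here, and the example file name is invented.

```python
import re
from pathlib import Path

# 'yyyy-ppp-n': four-digit year, three-digit page, then 'n'.
# Assumption: 'n' is a numeric segment index (not stated in the description).
PATTERN = re.compile(r"^(?P<year>\d{4})-(?P<page>\d{3})-(?P<n>\d+)$")


def parse_segment_name(filename):
    """Split a segment file name into year, page and segment index."""
    stem = Path(filename).stem  # drop the extension, e.g. '.jpg'
    match = PATTERN.match(stem)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return {
        "year": int(match["year"]),
        "page": int(match["page"]),
        "segment": int(match["n"]),
    }


info = parse_segment_name("1832-045-7.jpg")  # invented example name
print(info)
```

Raising an error on names that do not fit the pattern makes deviations from the convention visible early, before images and transcriptions are paired.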
The digitized documents are provided by the Archives of the City of Lausanne.
Please note that the annotation and extraction methodology, as well as the complete evaluation of performance, including the HTR benchmark and post-correction performance, are published in:
Petitpierre R., Rappo L., Kramer M. (2023). An end-to-end pipeline for historical censuses processing. International Journal on Document Analysis and Recognition (IJDAR). doi: 10.1007/s10032-023-00428-9
Tabular datasets resulting from the automatic extraction are also available on Zenodo:
Petitpierre R., Rappo L., Kramer M., di Lenardo I. (2023). 1805-1898 Census Records of Lausanne : a Long Digital Dataset for Demographic History. Zenodo. doi: 10.5281/zenodo.7711640