The 119th Congressional Districts dataset reflects boundaries from January 3, 2025, as provided by the United States Census Bureau (USCB); its attributes are updated every Sunday from the United States House of Representatives. The dataset is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, and shapefiles can also be combined to cover the entire nation. Information for each member of Congress is appended to the Census Congressional District shapefile using the Office of the Clerk, U.S. House of Representatives' member data XML file at https://clerk.house.gov/xml/lists/MemberData.xml. Congressional districts are the 435 areas from which people are elected to the U.S. House of Representatives. This dataset also includes 9 geographies for non-voting at-large delegate districts, resident commissioner districts, and congressional districts that are not defined. After the apportionment of congressional seats among the states based on census population counts, each state is responsible for establishing congressional districts for the purpose of electing representatives. Each congressional district is to be as equal in population to all other congressional districts in a state as practicable. The 119th Congress is seated from January 3, 2025, through January 3, 2027. In Connecticut, Illinois, and New Hampshire, the Redistricting Data Program (RDP) participant did not define the CDs to cover all of the state or state equivalent area.
In these areas with no CDs defined, the code "ZZ" has been assigned, which is treated as a single CD for purposes of data presentation. The TIGER/Line shapefiles for the District of Columbia, Puerto Rico, and the Island Areas (American Samoa, Guam, the Commonwealth of the Northern Mariana Islands, and the U.S. Virgin Islands) each contain a single record for the non-voting delegate district in these areas. The boundaries of all other congressional districts reflect information provided to the Census Bureau by the states by May 31, 2024. A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1529006
These data depict the 117th Congressional Districts and their representatives for the United States. Congressional districts are the 435 areas from which members are elected to the U.S. House of Representatives. After the apportionment of congressional seats among the states, which is based on decennial census population counts, each state with multiple seats is responsible for establishing congressional district boundaries for the purpose of electing representatives.
Attribution 4.0 (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Congress by race. It includes the population of Congress across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Congress across relevant racial categories.
Key observations
The percent distribution of Congress population by race (across all racial categories recognized by the U.S. Census Bureau): 97.17% are white and 2.83% are multiracial.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Congress Population by Race & Ethnicity. You can refer to the same here.
U.S. Senators serving Macon-Bibb County. The two Senators that serve the State of Georgia are Johnny Isakson and David Perdue. The United States Senate is the upper chamber of the United States Congress, which, along with the United States House of Representatives—the lower chamber—comprises the legislature of the United States. The composition and powers of the Senate are established by Article One of the United States Constitution. The Senate is composed of senators, each of whom represents a single state in its entirety, with each state being equally represented by two senators, regardless of its population, serving staggered terms of six years; with fifty states presently in the Union, there are 100 U.S. Senators. From 1789 until 1913, Senators were appointed by the legislatures of the states they represented; following the ratification of the Seventeenth Amendment in 1913, they are now popularly elected. The Senate chamber is located in the north wing of the Capitol, in Washington, D.C. As the upper house, the Senate has several powers of advice and consent which are unique to it; these include the ratification of treaties and the confirmation of Cabinet secretaries, Supreme Court justices, federal judges, other federal executive officials, flag officers, regulatory officials, ambassadors, and other federal uniformed officers. In addition, in cases where no candidate receives a majority of electors for Vice President, the duty falls upon the Senate to elect one of the top two recipients of electors for that office. It further has the responsibility of conducting trials of those impeached by the House.
The Senate is widely considered both a more deliberative and more prestigious body than the House of Representatives due to its longer terms, smaller size, and statewide constituencies, which historically led to a more collegial and less partisan atmosphere. The presiding officer of the Senate is the Vice President of the United States, who is President of the Senate. In the Vice President's absence, the President Pro Tempore, who is customarily the senior member of the party holding a majority of seats, presides over the Senate. In the early 20th century, the practice of majority and minority parties electing their floor leaders began, although they are not constitutional officers.
https://www.icpsr.umich.edu/web/ICPSR/studies/2/terms
The United States Historical Election Returns Series consists of several datasets; the major files are the United States Historical Election Returns, 1788-1968 (ICPSR 00001) and General Election Data for the United States, 1950-1990 (ICPSR 00013). ICPSR 00001 includes county-level returns for over 90 percent of all elections to the offices of president, governor, United States representative (1824-1990), and United States senator (1912-1990). The dataset also includes returns for approximately two-thirds of all elections to the offices of president, governor, and United States representative for the period 1788-1823. Study ICPSR 00013 contains county-level returns for all elections to the same national and state offices, plus one additional state-wide office, usually attorney general or secretary of state. This data collection provides summary information about candidates contesting elections and special elections anywhere in the nation, political party name and ICPSR party ID code, and the number of votes received by each candidate in the constituency for elections between 1788 and 1990. The information also includes elections for which returns are available solely at the constituency level and are not found in the county-level files of election returns described above. For detailed information about candidates and contests, please refer to the study Constituency Statistics of Elections in the United States, 1788-1990 (ICPSR 7757). This release further includes 1990 data from the District of Columbia election for United States senator and United States representative. The offices of two senators and one representative were created by the "District of Columbia Statehood Constitutional Convention Initiative," which was approved by District voters in 1980. Elections for these offices were postponed until the 1990 general election. The three offices are currently local District positions, which will turn into federal offices if the District becomes a state.
Abstract copyright UK Data Service and data collection copyright owner. This dataset is the product of three related research projects. The first project (funded by the Leverhulme Trust) examined the impact of devolution on the work of British Members of Parliament (MPs), particularly in Scotland and Wales. The second project (also funded by the Leverhulme Trust) gathered the views of Members of the Scottish Parliament (MSPs) and Members of the Welsh Assembly (AMs) about the effectiveness of their institutions in their first five years, and collected information about working patterns. The third and largest project (funded by the Economic and Social Research Council (ESRC)) extended the first two projects and added extra data, making particular reference to the impact of the new devolved institutions on local constituency representation. The previous importance of Scottish and Welsh MPs' constituency roles has been well documented. Devolution meant the arrival of one additional elected representative for each constituency in Scotland and Wales, as well as list members in each region (four in each of five regions in Wales, and seven in each of eight regions in Scotland). This dataset documents the local constituency roles adopted by members of the new institutions, the resultant changes to the local roles of Scottish and Welsh MPs, the local relationships that developed between these different sets of members, and the effectiveness of official rulings and guidance about these relationships. It allows some assessment both of the additional member electoral systems used in Scotland and Wales, and the new multi-tier system of representation in the United Kingdom (UK). Although some data were collected from English MPs in several of the surveys, the focus of the project is specifically on Scotland and Wales, hence the title of this study. 
Main Topics: The questionnaires cover various topics including hours worked, time spent on parliamentary, party and constituency tasks, correspondence and enquiries received from constituents, attitudes to devolved government, political systems and related issues, ideas about the role of MPs, and political representation. Responses to some open-ended questions are also included in the dataset.
Attribution 4.0 (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Object recognition predominantly still relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation that these applications will face when deployed in the real world. To close this gap, we present the ORBIT dataset, grounded in a real-world application of teachable object recognizers for people who are blind/low vision. We provide a full, unfiltered dataset of 4,733 videos of 588 objects recorded by 97 people who are blind/low-vision on their mobile phones, and a benchmark dataset of 3,822 videos of 486 objects collected by 77 collectors. The code for loading the dataset, computing all benchmark metrics, and running the baseline models is available at https://github.com/microsoft/ORBIT-Dataset
This version comprises several zip files:
- train, validation, test: benchmark dataset, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS
- other: data not in the benchmark set, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS (please note that the train, validation, test, and other files make up the unfiltered dataset)
- *_224: as for the benchmark, but static individual frames are scaled down to 224 pixels
- *_unfiltered_videos: full unfiltered dataset, organised by collector, in mp4 format
AP VoteCast is a survey of the American electorate conducted by NORC at the University of Chicago for Fox News, NPR, PBS NewsHour, Univision News, USA Today Network, The Wall Street Journal and The Associated Press.
AP VoteCast combines interviews with a random sample of registered voters drawn from state voter files with self-identified registered voters selected using nonprobability approaches. In general elections, it also includes interviews with self-identified registered voters conducted using NORC’s probability-based AmeriSpeak® panel, which is designed to be representative of the U.S. population.
Interviews are conducted in English and Spanish. Respondents may receive a small monetary incentive for completing the survey. Participants selected as part of the random sample can be contacted by phone and mail and can take the survey by phone or online. Participants selected as part of the nonprobability sample complete the survey online.
In the 2020 general election, the survey of 133,103 interviews with registered voters was conducted between Oct. 26 and Nov. 3, concluding as polls closed on Election Day. AP VoteCast delivered data about the presidential election in all 50 states as well as all Senate and governors’ races in 2020.
This is survey data and must be properly weighted during analysis: DO NOT REPORT THIS DATA AS RAW OR AGGREGATE NUMBERS!!
Instead, use statistical software such as R or SPSS to weight the data.
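As a minimal sketch of why weighting matters (the responses and weights below are invented for illustration, not AP VoteCast's actual variables), compare a raw share with its weighted counterpart:

```python
import numpy as np

# Hypothetical responses (1 = supports some position) and survey weights;
# real AP VoteCast files carry their own weight variable.
answers = np.array([1, 0, 1, 1, 0])
weights = np.array([0.8, 1.2, 0.5, 2.0, 1.5])

raw_share = answers.mean()                             # unweighted tally
weighted_share = np.average(answers, weights=weights)  # weighted estimate

print(round(raw_share, 3), round(weighted_share, 3))   # 0.6 vs 0.55
```

The weighted estimate down-weights overrepresented respondents and up-weights underrepresented ones, which is why raw tallies of this survey are misleading.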
National Survey
The national AP VoteCast survey of voters and nonvoters in 2020 is based on the results of the 50 state-based surveys and a nationally representative survey of 4,141 registered voters conducted between Nov. 1 and Nov. 3 on the probability-based AmeriSpeak panel. It included 41,776 probability interviews completed online and via telephone, and 87,186 nonprobability interviews completed online. The margin of sampling error is plus or minus 0.4 percentage points for voters and 0.9 percentage points for nonvoters.
State Surveys
In 20 states in 2020, AP VoteCast is based on roughly 1,000 probability-based interviews conducted online and by phone, and roughly 3,000 nonprobability interviews conducted online. In these states, the margin of sampling error is about plus or minus 2.3 percentage points for voters and 5.5 percentage points for nonvoters.
In an additional 20 states, AP VoteCast is based on roughly 500 probability-based interviews conducted online and by phone, and roughly 2,000 nonprobability interviews conducted online. In these states, the margin of sampling error is about plus or minus 2.9 percentage points for voters and 6.9 percentage points for nonvoters.
In the remaining 10 states, AP VoteCast is based on about 1,000 nonprobability interviews conducted online. In these states, the margin of sampling error is about plus or minus 4.5 percentage points for voters and 11.0 percentage points for nonvoters.
Although there is no statistically agreed upon approach for calculating margins of error for nonprobability samples, these margins of error were estimated using a measure of uncertainty that incorporates the variability associated with the poll estimates, as well as the variability associated with the survey weights as a result of calibration. After calibration, the nonprobability sample yields approximately unbiased estimates.
As with all surveys, AP VoteCast is subject to multiple sources of error, including from sampling, question wording and order, and nonresponse.
Sampling Details
Probability-based Registered Voter Sample
In each of the 40 states in which AP VoteCast included a probability-based sample, NORC obtained a sample of registered voters from Catalist LLC’s registered voter database. This database includes demographic information, as well as addresses and phone numbers for registered voters, allowing potential respondents to be contacted via mail and telephone. The sample is stratified by state, partisanship, and a modeled likelihood to respond to the postcard based on factors such as age, race, gender, voting history, and census block group education. In addition, NORC attempted to match sampled records to a registered voter database maintained by L2, which provided additional phone numbers and demographic information.
Prior to dialing, all probability sample records were mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Postcards were addressed by name to the sampled registered voter if that individual was under age 35; postcards were addressed to “registered voter” in all other cases. Telephone interviews were conducted with the adult that answered the phone following confirmation of registered voter status in the state.
Nonprobability Sample
Nonprobability participants include panelists from Dynata or Lucid, including members of its third-party panels. In addition, some registered voters were selected from the voter file, matched to email addresses by V12, and recruited via an email invitation to the survey. Digital fingerprint software and panel-level ID validation is used to prevent respondents from completing the AP VoteCast survey multiple times.
AmeriSpeak Sample
During the initial recruitment phase of the AmeriSpeak panel, randomly selected U.S. households were sampled with a known, non-zero probability of selection from the NORC National Sample Frame and then contacted by mail, email, telephone and field interviewers (face-to-face). The panel provides sample coverage of approximately 97% of the U.S. household population. Those excluded from the sample include people with P.O. Box-only addresses, some addresses not listed in the U.S. Postal Service Delivery Sequence File and some newly constructed dwellings. Registered voter status was confirmed in field for all sampled panelists.
Weighting Details
AP VoteCast employs a four-step weighting approach that combines the probability sample with the nonprobability sample and refines estimates at a subregional level within each state. In a general election, the 50 state surveys and the AmeriSpeak survey are weighted separately and then combined into a survey representative of voters in all 50 states.
State Surveys
First, weights are constructed separately for the probability sample (when available) and the nonprobability sample for each state survey. These weights are adjusted to population totals to correct for demographic imbalances in age, gender, education and race/ethnicity of the responding sample compared to the population of registered voters in each state. In 2020, the adjustment targets are derived from a combination of data from the U.S. Census Bureau’s November 2018 Current Population Survey Voting and Registration Supplement, Catalist’s voter file and the Census Bureau’s 2018 American Community Survey. Prior to adjusting to population totals, the probability-based registered voter list sample weights are adjusted for differential non-response related to factors such as availability of phone numbers, age, race and partisanship.
Second, all respondents receive a calibration weight. The calibration weight is designed to ensure the nonprobability sample is similar to the probability sample in regard to variables that are predictive of vote choice, such as partisanship or direction of the country, which cannot be fully captured through the prior demographic adjustments. The calibration benchmarks are based on regional level estimates from regression models that incorporate all probability and nonprobability cases nationwide.
Third, all respondents in each state are weighted to improve estimates for substate geographic regions. This weight combines the weighted probability (if available) and nonprobability samples, and then uses a small area model to improve the estimate within subregions of a state.
Fourth, the survey results are weighted to the actual vote count following the completion of the election. This weighting is done in 10–30 subregions within each state.
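The demographic adjustment to population totals in step one is the kind of calibration commonly implemented with raking (iterative proportional fitting). A toy sketch with invented sample data and made-up population targets, not AP VoteCast's actual benchmarks:

```python
import numpy as np

# Invented sample: two binary variables and a starting weight of 1 each.
age = np.array([0, 0, 1, 1, 1, 0])   # 0 = under 45, 1 = 45 and over
edu = np.array([0, 1, 0, 1, 0, 1])   # 0 = no degree, 1 = degree
w = np.ones(len(age), dtype=float)

# Assumed population shares to calibrate against (made up here).
targets = [(age, {0: 0.55, 1: 0.45}),
           (edu, {0: 0.60, 1: 0.40})]

# Rake: repeatedly rescale weights so each variable's weighted margin
# matches its target; alternating adjustments converge quickly.
for _ in range(50):
    for var, shares in targets:
        total = w.sum()
        for level, share in shares.items():
            mask = var == level
            w[mask] *= share * total / w[mask].sum()

age_margin = w[age == 0].sum() / w.sum()
edu_margin = w[edu == 0].sum() / w.sum()
print(age_margin, edu_margin)  # both margins now match the targets
```

Production weighting adds nonresponse adjustments, trimming, and many more calibration variables, but the core mechanic is this alternating rescaling.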
National Survey
In a general election, the national survey is weighted to combine the 50 state surveys with the nationwide AmeriSpeak survey. Each of the state surveys is weighted as described. The AmeriSpeak survey receives a nonresponse-adjusted weight that is then adjusted to national totals for registered voters that in 2020 were derived from the U.S. Census Bureau’s November 2018 Current Population Survey Voting and Registration Supplement, the Catalist voter file and the Census Bureau’s 2018 American Community Survey. The state surveys are further adjusted to represent their appropriate proportion of the registered voter population for the country and combined with the AmeriSpeak survey. After all votes are counted, the national data file is adjusted to match the national popular vote for president.
Attribution 4.0 (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Preston town by race. It includes the population of Preston town across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Preston town across relevant racial categories.
Key observations
The percent distribution of Preston town population by race (across all racial categories recognized by the U.S. Census Bureau): 90.15% are white, 1.61% are Black or African American, 0.66% are American Indian and Alaska Native and 7.59% are multiracial.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Preston town Population by Race & Ethnicity. You can refer to the same here.
https://doi.org/10.17026/fp39-0x58
The Dutch Parliamentary Election Study 2023 has been conducted by the Foundation of Electoral Studies in The Netherlands (Stichting Kiezeronderzoek Nederland; SKON). The Dutch Parliamentary Election Studies (DPES) are a series of national surveys carried out under the auspices of the Dutch Electoral Research Foundation (SKON). These surveys have been conducted since 1971. Many questions are replicated across studies, although fresh questions are included in each new round. The major substantive areas consistently covered include the respondents' attitudes toward and expectations of the government and its effectiveness in both domestic and foreign policy, the most important problems facing the people of the Netherlands, the respondents' voting behavior and participation history, and their knowledge of and faith in the nation's political leaders. The DPES data were collected via two different panels: the LISS panel monitored by CenterData and the non-self-subscribe panel from I&O Research. Both samples were used to build the DPES2023. The DPES2023 data consist of three datasets, which can be merged, although the merged data will no longer be representative:
1. The standard DPES2023 dataset, aimed to be representative of the 2023 eligible voters. It includes weights to address population distortions. This dataset also includes items that were formulated by Young Scholars; these items become available from February 2025 onwards.
2. [not available yet] A dataset including eligible voters with a migration background.
3. [not available yet] A dataset including eligible voters with an education other than tertiary higher vocational (hbo) or university.
This dataset is the product of three related research projects. The first project (funded by the Leverhulme Trust) examined the impact of devolution on the work of British Members of Parliament (MPs), particularly in Scotland and Wales. The second project (also funded by the Leverhulme Trust) gathered the views of Members of the Scottish Parliament (MSPs) and Members of the Welsh Assembly (AMs) about the effectiveness of their institutions in their first five years, and collected information about working patterns. The third and largest project (funded by the Economic and Social Research Council (ESRC)) extended the first two projects and added extra data, making particular reference to the impact of the new devolved institutions on local constituency representation.
The previous importance of Scottish and Welsh MPs' constituency roles has been well documented. Devolution meant the arrival of one additional elected representative for each constituency in Scotland and Wales, as well as list members in each region (four in each of five regions in Wales, and seven in each of eight regions in Scotland). This dataset documents the local constituency roles adopted by members of the new institutions, the resultant changes to the local roles of Scottish and Welsh MPs, the local relationships that developed between these different sets of members, and the effectiveness of official rulings and guidance about these relationships. It allows some assessment both of the additional member electoral systems used in Scotland and Wales, and the new multi-tier system of representation in the United Kingdom (UK).
Although some data were collected from English MPs in several of the surveys, the focus of the project is specifically on Scotland and Wales, hence the title of this study.
https://dataverse.ada.edu.au/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.26193/EB1DA0
Summary details for each election year for the House of Representatives elections since 1901. This data includes electoral system characteristics, seats in chamber, number of enrolled voters, ballots cast, rate of voter turnout and rate of informal voting for Victoria.
Open Government Licence 3.0
http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
National and subnational mid-year population estimates for the UK and its constituent countries by administrative area, age and sex (including components of population change, median age and population density).
Round 1 of the Afrobarometer survey was conducted from July 1999 through June 2001 in 12 African countries, to solicit public opinion on democracy, governance, markets, and national identity. The full 12-country dataset released was pieced together out of different projects: Round 1 of the Afrobarometer survey, the old Southern African Democracy Barometer, and similar surveys done in West and East Africa.
The 7-country dataset is a subset of the Round 1 survey dataset, and consists of a combined dataset for the 7 Southern African countries surveyed with other African countries in Round 1, 1999-2000 (Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia and Zimbabwe). It is a useful dataset because, in contrast to the full 12-country Round 1 dataset, all countries in this dataset were surveyed with the identical questionnaire.
Botswana Lesotho Malawi Namibia South Africa Zambia Zimbabwe
Basic units of analysis that the study investigates include: individuals and groups
Sample survey data [ssd]
A new sample has to be drawn for each round of Afrobarometer surveys. Whereas the standard sample size for Round 3 surveys will be 1200 cases, a larger sample size will be required in societies that are extremely heterogeneous (such as South Africa and Nigeria), where the sample size will be increased to 2400. Other adaptations may be necessary within some countries to account for the varying quality of the census data or the availability of census maps.
The sample is designed as a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of selection for interview. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible. A randomly selected sample of 1200 cases allows inferences to national adult populations with a margin of sampling error of no more than plus or minus 2.5 percent with a confidence level of 95 percent. If the sample size is increased to 2400, the confidence interval shrinks to plus or minus 2 percent.
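These margins follow from the standard formula for a proportion under simple random sampling. A sketch of the textbook calculation, not Afrobarometer's exact procedure (clustering design effects can widen the interval):

```python
import math

def moe(n, p=0.5, z=1.96):
    """95% margin of error for a proportion, worst case p = 0.5,
    assuming a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(moe(2400) * 100, 1))  # 2.0 percentage points for n = 2400
```

Under pure simple random sampling, n = 1200 gives roughly plus or minus 2.8 points; the quoted 2.5 presumably reflects the study's own design assumptions.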
Sample Universe
The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.
What to do about areas experiencing political unrest? On the one hand we want to include them because they are politically important. On the other hand, we want to avoid stretching out the fieldwork over many months while we wait for the situation to settle down. It was agreed at the 2002 Cape Town Planning Workshop that it is difficult to come up with a general rule that will fit all imaginable circumstances. We will therefore make judgments on a case-by-case basis on whether or not to proceed with fieldwork or to exclude or substitute areas of conflict. National Partners are requested to consult Core Partners on any major delays, exclusions or substitutions of this sort.
Sample Design
The sample design is a clustered, stratified, multi-stage, area probability sample.
To repeat the main sampling principle, the objective of the design is to give every sample element (i.e. adult citizen) an equal and known chance of being chosen for inclusion in the sample. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible.
In a series of stages, geographically defined sampling units of decreasing size are selected. To ensure that the sample is representative, the probability of selection at various stages is adjusted as follows:
The sample is stratified by key social characteristics in the population such as sub-national area (e.g. region/province) and residential locality (urban or rural). The area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample, and the urban/rural stratification is a means to make sure that these localities are represented in their correct proportions. Wherever possible, and always in the first stage of sampling, random sampling is conducted with probability proportionate to population size (PPPS). The purpose is to guarantee that larger (i.e., more populated) geographical units have a proportionally greater probability of being chosen for the sample. The sampling design has four stages:
A first stage to stratify and randomly select primary sampling units;
A second stage to randomly select sampling start points;
A third stage to randomly choose households;
A final stage to randomly select individual respondents.
We shall deal with each of these stages in turn.
STAGE ONE: Selection of Primary Sampling Units (PSUs)
The primary sampling units (PSUs) are the smallest well-defined geographic units for which reliable population data are available. In most countries, these will be Census Enumeration Areas (or EAs). Most national census data and maps are broken down to the EA level. In the text that follows we will use the acronyms PSU and EA interchangeably because, when census data are employed, they refer to the same unit.
We strongly recommend that NIs use official national census data as the sampling frame for Afrobarometer surveys. Where recent or reliable census data are not available, NIs are asked to inform the relevant Core Partner before they substitute any other demographic data. Where the census is out of date, NIs should consult a demographer to obtain the best possible estimates of population growth rates. These should be applied to the outdated census data in order to make projections of population figures for the year of the survey. It is important to bear in mind that population growth rates vary by area (region) and (especially) between rural and urban localities. Therefore, any projected census data should include adjustments to take such variations into account.
Indeed, we urge NIs to establish collegial working relationships with professionals in the national census bureau, not only to obtain the most recent census data, projections, and maps, but also to gain access to sampling expertise. NIs may even commission a census statistician to draw the sample to Afrobarometer specifications, provided that provision for this service has been made in the survey budget.
Regardless of who draws the sample, the NIs should thoroughly acquaint themselves with the strengths and weaknesses of the available census data and the availability and quality of EA maps. The country and methodology reports should cite the exact census data used, its known shortcomings, if any, and any projections made from the data. At minimum, the NI must know the size of the population and the urban/rural population divide in each region in order to specify how to distribute population and PSUs in the first stage of sampling. National investigators should obtain these data in writing before they attempt to stratify the sample.
Once this data is obtained, the sample population (either 1200 or 2400) should be stratified, first by area (region/province) and then by residential locality (urban or rural). In each case, the proportion of the sample in each locality in each region should be the same as its proportion in the national population as indicated by the updated census figures.
Having stratified the sample, it is then possible to determine how many PSUs should be selected for the country as a whole, for each region, and for each urban or rural locality.
The total number of PSUs to be selected for the whole country is determined by calculating the maximum degree of clustering of interviews one can accept in any PSU. Because PSUs (which are usually geographically small EAs) tend to be socially homogeneous, we do not want to select too many people in any one place. Thus, the Afrobarometer has established a standard of no more than 8 interviews per PSU. For a sample size of 1200, the sample must therefore contain 150 PSUs/EAs (1200 divided by 8). For a sample size of 2400, there must be 300 PSUs/EAs.
These PSUs should then be allocated proportionally to the urban and rural localities within each regional stratum of the sample. Let's take a couple of examples from a country with a sample size of 1200. If the urban locality of Region X in this country constitutes 10 percent of the current national population, then the sample for this stratum should be 15 PSUs (calculated as 10 percent of 150 PSUs). If the rural population of Region Y constitutes 4 percent of the current national population, then the sample for this stratum should be 6 PSUs.
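The allocation arithmetic above can be sketched as follows (the strata shares are the hypothetical Region X/Region Y figures from the example):

```python
SAMPLE_SIZE = 1200
INTERVIEWS_PER_PSU = 8  # Afrobarometer standard: at most 8 interviews per PSU

total_psus = SAMPLE_SIZE // INTERVIEWS_PER_PSU  # 150 PSUs/EAs

# Share of the national population held by each stratum (from updated census figures)
strata_shares = {"Region X urban": 0.10, "Region Y rural": 0.04}

allocation = {name: round(share * total_psus) for name, share in strata_shares.items()}
print(total_psus)   # 150
print(allocation)   # {'Region X urban': 15, 'Region Y rural': 6}
```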
The next step is to select particular PSUs/EAs using random methods. Using the above example of the rural localities in Region Y, let us say that you need to pick 6 sample EAs out of a census list that contains a total of 240 rural EAs in Region Y. But which 6? If the EAs created by the national census bureau are of equal or roughly equal population size, then selection is relatively straightforward. Just number all EAs consecutively, then make six selections using a table of random numbers. This procedure, known as simple random sampling (SRS), will give every EA an equal and known chance of selection.
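In software, the table of random numbers can be replaced by a seeded pseudorandom draw; a minimal SRS sketch (the consecutive EA numbering is hypothetical):

```python
import random

def select_eas(total_eas, n_select, seed=1):
    """Simple random sample, without replacement, of EA numbers 1..total_eas."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, total_eas + 1), n_select))

# Pick 6 of the 240 rural EAs in Region Y, as in the example above.
chosen = select_eas(240, 6)
print(chosen)  # six distinct EA numbers between 1 and 240
```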
https://spdx.org/licenses/CC0-1.0.html
Offline reinforcement learning (RL) is a promising direction that allows RL agents to be pre-trained from large datasets, avoiding the recurring cost of expensive data collection. To advance the field, it is crucial to generate large-scale datasets. Compositional RL is particularly appealing for generating such datasets, since 1) it permits creating many tasks from few components, and 2) the task structure may enable trained agents to solve new tasks by combining relevant learned components. This submission provides four offline RL datasets for simulated robotic manipulation created using the 256 tasks from CompoSuite (Mendez et al., 2022). In every CompoSuite task, a robot arm is used to manipulate an object to achieve an objective, all while trying to avoid an obstacle. There are four components for each of these four axes, which can be combined arbitrarily, leading to a total of 256 tasks. The component choices are:
* Robot: IIWA, Jaco, Kinova3, Panda
* Object: Hollow box, box, dumbbell, plate
* Objective: Push, pick and place, put in shelf, put in trashcan
* Obstacle: None, wall between robot and object, wall between goal and object, door between goal and object
The four included datasets are collected using separate agents, each trained to a different degree of performance, and each dataset consists of 256 million transitions. The degrees of performance are expert data, medium data, warmstart data, and replay data:
* Expert dataset: Transitions from an expert agent trained to achieve 90% success on every task.
* Medium dataset: Transitions from a medium agent trained to achieve 30% success on every task.
* Warmstart dataset: Transitions from a soft actor-critic (SAC) agent trained for a fixed duration of one million steps.
* Medium-replay-subsampled dataset: Transitions stored during the training of a medium agent up to 30% success.
These datasets are intended for the combined study of compositional generalization and offline reinforcement learning.
Methods: The datasets were collected using several deep reinforcement learning agents trained to the various degrees of performance described above on the CompoSuite benchmark (https://github.com/Lifelong-ML/CompoSuite), which builds on top of robosuite (https://github.com/ARISE-Initiative/robosuite) and uses the MuJoCo simulator (https://github.com/deepmind/mujoco). During reinforcement learning training, we stored the data collected by each agent in a separate buffer for post-processing. Then, after training, to collect the expert and medium datasets, we ran the trained agents for 2000 trajectories of length 500 online in the CompoSuite benchmark and stored the trajectories. These add up to a total of 1 million state-transition tuples per task, totalling a full 256 million datapoints per dataset. The warmstart and medium-replay-subsampled datasets contain trajectories from the stored training buffers of the SAC agent trained for a fixed duration and of the medium agent, respectively. For the medium-replay-subsampled data, we uniformly sample trajectories from the training buffer until we reach more than 1 million transitions. Since some of the tasks have termination conditions, some of these trajectories are truncated and not of length 500. This sometimes results in a number of sampled transitions larger than 1 million. Therefore, after sub-sampling, we artificially truncate the last trajectory and place a timeout at the final position. This can in some rare cases lead to one incorrect trajectory if the datasets are used for finite-horizon experimentation. However, this truncation is required to ensure consistent dataset sizes, easy data readability, and compatibility with other standard code implementations. The four datasets are split into four tar.gz archives each, yielding a total of 16 compressed folders. Every sub-folder contains all the tasks for one of the four robot arms for that dataset.
In other words, every tar.gz folder contains a total of 64 tasks using the same robot arm and four tar.gz files form a full dataset. This is done to enable people to only download a part of the dataset in case they do not need all 256 tasks. For every task, the data is separately stored in an hdf5 file allowing for the usage of arbitrary task combinations and mixing of data qualities across the four datasets. Every task is contained in a folder that is named after the CompoSuite elements it uses. In other words, every task is represented as a folder named
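The medium-replay subsampling described in the Methods (draw trajectories until the target transition count is exceeded, then truncate the last trajectory and mark a timeout at its final step) might be sketched as follows, with toy lists standing in for stored trajectories:

```python
import random

def subsample_transitions(buffer, target, seed=0):
    """Uniformly sample trajectories until at least `target` transitions,
    then artificially truncate the last trajectory to hit `target` exactly."""
    rng = random.Random(seed)
    sampled, count = [], 0
    while count < target:
        traj = list(rng.choice(buffer))  # trajectories may be shorter than 500
        sampled.append(traj)
        count += len(traj)
    excess = count - target
    if excess:
        sampled[-1] = sampled[-1][:-excess]
    timeout_index = len(sampled[-1]) - 1  # timeout placed at the final position
    return sampled, timeout_index

# Toy buffer: early-terminated trajectories mixed with full-length ones.
buffer = [list(range(500)), list(range(350)), list(range(500))]
trajectories, timeout_at = subsample_transitions(buffer, target=1000)
print(sum(len(t) for t in trajectories))  # 1000
```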
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. After each decennial census, the Census Bureau delineates urban areas that represent densely developed territory, encompassing residential, commercial, and other nonresidential urban land uses. In general, this territory consists of areas of high population density and urban land use resulting in a representation of the "urban footprint." There are two types of urban areas: urbanized areas (UAs) that contain 50,000 or more people and urban clusters (UCs) that contain at least 2,500 people, but fewer than 50,000 people (except in the U.S. Virgin Islands and Guam, which each contain urban clusters with populations greater than 50,000). Each urban area is identified by a 5-character numeric census code that may contain leading zeroes.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises a comprehensive mapping of vine yield at the plant scale over two vine fields located in the southern region of France. Both vine fields were planted with Vitis vinifera cv. Syrah. The first field (Field 1) occupies 0.8 ha and data were collected in 2022, while the second field (Field 2) has an area of 0.5 ha and data were collected in 2008. Throughout the growing season, information regarding unproductive vines, inflorescence number, and bunch weight was collected for both vine fields. For both fields, at the flowering stage, the location of each productive and unproductive vine (dead and missing vines) was georeferenced, and the number of inflorescences was manually counted for all productive vines. For Field 1, at harvest, all bunches of the field were manually weighed with an accuracy of ±1 gram and georeferenced precisely (one point per vine). For each vine, total yield (grams per vine) was then computed as the sum of the weight of its bunches. For Field 2, at harvest, the total yield per vine was estimated based on the weighing of representative bunches obtained from several regularly spaced sets of 5 vines. In addition to the yield data, two ancillary data layers, soil apparent resistivity measurements and a common vegetation index derived from remote-sensed imagery, are provided for both vine fields. Overall, the dataset consists of 3644 vines, with 2151 being productive, along with a total count of 33354 inflorescences and 21242 manually weighed bunches at harvest.
Raw data includes 9 geopackage files (.gpkg), one per data type and per field:
• “Field1_Dead_Missing_Vines.gpkg” (Figure 1.A) contains the location of missing and dead vines identified in Field 1;
• “Field1_Inflorescences.gpkg” and “Field2_Inflorescences.gpkg” (Figure 1.B and Figure 1.C) both contain the location and the number of inflorescences per vine counted during flowering;
• “Field1_Final_Yield.gpkg” and “Field2_ Final_Yield.gpkg” (Figure 1.D and Figure 1.E) both contain the location and measured values of yield weight per vine at harvest. For Field 1, the list of bunch weights is also available;
• “Field1_Soil_Resistivity.gpkg” and “Field2_ Soil_Resistivity.gpkg” (Figure 1.F and Figure 1.G) contain the electrical resistivity measurements of the soil on each field;
• “Field1_Vegetation_Index.gpkg” and “Field2_ Vegetation_Index.gpkg” (Figure 1.H and Figure 1.I) contain vegetation index values: NDVI (Normalized Difference Vegetation Index, without unit) for Field 1 and FCover (Fraction of vegetation Cover, in %) for Field 2.
Aggregated data are composed of two csv files, one for each vine field. Both files aggregate all available yield data for each vine plant (either productive or other types). In both csv files, each line represents a planted vine. There are 2614 planted vines for Field 1 and 1030 planted vines for Field 2.
A Data in Brief article is associated to this dataset.
Data for CDC’s COVID Data Tracker site on Rates of COVID-19 Cases and Deaths by Vaccination Status.
Dataset and data visualization details: These data were posted on October 21, 2022, archived on November 18, 2022, and revised on February 22, 2023. These data reflect cases among persons with a positive specimen collection date through September 24, 2022, and deaths among persons with a positive specimen collection date through September 3, 2022.
Vaccination status: A person vaccinated with a primary series had SARS-CoV-2 RNA or antigen detected on a respiratory specimen collected ≥14 days after verifiably completing the primary series of an FDA-authorized or approved COVID-19 vaccine. An unvaccinated person had SARS-CoV-2 RNA or antigen detected on a respiratory specimen and has not been verified to have received COVID-19 vaccine. Excluded were partially vaccinated people who received at least one FDA-authorized vaccine dose but did not complete a primary series ≥14 days before collection of a specimen where SARS-CoV-2 RNA or antigen was detected. Additional or booster dose: A person vaccinated with a primary series and an additional or booster dose had SARS-CoV-2 RNA or antigen detected on a respiratory specimen collected ≥14 days after receipt of an additional or booster dose of any COVID-19 vaccine on or after August 13, 2021. For people ages 18 years and older, data are graphed starting the week including September 24, 2021, when a COVID-19 booster dose was first recommended by CDC for adults 65+ years old and people in certain populations and high risk occupational and institutional settings. For people ages 12-17 years, data are graphed starting the week of December 26, 2021, 2 weeks after the first recommendation for a booster dose for adolescents ages 16-17 years. For people ages 5-11 years, data are included starting the week of June 5, 2022, 2 weeks after the first recommendation for a booster dose for children aged 5-11 years. For people ages 50 years and older, data on second booster doses are graphed starting the week including March 29, 2022, when the recommendation was made for second boosters. Vertical lines represent dates when changes occurred in U.S. policy for COVID-19 vaccination (details provided above). Reporting is by primary series vaccine type rather than additional or booster dose vaccine type. The booster dose vaccine type may be different than the primary series vaccine type. 
** Because data on the immune status of cases and associated deaths are unavailable, an additional dose in an immunocompromised person cannot be distinguished from a booster dose. This is a relevant consideration because vaccines can be less effective in this group. Deaths: A COVID-19–associated death occurred in a person with a documented COVID-19 diagnosis who died; health department staff reviewed to make a determination using vital records, public health investigation, or other data sources. Rates of COVID-19 deaths by vaccination status are reported based on when the patient was tested for COVID-19, not the date they died. Deaths usually occur up to 30 days after COVID-19 diagnosis. Participating jurisdictions: Currently, these 31 health departments that regularly link their case surveillance to immunization information system data are included in these incidence rate estimates: Alabama, Arizona, Arkansas, California, Colorado, Connecticut, District of Columbia, Florida, Georgia, Idaho, Indiana, Kansas, Kentucky, Louisiana, Massachusetts, Michigan, Minnesota, Nebraska, New Jersey, New Mexico, New York, New York City (New York), North Carolina, Philadelphia (Pennsylvania), Rhode Island, South Dakota, Tennessee, Texas, Utah, Washington, and West Virginia; 30 jurisdictions also report deaths among vaccinated and unvaccinated people. These jurisdictions represent 72% of the total U.S. population and all ten of the Health and Human Services Regions. Data on cases among people who received additional or booster doses were reported from 31 jurisdictions; 30 jurisdictions also reported data on deaths among people who received one or more additional or booster dose; 28 jurisdictions reported cases among people who received two or more additional or booster doses; and 26 jurisdictions reported deaths among people who received two or more additional or booster doses. This list will be updated as more jurisdictions participate. 
Incidence rate estimates: Weekly age-specific incidence rates by vaccination status were calculated as the number of cases or deaths divided by the number of people vaccinated with a primary series, overall or with/without a booster dose (cumulative) or unvaccinated (obtained by subtracting the cumulative number of people vaccinated with a primary series and partially vaccinated people from the 2019 U.S. intercensal population estimates) and multiplied by 100,000. Overall incidence rates were age-standardized using the 2000 U.S. Census standard population. To estimate population counts for ages 6 months through 1 year, half of the single-year population counts for ages 0 through 1 year were used. All rates are plotted by positive specimen collection date to reflect when incident infections occurred. For the primary series analysis, age-standardized rates include ages 12 years and older from April 4, 2021 through December 4, 2021, ages 5 years and older from December 5, 2021 through July 30, 2022 and ages 6 months and older from July 31, 2022 onwards. For the booster dose analysis, age-standardized rates include ages 18 years and older from September 19, 2021 through December 25, 2021, ages 12 years and older from December 26, 2021, and ages 5 years and older from June 5, 2022 onwards. Small numbers could contribute to less precision when calculating death rates among some groups. Continuity correction: A continuity correction has been applied to the denominators by capping the percent population coverage at 95%. To do this, we assumed that at least 5% of each age group would always be unvaccinated in each jurisdiction. Adding this correction ensures that there is always a reasonable denominator for the unvaccinated population that would prevent incidence and death rates from growing unrealistically large due to potential overestimates of vaccination coverage. 
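The rate arithmetic and continuity correction described above reduce to a few lines (all numbers here are illustrative, not taken from the dataset):

```python
def incidence_rate(events, population):
    """Weekly cases or deaths per 100,000 population."""
    return events / population * 100_000

def unvaccinated_population(total_pop, vaccinated, cap_pct=95):
    """Unvaccinated denominator, capping vaccination coverage at cap_pct%
    so that at least 5% of each age group is always treated as unvaccinated."""
    covered = min(vaccinated, total_pop * cap_pct // 100)
    return total_pop - covered

# Illustrative: 1,500 weekly cases among 800,000 vaccinated people.
print(round(incidence_rate(1500, 800_000), 1))  # 187.5

# Reported coverage of 98% is capped at 95%, leaving a 50,000-person denominator.
print(unvaccinated_population(1_000_000, 980_000))  # 50000
```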
Incidence rate ratios (IRRs): IRRs for the past one month were calculated by dividing the average weekly incidence rates among unvaccinated people by that among people vaccinated with a primary series either overall or with a booster dose. Publications: Scobie HM, Johnson AG, Suthar AB, et al. Monitoring Incidence of COVID-19 Cases, Hospitalizations, and Deaths, by Vaccination Status — 13 U.S. Jurisdictions, April 4–July 17, 2021. MMWR Morb Mortal Wkly Rep 2021;70:1284–1290. Johnson AG, Amin AB, Ali AR, et al. COVID-19 Incidence and Death Rates Among Unvaccinated and Fully Vaccinated Adults with and Without Booster Doses During Periods of Delta and Omicron Variant Emergence — 25 U.S. Jurisdictions, April 4–December 25, 2021. MMWR Morb Mortal Wkly Rep 2022;71:132–138. Johnson AG, Linde L, Ali AR, et al. COVID-19 Incidence and Mortality Among Unvaccinated and Vaccinated Persons Aged ≥12 Years by Receipt of Bivalent Booster Doses and Time Since Vaccination — 24 U.S. Jurisdictions, October 3, 2021–December 24, 2022. MMWR Morb Mortal Wkly Rep 2023;72:145–152. Johnson AG, Linde L, Payne AB, et al. Notes from the Field: Comparison of COVID-19 Mortality Rates Among Adults Aged ≥65 Years Who Were Unvaccinated and Those Who Received a Bivalent Booster Dose Within the Preceding 6 Months — 20 U.S. Jurisdictions, September 18, 2022–April 1, 2023. MMWR Morb Mortal Wkly Rep 2023;72:667–669.
We interact with people every day, and much of this interaction requires easy access to information about those people. Particularly important in current research is the influence of personal and emotional significance. Recent research has shown that personal memories can facilitate access to information about famous people. However, we do not know how these personal memories interact with our emotional responses to people, nor is it clear at what stages in the face recognition process these factors have their greatest influence. The aim of this project is to examine the contribution of these variables in accessing knowledge about people. Three studies are proposed. The first will focus on our emotional responses to famous people to determine whether this variable improves our ability to recognise faces as familiar and to access biographical information (e.g. occupation). The second will examine the influence of personal significance when making familiarity and semantic judgements. The final study addresses questions about the independent and combined influence of emotional and personal significance in face recognition. These issues will be examined in healthy adults and people with prosopagnosia using behavioural and eye-tracking measures. Seven SPSS files: One file comprises data collected via an online survey on familiarity with famous faces and contains responses from 49 older people. The remaining six files contain experimental data from healthy controls, two for each study, with familiarity and semantic judgment data stored separately. The numbers of participants were 20, 20 and 14 for studies 1 to 3, respectively.
Updated 10/6/2022: In the Time/Distance analysis process, points that were found to have been included initially, but with no significant or year-round population, were removed. The layer of removed points is also available for viewing: MCNA - Removed Population Points. The Network Adequacy Standards Representative Population Points feature layer contains 97,694 points spread across California that were created from USPS postal delivery route data and US Census data. Each population point also contains the variables for Time and Distance Standards for the County that the point is within. These standards differ by County due to the County "type," which is based on the population density of the county. The county categories within California are: Rural (<50 people/sq mile), Small (51-200 people/sq mile), Medium (201-599 people/sq mile), and Dense (>600 people/sq mile). The Time and Distance data are divided out by Provider Type, with Adult and Pediatric tracked separately, so that the Time or Distance analysis can be performed with greater detail. The provider types are: Hospitals; OB/GYN Specialty; and Adult and Pediatric versions of Cardiology/Interventional Cardiology, Dermatology, Endocrinology, ENT/Otolaryngology, Gastroenterology, General Surgery, Hematology, HIV/AIDS/Infectious Disease, Mental Health Outpatient Services, Nephrology, Neurology, Oncology, Ophthalmology, Orthopedic Surgery, PCP, Physical Medicine and Rehabilitation, Psychiatry, and Pulmonology.
The 119th Congressional Districts dataset reflects boundaries as of January 3, 2025, from the United States Census Bureau (USCB); its attributes are updated every Sunday from the United States House of Representatives, and the dataset is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Information for each member of Congress is appended to the Census Congressional District shapefile using information from the Office of the Clerk, U.S. House of Representatives' website (https://clerk.house.gov/xml/lists/MemberData.xml) and its corresponding XML file. Congressional districts are the 435 areas from which people are elected to the U.S. House of Representatives. This dataset also includes 9 geographies for non-voting at-large delegate districts, resident commissioner districts, and congressional districts that are not defined. After the apportionment of congressional seats among the states based on census population counts, each state is responsible for establishing congressional districts for the purpose of electing representatives. Each congressional district is to be as equal in population to all other congressional districts in a state as practicable. The 119th Congress is seated from January 3, 2025 through January 3, 2027. In Connecticut, Illinois, and New Hampshire, the Redistricting Data Program (RDP) participant did not define the CDs to cover all of the state or state equivalent area.
In these areas with no CDs defined, the code "ZZ" has been assigned, which is treated as a single CD for purposes of data presentation. The TIGER/Line shapefiles for the District of Columbia, Puerto Rico, and the Island Areas (American Samoa, Guam, the Commonwealth of the Northern Mariana Islands, and the U.S. Virgin Islands) each contain a single record for the non-voting delegate district in these areas. The boundaries of all other congressional districts reflect information provided to the Census Bureau by the states by May 31, 2024. A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1529006