PROBLEM AND OPPORTUNITY

In the United States, voting is largely a private matter. A registered voter is given a randomized ballot form or machine to prevent linkage between their voting choices and their identity. This disconnect supports confidence in the election process, but it creates obstacles to analyzing an election. A common solution is to field exit polls, interviewing voters immediately after they leave their polling location. This method is rife with bias, however, and functionally limited in the demographic data it collects directly.

For the 2020 general election, though, most states published their election results for each voting location. These publications were additionally supported by the geographical areas assigned to each location, the voting precincts. As a result, geographic processing can now be applied to project precinct election results onto Census block groups. While precincts have few demographic traits directly, their geographies have characteristics that make them projectable onto U.S. Census geographies. Both state voting precincts and U.S. Census block groups:

- are exclusive, and do not overlap
- are adjacent, fully covering their corresponding state and potentially county
- have roughly the same size in area, population and voter presence

Analytically, a projection of local demographics does not allow conclusions about voters themselves. However, the dataset does allow statements related to the geographies that yield voting behavior. One could say, for example, that an area dominated by a particular voting pattern would have mean traits of age, race, income or household structure.

The dataset that results from this programming provides voting results allocated by Census block groups. The block group identifier can be joined to Census Decennial and American Community Survey demographic estimates.

DATA SOURCES

The state election results and geographies have been compiled by the Voting and Election Science Team on Harvard's Dataverse.
State voting precincts lie within state and county boundaries. The Census Bureau, on the other hand, publishes its estimates across a variety of geographic definitions including a hierarchy of states, counties, census tracts and block groups. Their definitions can be found here. The geometric shapefiles for each block group are available here. The lowest level of this geography changes often and can become obsolete before the next census survey (Decennial or American Community Survey programs). The second-lowest census level, the block group, has the benefit of both granularity and stability, however. The 2020 Decennial survey details US demographics into 217,740 block groups with between a few hundred and a few thousand people.

Dataset Structure

The dataset's columns include:

Column: Definition
- BLOCKGROUP_GEOID: 12 digit primary key. Census GEOID of the block group row. This code concatenates:
  - 2 digit state
  - 3 digit county within state
  - 6 digit Census Tract identifier
  - 1 digit Census Block Group identifier within tract
- STATE: State abbreviation, redundant with 2 digit state FIPS code above
- REP: Votes for Republican party candidate for president
- DEM: Votes for Democratic party candidate for president
- LIB: Votes for Libertarian party candidate for president
- OTH: Votes for presidential candidates other than Republican, Democratic or Libertarian
- AREA: Square kilometers of area associated with this block group
- GAP: Total area of the block group, net of area attributed to voting precincts
- PRECINCTS: Number of voting precincts that intersect this block group

ASSUMPTIONS, NOTES AND CONCERNS:

- Votes are attributed based upon the proportion of the precinct's area that intersects the corresponding block group. Alternative methods are left to the analyst's initiative.
- The 50 states and the District of Columbia are in scope, as these are the U.S. jurisdictions that vote in the general election for the U.S. Presidency.
- Three states did not report their results at the precinct level: South Dakota, Kentucky and West Virginia. A dummy block group is added for each of these states to maintain national totals. These states represent 2.1% of all votes cast.
- Counties are commonly coded using FIPS codes. However, each election result file may have the county field named differently. Also, Delaware, Massachusetts, Alaska and the District of Columbia do not share county definitions.
- Block groups may be used to capture geographies that do not have population, like bodies of water. As a result, block groups without intersecting voting precincts are not uncommon.
- In the U.S., elections are administered at a state level with the Federal Election Commission compiling state totals against the Electoral College weights. The states have liberty, though, to define and change their own voting precincts (https://en.wikipedia.org/wiki/Electoral_precinct).
- The Census Bureau...

Visit https://dataone.org/datasets/sha256%3A05707c1dc04a814129f751937a6ea56b08413546b18b351a85bc96da16a7f8b5 for complete metadata about this dataset.
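The area-proportional attribution described in the assumptions can be sketched as follows. This is a minimal illustration, not the dataset author's code: the function names, the input layout and the sample identifiers are all hypothetical, and the intersection areas are assumed to have been computed beforehand by a GIS overlay of precinct and block group shapes.

```python
from collections import defaultdict


def split_geoid(geoid):
    """Split a 12 digit block group GEOID into the parts listed in the column table."""
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError("expected a 12 digit GEOID, got %r" % geoid)
    return {
        "state": geoid[:2],        # 2 digit state
        "county": geoid[2:5],      # 3 digit county within state
        "tract": geoid[5:11],      # 6 digit Census Tract identifier
        "block_group": geoid[11],  # 1 digit block group within tract
    }


def allocate_votes(precinct_votes, precinct_area, intersections):
    """Attribute precinct votes to block groups in proportion to shared area.

    precinct_votes: {precinct_id: {"REP": votes, "DEM": votes, ...}}
    precinct_area:  {precinct_id: total precinct area}
    intersections:  iterable of (precinct_id, blockgroup_geoid, shared_area)
    """
    allocated = defaultdict(lambda: defaultdict(float))
    for precinct_id, geoid, shared_area in intersections:
        # Share of the precinct's area that falls inside this block group.
        share = shared_area / precinct_area[precinct_id]
        for party, votes in precinct_votes[precinct_id].items():
            allocated[geoid][party] += votes * share
    return {geoid: dict(parties) for geoid, parties in allocated.items()}


# Hypothetical inputs: precinct P1 straddles two block groups, P2 sits in one.
votes = {"P1": {"REP": 100, "DEM": 200}, "P2": {"REP": 50, "DEM": 30}}
areas = {"P1": 8.0, "P2": 4.0}
overlaps = [
    ("P1", "010010201001", 2.0),
    ("P1", "010010201002", 6.0),
    ("P2", "010010201002", 4.0),
]
result = allocate_votes(votes, areas, overlaps)
```

Because the attribution is purely by area, a precinct whose voters cluster in one corner is still spread evenly across every block group it touches; that is the limitation the note about alternative methods points to.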
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Census Bureau practices "data suppression", filtering some block groups from demographic publication because they do not meet a population threshold. This practice...
In early February 2024, we will be retiring the Mpox Vaccinations Given to SF Residents by Demographics dataset. This dataset will be archived and will no longer be updated. A historic record of this data will remain available.
A. SUMMARY This dataset represents doses of mpox vaccine (JYNNEOS) administered in California to residents of San Francisco ages 18 years or older. This dataset only includes doses of the JYNNEOS vaccine given on or after 5/1/2022. All vaccines given to people who live in San Francisco are included, no matter where the vaccination took place. The data are broken down by multiple demographic stratifications.
B. HOW THE DATASET IS CREATED Information on doses administered to those who live in San Francisco is from the California Immunization Registry (CAIR2), run by the California Department of Public Health (CDPH). Information on individuals’ city of residence, age, race, ethnicity, and sex is recorded in CAIR2 and is self-reported at the time of vaccine administration. Because CAIR2 does not include information on sexual orientation, we pull that information from the San Francisco Department of Public Health’s Epic Electronic Health Record (EHR). The populations represented in our Epic data and the CAIR2 data are different. Epic data only include vaccinations administered at SFDPH managed sites to SF residents.
Data notes for population characteristic types are listed below.
Age
* Data only include individuals who are 18 years of age or older.

Race/ethnicity
* The response option "Other Race" is categorized by the data source system, and the response option "Unknown" refers to a lack of data.

Sex
* The response option "Other" is categorized by the source system, and the response option "Unknown" refers to a lack of data.

Sexual orientation
* The response option “Unknown/Declined” refers to a lack of data or individuals who reported multiple different sexual orientations during their most recent interaction with SFDPH.
For convenience, we provide the 2020 5-year American Community Survey population estimates.
C. UPDATE PROCESS Updated daily via automated process.
D. HOW TO USE THIS DATASET This dataset includes many different types of demographic groups. Filter the “demographic_group” column to explore a topic area. Then, the “demographic_subgroup” column shows each group or category within that topic area and the total count of doses administered to that population subgroup.
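The filtering step described above can be sketched in a few lines. This is only an illustration: the sample rows are invented, and the count column name `total_doses` is an assumption, since the description names only the `demographic_group` and `demographic_subgroup` columns.

```python
import csv
import io

# Hypothetical rows for illustration only; "total_doses" is an assumed column name.
SAMPLE_CSV = """demographic_group,demographic_subgroup,total_doses
Age,18-24,120
Age,25-34,480
Sex,Female,50
Sex,Male,900
"""


def subgroup_counts(csv_text, group, count_column="total_doses"):
    """Return {subgroup: doses} for one topic area in the demographic_group column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["demographic_subgroup"]: int(row[count_column])
        for row in reader
        if row["demographic_group"] == group
    }


age_counts = subgroup_counts(SAMPLE_CSV, "Age")
```

Filtering on one group at a time matters because the same doses appear under every stratification; summing across groups would count them more than once.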
E. CHANGE LOG
If you know of any further standard populations worth integrating into this dataset, please let me know in the discussion section. I would be happy to integrate further data to make this dataset more useful for everybody.

Standard populations are "artificial populations" with fictitious age structures that are used in age standardization as a uniform basis for calculating comparable measures for the respective reference population(s).

Use: Age standardizations based on a standard population are often used at cancer registries to compare morbidity or mortality rates. If populations of different regions, or the population of one region over time, have different age structures, the comparability of their mortality or morbidity rates is limited. For interregional or inter-temporal comparisons, therefore, an age standardization is necessary. For this purpose the age structure of a reference population, the so-called standard population, is assumed for the study population. The age specific mortality or morbidity rates of the study population are weighted according to the age structure of the standard population.

Selection of a standard population: In principle, it does not matter which standard population is used for comparison. It is important, however, that
The aim of this dataset is to provide a variety of the most commonly used 'standard populations'.
Currently, two files with 22 standard populations are provided:
- standard_populations_20_age_groups.csv
  - 20 age groups: '0', '01-04', '05-09', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40-44', '45-49', '50-54', '55-59', '60-64', '65-69', '70-74', '75-79', '80-84', '85-89', '90+'
  - 7 standard populations: 'Standard population Germany 2011', 'Standard population Germany 1987', 'Standard population of Europe 2013', 'Standard population Old Laender 1987', 'Standard population New Laender 1987', 'New standard population of Europe', 'World standard population'
  - source: German Federal Health Monitoring System
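The weighting described under "Use" is commonly called direct age standardization. A minimal sketch follows, assuming three collapsed age groups and invented numbers; neither the rates nor the weights below come from the provided files, and the function name is hypothetical.

```python
def directly_standardized_rate(age_specific_rates, standard_population):
    """Weight age specific rates by the age structure of a standard population.

    Both arguments map an age group label to a number; the result is in the
    same unit as the input rates (for example, cases per 100,000).
    """
    if set(age_specific_rates) != set(standard_population):
        raise ValueError("age groups of rates and standard population differ")
    total_standard = sum(standard_population.values())
    weighted_sum = sum(
        age_specific_rates[group] * standard_population[group]
        for group in age_specific_rates
    )
    return weighted_sum / total_standard


# Illustrative values only (not taken from the CSV files).
rates_per_100k = {"0-39": 10, "40-64": 100, "65+": 1000}
standard = {"0-39": 50000, "40-64": 30000, "65+": 20000}
standardized = directly_standardized_rate(rates_per_100k, standard)
```

Two study populations with identical age specific rates get the same standardized rate under this calculation, whatever their own age structures are, which is what makes the results comparable across regions or years.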
No restrictions are known to the author. Standard populations are published by different organisations for public usage.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Some racial and ethnic categories are suppressed to avoid misleading estimates when the relative standard error exceeds 30%. Margins of error are estimated at the 90% confidence level.
Data Source: Current Population Survey (CPS) Voting Supplement, 2020
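The suppression rule and the margin of error can be illustrated with a short sketch. It assumes the usual definitions (relative standard error as standard error divided by the estimate, and a 90% margin of error as 1.645 times the standard error); the source does not spell out its exact computation, and the function names and sample figures here are hypothetical.

```python
Z_90 = 1.645  # standard normal critical value for a two-sided 90% interval


def relative_standard_error(estimate, standard_error):
    """Standard error expressed as a fraction of the estimate."""
    if estimate == 0:
        return float("inf")
    return standard_error / abs(estimate)


def margin_of_error_90(standard_error):
    """Half-width of the 90% confidence interval."""
    return Z_90 * standard_error


def is_suppressed(estimate, standard_error, threshold=0.30):
    """True when the relative standard error exceeds the 30% threshold."""
    return relative_standard_error(estimate, standard_error) > threshold


# Illustrative figures only: a turnout share and its standard error.
reliable = is_suppressed(0.62, 0.05)    # RSE of about 8%
unreliable = is_suppressed(0.40, 0.15)  # RSE of 37.5%
```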
Why This Matters
Voting is one of the primary ways residents can have their voices heard by the government. By voting for elected officials and on ballot initiatives, residents help decide the future of their community.
For much of our nation’s history, non-white residents were explicitly prohibited from voting or discriminated against in the voting process. It was not until the Voting Rights Act of 1965 that the Federal Government enacted voting rights protections for Black voters and voters of color.
Nationally, BIPOC citizens and especially Hispanic and Asian citizens have consistently lower voter turnout rates and voter registration rates. While local DC efforts have been taken to remove these barriers, restrictive voter ID requirements and the disenfranchisement of incarcerated and returning residents act as institutionally racist barriers to voting in many jurisdictions.
The District's Response
The DC Board of Elections has lowered the barriers to participate in local elections through online voter registration, same day registration, voting by mail, and non-ID proof of residence.
Unlike in many states, incarcerated and returning residents in D.C. never lose the right to vote. Since 2024, DC has also extended the right to vote in local elections to residents of the District who are not citizens of the U.S.
Although DC residents pay federal taxes and can vote in the presidential election, the District does not have full representation in Congress. Efforts to advocate for DC statehood aim to remedy this.
https://www.icpsr.umich.edu/web/ICPSR/studies/48/terms
This data collection consists of 161 selected social, demographic, and educational datasets for France in the period 1801-1897. The data were collected from published reports of three national statistical series: (1) National Censuses, (2) Vital Statistics, and (3) Primary Education. This project was supported by grants from the National Endowment for the Humanities and the National Science Foundation.

The National Census data were derived from the quinquennial population censuses of France from 1801 to 1896 and were obtained from the Statistique Generale de la France. The data provide detailed social and economic information for the period 1851 to 1896. The data for 1801-1851 are less rich in subject matter coverage but do present some basic information on population characteristics. The National Census data in general describe the population, including the composition of the population by categories of age, sex, place of birth, marital status, religion, place of residence, and occupation. There is also some limited information on migration, transportation and communication, housing, and families. A large segment of the census data pertains to occupations of the population, specifying job classifications within professions, as well as information on non-employed household members that were dependent on employees in the various industries, in addition to enumerations of persons employed in various professions and trades.

The Vital Statistics data files contain annual vital statistics for the French population. These data were obtained from two printed series, MOUVEMENT DE LA POPULATION (1801-1868), and STATISTIQUE ANNUELLE (1869-1897). The basic variables included in the vital statistics datasets record births, deaths, and marriages in France. Detailed cross-tabulations of these demographic indicators are presented for births, tabulated by sex, month, legitimacy status, and characteristics of the parents, and deaths, categorized by age and previous marital status of the partners. Additional cross-tabulations are provided for variables such as divorces, passports issued, medical personnel and hospitals, and a literacy indicator (signing of marriage certificates).

The Primary Education data files provide information on primary schools and were obtained from the Statistique de l'enseignement Primaire. The data obtained from the series basically cover the period 1829-1897, although some recapitulative information for earlier years is also presented. The main focus of the data in this series is on primary schools, classes and buildings, enrollment, teachers, sources of funding and expenditure, and academic proficiency of the pupils. Additional information is included on literacy, teacher training (normal) schools, school age population, and libraries.

A machine-readable French language codebook, describing the data items as well as the sources from which they were obtained, is provided with each dataset supplied. In addition, lists of the variables included in each dataset are included in Parts 162-164. See the related collection, DEMOGRAPHIC, SOCIAL, EDUCATIONAL AND ECONOMIC DATA FOR FRANCE, 1833-1925 (ICPSR 7529).
More details about each file are in the individual file descriptions.
This is a dataset from the Joint Research Centre hosted by the EU Open Data Portal. The Open Data Portal is found here, and it updates its information according to the frequency with which the data is collected. Explore Joint Research Centre data using Kaggle and all of the data sources available through the Joint Research Centre organization page!
This dataset is maintained using the EU ODP API and Kaggle's API.
This dataset is distributed under the following licenses: Dataset License
Cover photo by freestocks.org on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
Future fine particulate matter (PM2.5) concentrations and health impacts will be largely determined by factors such as energy use, fuel choices, emission controls, state and national policies, and demographics. In this study, a human-earth system model is used to estimate US state-level PM2.5 mortality costs from 2015 to 2050 considering current major air quality and energy regulations. The Logarithmic Mean Divisia Index is applied to quantify the contributions of socioeconomic and energy factors to future changes in PM2.5 mortality costs. National PM2.5 mortality costs are estimated to decrease by 25% from 2015 to 2050, primarily driven by decreases in energy intensity and decreases in PM2.5 mortality cost per unit consumption of electric sector coal and transportation liquids. These factors together contribute to 68% of the net decrease, primarily because of technology improvements and air pollutant emission regulations. Furthermore, the results suggest that states with greater population and economic growth, but with fewer clean energy resources, are more likely to face significant challenges in reducing future PM2.5 mortality costs. In contrast, states with larger projected decreases in mortality costs have smaller increases in population and per capita GDP and greater decreases in electric sector coal share and PM2.5 mortality cost per unit fuel consumption. This dataset includes source code, input data, and model output from the Global Change Assessment Model (GCAM-USA) human-earth system model used in this study. It also includes Excel workbooks and R scripts used in producing the figures in the manuscript.
This dataset is associated with the following publication: Ou, Y., S. Smith, J.J. West, C. Nolte, and D. Loughlin. State-level drivers of future fine particulate matter mortality in the United States. Environmental Research Letters. IOP Publishing LIMITED, Bristol, UK, 14(12): 124071, (2019).
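The Logarithmic Mean Divisia Index mentioned in the abstract splits the change in a quantity that is a product of factors into one additive contribution per factor. The sketch below uses the standard additive form on a single aggregate; the three factor names and all numbers are hypothetical and do not reproduce the study's actual identity, sectors or results.

```python
import math


def logarithmic_mean(a, b):
    """Logarithmic mean of two positive numbers."""
    if a == b:
        return a
    return (a - b) / (math.log(a) - math.log(b))


def lmdi_additive(factors_start, factors_end):
    """Additive LMDI contributions of each factor to the change in their product.

    Both arguments map a factor name to a positive value; the contributions
    sum to (product at end) minus (product at start).
    """
    value_start = math.prod(factors_start.values())
    value_end = math.prod(factors_end.values())
    weight = logarithmic_mean(value_end, value_start)
    return {
        name: weight * math.log(factors_end[name] / factors_start[name])
        for name in factors_start
    }


# Hypothetical identity: cost = population * GDP per capita * cost per unit GDP.
start = {"population": 100.0, "gdp_per_capita": 50.0, "cost_per_gdp": 0.02}
end = {"population": 120.0, "gdp_per_capita": 70.0, "cost_per_gdp": 0.009}
contributions = lmdi_additive(start, end)
total_change = math.prod(end.values()) - math.prod(start.values())
```

In this made-up case the growth factors contribute positively and the falling cost per unit of GDP contributes negatively, and the three terms add up to the total change with no residual, which is the property that makes this decomposition convenient for attributing a net decrease to individual drivers.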
Between Oct. 14, 2014, and May 21, 2015, Pew Research Center, with generous funding from The Pew Charitable Trusts and the Neubauer Family Foundation, completed 5,601 face-to-face interviews with non-institutionalized adults ages 18 and older living in Israel.
The survey sampling plan was based on six districts defined in the 2008 Israeli census. In addition, Jewish residents of the West Bank (Judea and Samaria) were included.
The sample includes interviews with 3,789 respondents defined as Jews, 871 Muslims, 468 Christians and 439 Druze. An additional 34 respondents belong to other religions or are religiously unaffiliated. Five groups were oversampled as part of the survey design: Jews living in the West Bank, Haredim, Christian Arabs, Arabs living in East Jerusalem and Druze.
Interviews were conducted under the direction of Public Opinion and Marketing Research of Israel (PORI). Surveys were administered through face-to-face, paper and pencil interviews conducted at the respondent's place of residence. Sampling was conducted through a multi-stage stratified area probability sampling design based on national population data available through the 2008 census of Israel's Central Bureau of Statistics.
The questionnaire was designed by Pew Research Center staff in consultation with subject matter experts and advisers to the project. The questionnaire was translated into Hebrew, Russian and Arabic, independently verified by professional linguists conversant in regional dialects and pretested prior to fieldwork.
The questionnaire was divided into four sections. All respondents who took the survey in Russian or Hebrew were branched into the Jewish questionnaire (Questionnaire A). Arabic-speaking respondents were branched into the Muslim (Questionnaire B), Christian (Questionnaire C) or Druze questionnaire (D) based on their response to the religious identification question. For the full question wording and exact order of questions, please see the questionnaire.
Note that not all respondents who took the questionnaire in Hebrew or Russian are classified as Jews in this study. For further details on how respondents were classified as Jews, Muslims, Christians and Druze in the study, please see the sidebar titled "How Religious are Defined" in the report at http://www.pewforum.org/2016/03/08/israels-religiously-divided-society/.
Following fieldwork, survey performance was assessed by comparing the results for key demographic variables with population statistics available through the census. Data were weighted to account for different probabilities of selection among respondents. Where appropriate, data also were weighted through an iterative procedure to more closely align the samples with official population figures for gender, age and education. The reported margins of sampling error and the statistical tests of significance used in the analysis take into account the design effects due to weighting and sample design.
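An iterative weighting procedure of the kind mentioned above is often implemented as raking (iterative proportional fitting). The sketch below assumes that technique; the source does not name the exact algorithm used, and the sample respondents, variables and target shares are invented for illustration.

```python
def rake(respondents, targets, iterations=100):
    """Adjust weights until weighted category shares match the target shares.

    respondents: list of dicts mapping variable name -> category
    targets:     {variable: {category: population share}}, shares sum to 1
    Returns one weight per respondent; the weights sum to len(respondents).
    """
    weights = [1.0] * len(respondents)
    for _ in range(iterations):
        for variable, shares in targets.items():
            # Current weighted total in each category of this variable.
            totals = {}
            for person, weight in zip(respondents, weights):
                category = person[variable]
                totals[category] = totals.get(category, 0.0) + weight
            grand_total = sum(totals.values())
            # Scale each respondent so the category hits its target share.
            for index, person in enumerate(respondents):
                category = person[variable]
                weights[index] *= shares[category] * grand_total / totals[category]
    return weights


def weighted_share(respondents, weights, variable, category):
    """Weighted share of respondents falling in one category."""
    matching = sum(w for p, w in zip(respondents, weights) if p[variable] == category)
    return matching / sum(weights)


# Hypothetical sample of ten respondents and hypothetical population shares.
SAMPLE = (
    [{"gender": "F", "age": "18-49"}] * 3
    + [{"gender": "F", "age": "50+"}] * 2
    + [{"gender": "M", "age": "18-49"}] * 1
    + [{"gender": "M", "age": "50+"}] * 4
)
TARGETS = {
    "gender": {"F": 0.5, "M": 0.5},
    "age": {"18-49": 0.55, "50+": 0.45},
}
raked = rake(SAMPLE, TARGETS)
```

Each pass fixes one variable's margins and slightly disturbs the others, so the loop repeats until all margins agree with the targets at once.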
In addition to sampling error and other practical difficulties, one should bear in mind that question wording also can have an impact on the findings of opinion polls.
Abstract copyright UK Data Service and data collection copyright owner. The project's main objectives were to analyse the public discourses surrounding the 2001 Northern Ireland Census and relate them to the already available data on population trends, segregation and projected future changes and to assess their political implications in terms of voting patterns. It assesses the politics of demography through 25 in-depth interviews with political party representatives about the relationships of religion ratios and segregation with electoral strategies, voting patterns and levels of conflict. The interviews examined politicians' views on how demography influences territorial politics. The interviews were semi-structured and covered a commonality of themes while also allowing the interviewees to raise and develop their own concerns. Interviewees were selected on the basis of the media analysis with those most frequently reported as commenting on demographic issues being approached first, though the researchers ensured a geographical spread across Northern Ireland and coverage that broadly reflected the electoral strength of the various political parties. The interviewees included Westminster MPs, Members of the Northern Ireland Assembly, local councillors and party advisers. Some senior politicians had to cancel interviews but three party leaders were interviewed and in other cases senior colleagues deputised. For a wider perspective, officials from the Housing Executive and from the Census Office were also interviewed because they interact closely with local politicians on demographic matters. The project also assesses the limitations and problems of census data, particularly with respect to discourses on ratios and segregation. It builds on earlier work and it links with research on political demography in other divided societies.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The quantities listed are: the total number of elements; the total number of distinct elements; the Zipf's exponent obtained by maximum likelihood estimation [3], [43]; the asymptotic solution of the Heaps' exponent as shown in Eq. 7; the numerical value of the Heaps' exponent as shown in Fig. 3; and the empirical result of the Heaps' exponent obtained by the least squares method. Values for the 34th data set are given to only two digits since this data set is very small. Except for the 4th data set, in all other 34 real data sets the numerical results based on Eq. 6 outperform the asymptotic solution shown in Eq. 7. A detailed description of these data sets can be found in Materials and Methods.
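An empirical Heaps' exponent obtained by least squares can be sketched as the slope of a straight line fitted in log-log space, assuming Heaps' law in its usual form (the number of distinct elements grows as a power of the total number of elements). The function name and the synthetic data are assumptions, and the paper's own fitting details are not shown in this caption.

```python
import math


def fit_heaps_exponent(total_counts, distinct_counts):
    """Ordinary least squares slope of log(distinct) against log(total)."""
    xs = [math.log(n) for n in total_counts]
    ys = [math.log(d) for d in distinct_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic points lying exactly on a power law with exponent 0.7.
totals = [10, 100, 1000, 10000]
distincts = [2.0 * n ** 0.7 for n in totals]
exponent = fit_heaps_exponent(totals, distincts)
```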
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository is linked to the paper "Radiative transfer modeling in structurally-complex stands: what aspects matter most?" submitted to Annals of Forest Science and written by Frédéric ANDRÉ (corresponding author), Louis DE WERGIFOSSE, François DE COLIGNY, Nicolas BEUDEZ, Gauthier LIGOT, Vincent GAUTHRAY-GUYÉNET, Benoit COURBAUD and Mathieu JONARD.
The repository contains the following three files:
For more information concerning this repository or the study, please do not hesitate to contact Frédéric ANDRÉ (frederic.andre@uclouvain.be) or Mathieu JONARD (mathieu.jonard@uclouvain.be).
Success.ai’s Consumer Sentiment Data offers businesses unparalleled insights into global audience attitudes, preferences, and emotional triggers. Sourced from continuous analysis of consumer behaviors, conversations, and feedback, this dataset includes psychographic profiles, interest data, and sentiment trends that help marketers, product teams, and strategists better understand their target customers. Whether you’re exploring a new market, refining your brand message, or enhancing product offerings, Success.ai ensures your consumer intelligence efforts are guided by timely, accurate, and context-rich data.
Why Choose Success.ai’s Consumer Sentiment Data?
Comprehensive Audience Insights
Global Reach Across Industries and Demographics
Continuously Updated Datasets
Ethical and Compliant
Data Highlights:
Key Features of the Dataset:
Granular Segmentation
Contextual Sentiment Analysis
AI-Driven Enrichment
Strategic Use Cases:
Marketing and Campaign Optimization
Product Development and Innovation
Brand Management and Positioning
Competitive Analysis and Market Entry
Why Choose Success.ai?
Best Price Guarantee
Seamless Integration
Data Accuracy with AI Validation
Customizable and Scalable Solutions
APIs for Enhanced Functionality:
Data Enrichment API
Lead Generation API
The Milky Way galaxy contains a large, spherical component which is believed to harbor a substantial amount of unseen matter. Recent observations indirectly suggest that as much as half of this "dark matter" may be in the form of old, very cool white dwarfs, the remnants of an ancient population of stars as old as the galaxy itself. We conducted a survey to find faint, cool white dwarfs with large space velocities, indicative of their membership in the galaxy's spherical halo component. The survey reveals a substantial, directly observed population of old white dwarfs, too faint to be seen in previous surveys. This newly discovered population accounts for at least 2 percent of the halo dark matter. It provides a natural explanation for the indirect observations, and represents a direct detection of galactic halo dark matter.
Cone search capability for table J/other/Sci/292.698/table1 (Candidate Halo White Dwarfs)
Cone search capability for table J/other/Sci/292.698/ohdhs (Complete dataset of Figure 1 of the paper)
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains 20000 pieces of text collected from Wikipedia, Gutenberg, and CNN/DailyMail. The text is cleaned by replacing symbols such as (.*?/) with a white space using automatic scripts and regex.
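A cleaning step of this kind might look like the sketch below. The exact symbol set and scripts are not documented here, so the character class is an assumption that covers only the symbols named in the description, and the whitespace-collapsing step is an extra added for readability rather than something the description states.

```python
import re

# Assumption: only the symbols named in the description, ( . * ? / ), are replaced.
SYMBOLS = re.compile(r"[().*?/]")
WHITESPACE_RUN = re.compile(r"\s+")


def clean_text(text):
    """Replace each listed symbol with a space, then collapse repeated whitespace."""
    spaced = SYMBOLS.sub(" ", text)
    return WHITESPACE_RUN.sub(" ", spaced).strip()


cleaned = clean_text("Hello (world)* a/b")
```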
The data was collected from these sources to ensure the highest level of integrity against AI generated text.
* Wikipedia: The 20220301 dataset was chosen to minimize the chance of including articles generated or heavily edited by AI.
* Gutenberg: Books from this source are guaranteed to be written by real humans and span various genres and time periods.
* CNN/DailyMail: These news articles were written by professional journalists and cover a variety of topics, ensuring diversity in writing style and subject matter.
The dataset consists of 5 CSV files.
1. CNN_DailyMail.csv: Contains all processed news articles.
2. Gutenberg.csv: Contains all processed books.
3. Wikipedia.csv: Contains all processed Wikipedia articles.
4. Human.csv: Combines all three datasets in order.
5. Shuffled_Human.csv: This is the randomly shuffled version of Human.csv.
Each file has 2 columns:
- Title: The title of the item.
- Text: The content of the item.
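As a rough sketch of how the two-column files might be read, the snippet below uses Python's standard `csv` module. The function name `read_items` and the in-memory sample rows are illustrative assumptions; only the `Title` and `Text` column names come from the description above. The field size limit is raised because book-length `Text` values may exceed the module's default.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Long documents can exceed csv's default per-field limit; raise it to a
# value that fits in a C long on all common platforms.
csv.field_size_limit(2**31 - 1)


def read_items(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (Title, Text) pairs from an open CSV file or any iterable of lines."""
    reader = csv.DictReader(lines)
    return [(row["Title"], row["Text"]) for row in reader]


# Illustrative sample standing in for one of the CSV files.
sample = io.StringIO('Title,Text\n"A","Hello, world"\n"B","Line one\nline two"\n')
for title, text in read_items(sample):
    print(title, len(text))
```

With a real file, the same function could be called as `read_items(open("Human.csv", newline="", encoding="utf-8"))`; the encoding is an assumption.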
This dataset is suitable for a wide range of NLP tasks, including:
- Training models to distinguish between human-written and AI-generated text (Human/AI classifiers).
- Training LSTMs or Transformers for chatbots, summarization, or topic modeling.
- Sentiment analysis, genre classification, or linguistic research.
Although the data was collected from these sources, it may not be entirely free of AI-generated text. Wikipedia articles may reflect systemic biases in contributor demographics. CNN/DailyMail articles may focus on specific news topics or regions.
For details on how the dataset was created, click here to view the Kaggle notebook used.
This dataset is published under the MIT License, allowing free use for both personal and commercial purposes. Attribution is encouraged but not required.
This dataset covers ballots 333-338, spanning January, March, May, July, September and October 1969. The dataset contains the data resulting from these polls in ASCII. The ballots are as follows:
333 - January
This Gallup poll seeks the opinions of Canadians on various political and social issues. Subjects include discipline in schools, preferred political parties and leaders, and the overall development of the country. The respondents were also asked questions so that they could be grouped according to geographical and social variables.
Topics of interest include: Canadian development; changes in savings; feelings towards the future; putting limits on debates in Parliament; the outcome of giving women more say; political preferences; the preferred size of the population; the proposed reconstruction of the Provinces; the sale of beer in grocery stores; satisfaction with the government; and the idea of going back to a two-party system in Canada. Basic demographic variables are also included.
334 - March
This Gallup poll seeks the opinions of Canadians on a variety of political and social issues of importance to the country and government. Some of the subjects include political leaders, parties and issues, abortion, international development and foreign aid, and lotteries. The respondents were also asked questions so that they could be grouped according to geographical and social variables.
Topics of interest include: abortions for physical and mental reasons; approval of the language rights bill; the court's treatment of criminals; the effectiveness of the Federal government; foreign aid; interest in international development; the legalization of sweepstakes and lotteries; militant students causing damage; political preference; a politician's right to privacy; recognizing Red China; the issue of public workers striking; the use of Medicare money; whether or not regional differences will break confederation; and if Canada will be better off if it was governed federally. Basic demographic variables are also included.
335 - May
This Gallup poll seeks the opinions of Canadians on political and social issues of interest to the country and government.
Topics of interest include: involvement in politics, opinions on Trudeau as prime minister, the nature of the U.S. vs Canada, livable income, how the government should raise money, U.S.-Canada relations, integrating neighbourhoods, whether Quebec will gain its independence, opinions on Nixon as president, Rene Levesque, and voting behavior. Basic demographic variables are also included.
336 - July
This Gallup poll seeks the opinions of Canadians on political and social issues of interest to the country and government. There are questions about elections, world conflicts, money matters and prices. The respondents were also asked questions so that they could be grouped according to geographical and social variables.
Topics of interest include: the cutback of NATO forces in Europe; the dispute between Arabs and Jews; the amount of government money spent on Expo '67; opinions on who gets the most profit with the increased prices of vegetables; the amount of objectionable material in the media; the opinions about John Robarts; the opinions about topless waitresses; political preferences; provinces with power; the ratings of Stanfield as leader of the opposition; whether or not some proportion of income is saved; sex education in schools; the use of alcohol; which household member decides on money matters; which family member gets a fixed amount of pocket money; and who gets profit from the increased price of meat. Basic demographic variables are also included.
337 - September
This Gallup poll seeks the opinions of Canadians on current issues of importance to the country and government. Some of the questions are politically-based, collecting opinions about political parties, leaders, and policies. There are also other questions of importance to the country, such as problems facing the government, and attitudes towards inflation. The respondents were also asked questions so that they could be grouped according to geographical and social variables.
Topics of interest include: allowing the police to go on strike; baby bonus cuts to the rich; the biggest worry for the future; the greatest problem facing the Federal government; inflation problems; will the NDP gain support; the opinion of Trudeau; the performance of the police; political preferences; the ratings of Federal MPs; the ratings of Provincial MPs; reducing the work week from 40 to 35 hours; and the Trudeau plan of efficiency. Basic demographic variables are also included.
338 - October
This Gallup poll seeks the opinions of Canadians on important current events topics of the day. Many of the questions in this survey deal with predictions of social, political and economic conditions for the future. The respondents were also asked questions so that they could be grouped according to geographical and social variables.
Topics of interest include: American power in 1970; the amount of student demonstrations; chance of atomic war by 1990; changing the voting age; Chinese power in 1970; the collapse of capitalism; the collapse of civilization; continuation of space programmes; the country with the strongest claim to the South Pole; a cure for cancer; the disappearance of Communism; economic prosperity in 1970; the amount of excitement in life; heart transplant operations; international discord in 1970; the length of life span in the future; man living on the moon; the manufacturing of H-bombs; opinions of 1969; political preferences; predictions for 1990; predictions for the future; predictions of peace in 1990; Russian power in 1970; opinions of a three day work week; and travel involving passports. Basic demographic variables are also included.
The codebook for this dataset is available through the UBC Library catalogue, with call number HN110.Z9 P84.
The GEO Data Portal is the authoritative source for data sets used by UNEP and its partners in the Global Environment Outlook (GEO) report and other integrated environment assessments. The GEO Data Portal gives access to a broad collection of socio-economic data sets from authoritative sources at global, regional, sub-regional and national levels. The contents of the Data Portal cover environmental themes such as climate, forests and freshwater and many others, as well as socioeconomic categories, including education, health, economy, population and environmental policies.
Series Name: Annual mean levels of fine particulate matter in cities, urban population (micrograms per cubic meter)
Series Code: EN_ATM_PM25
Release Version: 2020.Q2.G.03
This dataset is part of the Global SDG Indicator Database compiled through the UN System in preparation for the Secretary-General's annual report on Progress towards the Sustainable Development Goals.
Indicator 11.6.2: Annual mean levels of fine particulate matter (e.g. PM2.5 and PM10) in cities (population weighted)
Target 11.6: By 2030, reduce the adverse per capita environmental impact of cities, including by paying special attention to air quality and municipal and other waste management
Goal 11: Make cities and human settlements inclusive, safe, resilient and sustainable
For more information on the compilation methodology of this dataset, see https://unstats.un.org/sdgs/metadata/
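The indicator is described as population weighted. A minimal sketch of what such a weighting could look like is shown below, assuming one annual mean concentration and one population figure per city. The function name and the numbers are illustrative assumptions, not values from the dataset, and the official compilation methodology in the linked metadata may differ.

```python
from typing import Sequence


def population_weighted_mean(
    concentrations: Sequence[float], populations: Sequence[float]
) -> float:
    """Average concentrations (micrograms per cubic meter), weighting each by population."""
    if len(concentrations) != len(populations):
        raise ValueError("each concentration needs a matching population")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(c * p for c, p in zip(concentrations, populations))
    return weighted_sum / total_population


# Two hypothetical cities: 10 and 30 micrograms per cubic meter.
print(population_weighted_mean([10.0, 30.0], [1000, 3000]))  # 25.0
```

The larger city pulls the result toward its own concentration, which is the intent of weighting by exposed population rather than averaging cities equally.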
In this project, we aim to analyze and gain insights into the performance of students based on various factors that influence their academic achievements. We have collected data related to students' demographic information, family background, and their exam scores in different subjects.
Key Objectives:
Performance Evaluation: Evaluate and understand the academic performance of students by analyzing their scores in various subjects.
Identifying Underlying Factors: Investigate factors that might contribute to variations in student performance, such as parental education, family size, and student attendance.
Visualizing Insights: Create data visualizations to present the findings effectively and intuitively.
Dataset Details:
Analysis Highlights:
We will perform a comprehensive analysis of the dataset, including data cleaning, exploration, and visualization to gain insights into various aspects of student performance.
By employing statistical methods and machine learning techniques, we will determine the significant factors that affect student performance.
Why This Matters:
Understanding the factors that influence student performance is crucial for educators, policymakers, and parents. This analysis can help in making informed decisions to improve educational outcomes and provide support where it is most needed.
Acknowledgments:
We would like to express our gratitude to [mention any data sources or collaborators] for making this dataset available.
Please Note:
This project is meant for educational and analytical purposes. The dataset used is fictitious and does not represent any specific educational institution or individuals.
This database contains tobacco consumption data from 1970-2015 collected through a systematic search coupled with consultation with country and subject-matter experts. Data quality appraisal was conducted by at least two research team members in duplicate, with greater weight given to official government sources. All data was standardized into units of cigarettes consumed and a detailed accounting of data quality and sourcing was prepared. Data was found for 82 of 214 countries for which searches for national cigarette consumption data were conducted, representing over 95% of global cigarette consumption and 85% of the world’s population. Cigarette consumption fell in most countries over the past three decades but trends in country specific consumption were highly variable. For example, China consumed 2.5 million metric tonnes (MMT) of cigarettes in 2013, more than Russia (0.36 MMT), the United States (0.28 MMT), Indonesia (0.28 MMT), Japan (0.20 MMT), and the next 35 highest consuming countries combined. The US and Japan achieved reductions of more than 0.1 MMT from a decade earlier, whereas Russian consumption plateaued, and Chinese and Indonesian consumption increased by 0.75 MMT and 0.1 MMT, respectively. These data generally concord with modelled country level data from the Institute for Health Metrics and Evaluation and have the additional advantage of not smoothing year-over-year discontinuities that are necessary for robust quasi-experimental impact evaluations. Before this study, publicly available data on cigarette consumption have been limited—either inappropriate for quasi-experimental impact evaluations (modelled data), held privately by companies (proprietary data), or widely dispersed across many national statistical agencies and research organisations (disaggregated data). 
This new dataset confirms that cigarette consumption has decreased in most countries over the past three decades, but that secular country specific consumption trends are highly variable. The findings underscore the need for more robust processes in data reporting, ideally built into international legal instruments or other mandated processes. To monitor the impact of the WHO Framework Convention on Tobacco Control and other tobacco control interventions, data on national tobacco production, trade, and sales should be routinely collected and openly reported. The first use of this database for a quasi-experimental impact evaluation of the WHO Framework Convention on Tobacco Control is: Hoffman SJ, Poirier MJP, Katwyk SRV, Baral P, Sritharan L. Impact of the WHO Framework Convention on Tobacco Control on global cigarette consumption: quasi-experimental evaluations using interrupted time series analysis and in-sample forecast event modelling. BMJ. 2019 Jun 19;365:l2287. doi: https://doi.org/10.1136/bmj.l2287 Another use of this database was to systematically code and classify longitudinal cigarette consumption trajectories in European countries since 1970 in: Poirier MJ, Lin G, Watson LK, Hoffman SJ. Classifying European cigarette consumption trajectories from 1970 to 2015. Tobacco Control. 2022 Jan. DOI: 10.1136/tobaccocontrol-2021-056627. 
Statement of Contributions:
Conceived the study: GEG, SJH
Identified multi-country datasets: GEG, MP
Extracted data from multi-country datasets: MP
Quality assessment of data: MP, GEG
Selection of data for final analysis: MP, GEG
Data cleaning and management: MP, GL
Internet searches: MP (English, French, Spanish, Portuguese), GEG (English, French), MYS (Chinese), SKA (Persian), SFK (Arabic); AG, EG, BL, MM, YM, NN, EN, HR, KV, CW, and JW (English), GL (English)
Identification of key informants: GEG, GP
Project Management: LS, JM, MP, SJH, GEG
Contacts with Statistical Agencies: MP, GEG, MYS, SKA, SFK, GP, BL, MM, YM, NN, HR, KV, JW, GL
Contacts with key informants: GEG, MP, GP, MYS, GP
Funding: GEG, SJH
SJH: Hoffman, SJ; JM: Mammone J; SRVK: Rogers Van Katwyk, S; LS: Sritharan, L; MT: Tran, M; SAK: Al-Khateeb, S; AG: Grjibovski, A.; EG: Gunn, E; SKA: Kamali-Anaraki, S; BL: Li, B; MM: Mahendren, M; YM: Mansoor, Y; NN: Natt, N; EN: Nwokoro, E; HR: Randhawa, H; MYS: Yunju Song, M; KV: Vercammen, K; CW: Wang, C; JW: Woo, J; MJPP: Poirier, MJP; GEG: Guindon, EG; GP: Paraje, G; GL: Gigi Lin
Key informants who provided data: Corne van Walbeek (South Africa, Jamaica); Frank Chaloupka (US); Ayda Yurekli (Turkey); Dardo Curti (Uruguay); Bungon Ritthiphakdee (Thailand); Jakub Lobaszewski (Poland); Guillermo Paraje (Chile, Argentina)
Key informants who provided useful insights: Carlos Manuel Guerrero López (Mexico); Muhammad Jami Husain (Bangladesh); Nigar Nargis (Bangladesh); Rijo M John (India); Evan Blecher (Nigeria, Indonesia, Philippines, South Africa); Yagya Karki (Nepal); Anne CK Quah (Malaysia); Nery Suarez Lugo (Cuba)
Agencies providing assistance: Irani...
Visit https://dataone.org/datasets/sha256%3Aaa1b4aae69c3399c96bfbf946da54abd8f7642332d12ccd150c42ad400e9699b for complete metadata about this dataset.
PROBLEM AND OPPORTUNITY
In the United States, voting is largely a private matter. A registered voter is given a randomized ballot form or machine to prevent linkage between their voting choices and their identity. This disconnect supports confidence in the election process, but it creates obstacles to analyzing an election. A common solution is to field exit polls, interviewing voters immediately after they leave their polling location. This method is rife with bias, however, and limited in the demographic data it collects directly.
For the 2020 general election, though, most states published their election results for each voting location. These publications were additionally supported by the geographical areas assigned to each location, the voting precincts. As a result, geographic processing can now be applied to project precinct election results onto Census block groups. While precincts have few demographic traits directly, their geographies have characteristics that make them projectable onto U.S. Census geographies. Both state voting precincts and U.S. Census block groups:
- are exclusive, and do not overlap
- are adjacent, fully covering their corresponding state and potentially county
- have roughly the same size in area, population and voter presence
Analytically, a projection of local demographics does not allow conclusions about voters themselves. However, the dataset does allow statements related to the geographies that yield voting behavior. One could say, for example, that an area dominated by a particular voting pattern would have mean traits of age, race, income or household structure.
The dataset that results from this programming provides voting results allocated by Census block groups. The block group identifier can be joined to Census Decennial and American Community Survey demographic estimates.
DATA SOURCES
The state election results and geographies have been compiled by the Voting and Election Science Team on Harvard's Dataverse.
State voting precincts lie within state and county boundaries. The Census Bureau, on the other hand, publishes its estimates across a variety of geographic definitions including a hierarchy of states, counties, census tracts and block groups. Their definitions can be found here. The geometric shapefiles for each block group are available here. The lowest level of this geography changes often and can obsolesce before the next census survey (Decennial or American Community Survey programs). The second to lowest census level, block groups, has the benefit of both granularity and stability, however. The 2020 Decennial survey details US demographics into 217,740 block groups with between a few hundred and a few thousand people.
Dataset Structure
The dataset's columns include:
Column: Definition
BLOCKGROUP_GEOID: 12 digit primary key. Census GEOID of the block group row. This code concatenates the 2 digit state, the 3 digit county within state, the 6 digit Census Tract identifier, and the 1 digit Census Block Group identifier within tract
STATE: State abbreviation, redundant with 2 digit state FIPS code above
REP: Votes for Republican party candidate for president
DEM: Votes for Democratic party candidate for president
LIB: Votes for Libertarian party candidate for president
OTH: Votes for presidential candidates other than Republican, Democratic or Libertarian
AREA: Square kilometers of area associated with this block group
GAP: Total area of the block group, net of area attributed to voting precincts
PRECINCTS: Number of voting precincts that intersect this block group
ASSUMPTIONS, NOTES AND CONCERNS:
Votes are attributed based upon the proportion of the precinct's area that intersects the corresponding block group. Alternative methods are left to the analyst's initiative.
The 50 states and the District of Columbia are in scope, as these are the U.S. jurisdictions that vote in the general election for the U.S. Presidency.
Three states did not report their results at the precinct level: South Dakota, Kentucky and West Virginia. A dummy block group is added for each of these states to maintain national totals. These states represent 2.1% of all votes cast.
Counties are commonly coded using FIPS codes. However, each election result file may have the county field named differently. Also, three states (Delaware, Massachusetts and Alaska) and the District of Columbia do not share county definitions.
Block groups may be used to capture geographies that have no population, such as bodies of water. As a result, block groups without intersecting voting precincts are not uncommon.
In the U.S., elections are administered at a state level with the Federal Election Commission compiling state totals against the Electoral College weights. The states have liberty, though, to define and change their own voting precincts (https://en.wikipedia.org/wiki/Electoral_precinct). The Census Bureau...
Visit https://dataone.org/datasets/sha256%3A05707c1dc04a814129f751937a6ea56b08413546b18b351a85bc96da16a7f8b5 for complete metadata about this dataset.
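The GEOID layout and the area-proportional attribution described above can be sketched in a few lines of Python. This is an illustrative sketch, not the code used to build the dataset: the names `BlockGroupId`, `split_geoid` and `allocate_votes` are hypothetical, the intersection areas are assumed to have been computed already by a GIS tool, and the sample precinct numbers are made up.

```python
from typing import Dict, NamedTuple


class BlockGroupId(NamedTuple):
    state: str        # 2 digit state FIPS code
    county: str       # 3 digit county within state
    tract: str        # 6 digit Census Tract identifier
    block_group: str  # 1 digit block group within tract


def split_geoid(geoid: str) -> BlockGroupId:
    """Split a 12 digit block group GEOID into its four components."""
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"expected 12 digits, got {geoid!r}")
    return BlockGroupId(geoid[:2], geoid[2:5], geoid[5:11], geoid[11])


def allocate_votes(
    precinct_votes: Dict[str, float],
    precinct_area: float,
    intersection_areas: Dict[str, float],
) -> Dict[str, Dict[str, float]]:
    """Attribute a precinct's votes to block groups by share of precinct area.

    intersection_areas maps a block group GEOID to the area of its overlap
    with the precinct, in the same units as precinct_area.
    """
    if precinct_area <= 0:
        raise ValueError("precinct area must be positive")
    allocated: Dict[str, Dict[str, float]] = {}
    for geoid, area in intersection_areas.items():
        share = area / precinct_area
        allocated[geoid] = {col: votes * share for col, votes in precinct_votes.items()}
    return allocated


# Hypothetical precinct of 4 sq km split across two block groups.
result = allocate_votes(
    {"REP": 100, "DEM": 200},
    precinct_area=4.0,
    intersection_areas={"010010201001": 1.0, "010010201002": 3.0},
)
print(split_geoid("010010201001"))
print(result["010010201001"])  # {'REP': 25.0, 'DEM': 50.0}
```

Summing the allocations over every precinct that intersects a block group would give that block group's totals. Allocated votes are fractional under this scheme, and any precinct area that falls outside all block groups would leave some votes unassigned.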