Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
wikipedia persons masked: A filtered version of the wikipedia dataset, with only pages of people
Dataset Summary
Contains ~70k pages from Wikipedia, each describing a person. For each page, the person described in the text is masked with a
Supported Tasks and Leaderboards
The dataset supports the fill-mask task, but can also be used for other tasks such as question answering, e.g. "Who is
Languages
English only
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/rcds/wikipedia-persons-masked.
https://creativecommons.org/publicdomain/zero/1.0/
Data was pulled from a table in the following Wikipedia article: https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population
I used Microsoft Excel's Power Query feature to pull the table from Wikipedia. The table lists each city, its rank (based on 2020 population), some data on its area, and its population in both 2020 and 2010.
Banner image source: https://unsplash.com/photos/wh-7GeXxItI
https://creativecommons.org/publicdomain/zero/1.0/
Contains data on the top 330 most populous cities in the United States, including 2020 census information, 2010 census population, land area, population density, and location.
Source: https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population
This dataset compares the prevalence of demographic groups in AmeriCorps Member/Volunteer populations to that of the greater U.S. population. The odds ratio analysis was completed by the Office of the Chief Data Officer. Population estimates were obtained from U.S. Census Bureau data reported in American Community Survey 5-Year tables DP05 (total U.S. populations) and S1701 (U.S. populations below poverty line), and socioeconomic status-related microdata maintained by IPUMS USA. See the attached document 'AmeriCorps Demographic Analysis Procedure.pdf' for full technical documentation of the analysis.
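As a rough illustration of what an odds ratio between two prevalences means, the sketch below compares a group's share in one population with its share in a reference population. It is not the procedure from the attached document; the function names and the example proportions are hypothetical.

```python
def odds(prevalence: float) -> float:
    """Convert a prevalence (a proportion strictly between 0 and 1) to odds."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    return prevalence / (1.0 - prevalence)


def odds_ratio(p_group: float, p_reference: float) -> float:
    """Odds of membership in one population divided by the odds in a reference one.

    A value above 1 means the demographic group is more prevalent in the first
    population than in the reference population; below 1 means less prevalent.
    """
    return odds(p_group) / odds(p_reference)


# Hypothetical proportions, for illustration only (not taken from the dataset).
example_ratio = odds_ratio(0.20, 0.10)
```

An odds ratio of 1 indicates that the group is equally prevalent in both populations.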
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
List of Countries and Dependencies by Population
This dataset contains population-related information for countries and dependencies, scraped from Wikipedia. The dataset includes the following columns:
Location: The country or dependency name.
Population: Total population count.
% of World: The percentage of the world's population this country or dependency represents.
Date: The date of the population estimate.
Source: Whether the source is official or derived from the United… See the full description on the dataset page: https://huggingface.co/datasets/iamramzan/Global-Population-Data.
https://www.caliper.com/license/maptitude-license-agreement.htm
County population projections broken down by year, age, race, and gender (2020-2100) for use with GIS mapping software, databases, and web applications.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Created to examine coronavirus cases relative to population.
Population Data for the US States and Regions of China
Sources: https://simple.wikipedia.org/wiki/List_of_U.S._states_by_population
https://en.wikipedia.org/wiki/List_of_Chinese_administrative_divisions_by_population
This is population and total area data for the US, Europe and Asia, scraped from Wikipedia.
Area is in km, and population should be in plain numbers.
https://en.wikipedia.org/wiki/Demographics_of_Europe
https://en.wikipedia.org/wiki/List_of_Chinese_administrative_divisions_by_population
https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States_by_population
https://en.wikipedia.org/wiki/List_of_Asian_countries_by_population
https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_area
https://en.wikipedia.org/wiki/List_of_Chinese_administrative_divisions_by_area
COVID-19 prediction
8Opt/famous-people-wiki-0001 dataset hosted on Hugging Face and contributed by the HF Datasets community
The Current Population Survey Civic Engagement and Volunteering (CEV) Supplement is the most robust longitudinal survey about volunteerism and other forms of civic engagement in the United States. Produced by AmeriCorps in partnership with the U.S. Census Bureau, the CEV takes the pulse of our nation’s civic health every two years. The data on this page was collected in September 2021. The CEV can generate reliable estimates at the national level, within states and the District of Columbia, and in the largest twelve Metropolitan Statistical Areas to support evidence-based decision making and efforts to understand how people make a difference in communities across the country.
Click on "Export" to download and review an excerpt from the 2021 CEV Analytic Codebook that shows the variables available in the analytic CEV datasets produced by AmeriCorps.
Click on "Show More" to download and review the following 2021 CEV data and resources provided as attachments:
1) 2021 CEV Dataset Fact Sheet: brief summary of technical aspects of the 2021 CEV dataset.
2) CEV FAQs: answers to frequently asked technical questions about the CEV.
3) Constructs and measures in the CEV.
4) 2021 CEV Analytic Data and Setup Files: analytic dataset in Stata (.dta), R (.rdata), SPSS (.sav), and Excel (.csv) formats, codebook for analytic dataset, and Stata code (.do) to convert raw dataset to analytic formatting produced by AmeriCorps. These files were updated on January 16, 2025 to correct erroneous missing values for the ssupwgt variable.
5) 2021 CEV Technical Documentation: codebook for raw dataset and full supplement documentation produced by U.S. Census Bureau.
6) Nonresponse Bias Analysis produced by U.S. Census Bureau.
7) 2021 CEV Raw Data and Read In Files: raw dataset in Stata (.dta) format, Stata code (.do) and dictionary file (.dct) to read ASCII dataset (.dat) into Stata using layout files (.lis).
analyze the current population survey (cps) annual social and economic supplement (asec) with r
the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no.
despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.
the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
this new github repository contains three scripts:
2005-2012 asec - download all microdata.R
  download the fixed-width file containing household, family, and person records
  import by separating this file into three tables, then merge 'em together at the person-level
  download the fixed-width file containing the person-level replicate weights
  merge the rectangular person-level file with the replicate weights, then store it in a sql database
  create a new variable - one - in the data table
2012 asec - analysis examples.R
  connect to the sql database created by the 'download all microdata' program
  create the complex sample survey object, using the replicate weights
  perform a boatload of analysis examples
replicate census estimates - 2011.R
  connect to the sql database created by the 'download all microdata' program
  create the complex sample survey object, using the replicate weights
  match the sas output shown in the png file below
2011 asec replicate weight sas output.png
  statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.
click here to view these three scripts
for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
  the census bureau's current population survey page
  the bureau of labor statistics' current population survey page
  the current population survey's wikipedia article
notes:
interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.
as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
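the scripts above are written in r, but the replicate-weight idea they rely on can be sketched in a few lines of python. this is only an illustration, not the repository's code: the function names are hypothetical, and the default scale of 4/160 assumes the 160 person-level replicate weights and the variance factor given in the census bureau's usage instructions, so check that document before relying on it.

```python
import math
from typing import Sequence


def weighted_total(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of a person-level variable."""
    return sum(v * w for v, w in zip(values, weights))


def replicate_standard_error(
    values: Sequence[float],
    full_weights: Sequence[float],
    replicate_weights: Sequence[Sequence[float]],
    scale: float = 4.0 / 160.0,  # assumed factor; confirm against the usage instructions
) -> float:
    """Standard error of a weighted total from replicate weights.

    Each entry of replicate_weights is one full column of replicate weights.
    The statistic is recomputed once per replicate, and the scaled sum of
    squared deviations from the full-sample estimate gives the variance.
    """
    full_estimate = weighted_total(values, full_weights)
    squared_deviations = sum(
        (weighted_total(values, column) - full_estimate) ** 2
        for column in replicate_weights
    )
    return math.sqrt(scale * squared_deviations)
```

the same pattern works for any statistic (means, ratios, medians): recompute it once per replicate weight column and compare against the full-sample estimate.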
PROBLEM AND OPPORTUNITY
In the United States, voting is largely a private matter. A registered voter is given a randomized ballot form or machine to prevent linkage between their voting choices and their identity. This disconnect supports confidence in the election process, but it provides obstacles to an election's analysis. A common solution is to field exit polls, interviewing voters immediately after leaving their polling location. This method is rife with bias, however, and functionally limited in the direct demographic data collected.
For the 2020 general election, though, most states published their election results for each voting location. These publications were additionally supported by the geographical areas assigned to each location, the voting precincts. As a result, geographic processing can now be applied to project precinct election results onto Census block groups. While precincts have few demographic traits directly, their geographies have characteristics that make them projectable onto U.S. Census geographies. Both state voting precincts and U.S. Census block groups:
  are exclusive, and do not overlap
  are adjacent, fully covering their corresponding state and potentially county
  have roughly the same size in area, population and voter presence
Analytically, a projection of local demographics does not allow conclusions about voters themselves. However, the dataset does allow statements related to the geographies that yield voting behavior. One could say, for example, that an area dominated by a particular voting pattern would have mean traits of age, race, income or household structure.
The dataset that results from this programming provides voting results allocated by Census block groups. The block group identifier can be joined to Census Decennial and American Community Survey demographic estimates.
DATA SOURCES
The state election results and geographies have been compiled by Voting and Election Science team on Harvard's dataverse.
State voting precincts lie within state and county boundaries. The Census Bureau, on the other hand, publishes its estimates across a variety of geographic definitions including a hierarchy of states, counties, census tracts and block groups. Their definitions can be found here. The geometric shapefiles for each block group are available here.
The lowest level of this geography changes often and can obsolesce before the next census survey (Decennial or American Community Survey programs). The second-lowest census level, block groups, has the benefit of both granularity and stability, however. The 2020 Decennial survey details US demographics into 217,740 block groups with between a few hundred and a few thousand people.
Dataset Structure
The dataset's columns include:
BLOCKGROUP_GEOID: 12 digit primary key. Census GEOID of the block group row. This code concatenates: 2 digit state, 3 digit county within state, 6 digit Census Tract identifier, 1 digit Census Block Group identifier within tract
STATE: State abbreviation, redundant with 2 digit state FIPS code above
REP: Votes for Republican party candidate for president
DEM: Votes for Democratic party candidate for president
LIB: Votes for Libertarian party candidate for president
OTH: Votes for presidential candidates other than Republican, Democratic or Libertarian
AREA: Square kilometers of area associated with this block group
GAP: Total area of the block group, net of area attributed to voting precincts
PRECINCTS: Number of voting precincts that intersect this block group
ASSUMPTIONS, NOTES AND CONCERNS:
Votes are attributed based upon the proportion of the precinct's area that intersects the corresponding block group. Alternative methods are left to the analyst's initiative.
50 states and the District of Columbia are in scope as those U.S. possessions voting in the general election for the U.S. Presidency.
Three states did not report their results at the precinct level: South Dakota, Kentucky and West Virginia. A dummy block group is added for each of these states to maintain national totals. These states represent 2.1% of all votes cast.
Counties are commonly coded using FIPS codes. However, each election result file may have the county field named differently. Also, Delaware, Massachusetts, Alaska and the District of Columbia do not share county definitions.
Block groups may be used to capture geographies that do not have population, like bodies of water. As a result, block groups without intersecting voting precincts are not uncommon.
In the U.S., elections are administered at a state level with the Federal Election Commission compiling state totals against the Electoral College weights. The states have liberty, though, to define and change their own voting precincts https://en.wikipedia.org/wiki/Electoral_precinct.
The Census Bureau... Visit https://dataone.org/datasets/sha256%3A05707c1dc04a814129f751937a6ea56b08413546b18b351a85bc96da16a7f8b5 for complete metadata about this dataset.
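The area-proportional attribution and the GEOID layout described above can be sketched as follows. This is a minimal illustration, not the dataset's actual processing code: the function names, the input shapes (plain dictionaries and tuples), and the assumption that intersection areas have already been computed by a GIS tool are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def split_geoid(geoid: str) -> Dict[str, str]:
    """Split a 12 digit block group GEOID into its concatenated parts."""
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError("expected a 12 digit GEOID")
    return {
        "state": geoid[0:2],        # 2 digit state
        "county": geoid[2:5],       # 3 digit county within state
        "tract": geoid[5:11],       # 6 digit Census Tract identifier
        "block_group": geoid[11],   # 1 digit block group within tract
    }


def allocate_votes(
    precinct_votes: Dict[str, float],
    precinct_area: Dict[str, float],
    intersections: Iterable[Tuple[str, str, float]],
) -> Dict[str, float]:
    """Attribute precinct votes to block groups by share of precinct area.

    intersections holds (precinct_id, blockgroup_geoid, intersect_area) rows,
    with areas in the same unit as precinct_area. Each block group receives
    votes * (intersect_area / precinct_area) from every precinct it overlaps.
    """
    allocated: Dict[str, float] = defaultdict(float)
    for precinct_id, geoid, intersect_area in intersections:
        share = intersect_area / precinct_area[precinct_id]
        allocated[geoid] += precinct_votes[precinct_id] * share
    return dict(allocated)
```

Under this scheme, a precinct's votes are fully conserved only when its intersections cover its whole area; any uncovered area leaves some votes unallocated.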
https://www.caliper.com/license/maptitude-license-agreement.htm
Geodemographic Segmentation Data from Caliper Corporation contain demographic data in a way that is easy to visualize and interpret. We provide 8 segments and 32 subsegments for exploring the demographic makeup of neighborhoods across the country.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This dataset includes basic data about all US cities with a population over 100,000 (333 cities).
Source: https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population
Coordinates of cities have been geocoded using https://rapidapi.com/GeocodeSupport/api/forward-reverse-geocoding/
Column descriptions:
City: Name of city
State: Name of state
Latitude, Longitude: Coordinates of the city
Population_estimate_2022: Estimated population in 2022
Population_2020: Population figure from 2020 census
Change_population: % change in population between 2022 and 2020
Land_area: City land area in sq. mi.
Population_density_2020: Density of population per sq. mi. in 2020
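The two derived columns can be recomputed from the others as a sanity check. The sketch below is a minimal illustration with hypothetical function names; it assumes the change is measured from the 2020 census figure to the 2022 estimate, and that density is the 2020 population divided by land area in square miles.

```python
def percent_change(population_2020: float, population_estimate_2022: float) -> float:
    """Percent change from the 2020 census figure to the 2022 estimate."""
    return (population_estimate_2022 - population_2020) / population_2020 * 100.0


def population_density(population_2020: float, land_area_sq_mi: float) -> float:
    """People per square mile of land area."""
    return population_2020 / land_area_sq_mi
```

Comparing these recomputed values with Change_population and Population_density_2020 is a quick way to spot rows where the scraped figures disagree.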
https://www.florida-demographics.com/terms_and_conditions
A dataset listing Florida cities by population for 2024.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
OpenStreetMap exports for use in GIS applications.
This theme includes all OpenStreetMap features in this area matching:
place IN ('isolated_dwelling','town','village','hamlet','city')
Features may have these attributes:
This dataset is one of many OpenStreetMap exports on HDX (https://data.humdata.org/organization/hot). See the Humanitarian OpenStreetMap Team website for more information.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Percentage of the population in each US state that identifies as Black, 1790-2018.
Columns: State, Year, Percent
Data from https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_African-American_population, via https://web.archive.org/web/20080725044857/http://www.census.gov/population/www/documentation/twps0056/twps0056.html.
Banner photo by Enayet Raheem on Unsplash
This dataset contains information on all counties of the United States of America.
I scraped this data from the following Wikipedia page: https://en.wikipedia.org/wiki/List_of_United_States_counties_and_county_equivalents
Data scientists spend most of their time on data cleaning. Hence, this dataset can be ideal for sharpening your data-cleaning skills.
Columns specification:
county: Name of each county.
state: State name.
founded: The year when it was founded.
largest_city: Name of the largest city.
pop_total: Total population.
pop_den: Population density per square mile and per square kilometer.
total_area: Total area (land + water) in square miles and square kilometers.
land_area: Total land area in square miles and square kilometers.
water_area: Total water area in square miles and square kilometers.
Dataset Card for image_text_wikipedia_vi
Dataset Summary
People-Text Wikipedia Abstracts (Vietnamese version). This dataset comprises nearly 30,000 pairs of images of well-known people and corresponding textual abstracts extracted from Vietnamese Wikipedia articles. It contains portraits, paintings that portray famous people, etc.
https://www.sci-tech-today.com/privacy-policy
FOMO statistics: If you’ve ever scrolled through Instagram at midnight and suddenly felt like your entire life is boring compared to everyone else’s… (I do). Well, you’ve met FOMO. Short for “Fear of Missing Out,” it’s not just a feeling; it’s a measurable, trackable thing.
And that’s where these FOMO statistics come in. These stats don’t just tell us how common it is; they reveal how it shapes the way we think, spend money, and even sleep. The truth is, FOMO has been studied for years, and the data behind it is honestly shocking.
From how many people lose sleep over it, to how often it makes us pull out our wallets, the numbers show that FOMO has gone from a social quirk to a full-blown cultural driver. In this article, we’re diving deep into the most detailed FOMO statistics you’ll find anywhere, covering its origins, how it works, who it hits hardest, and just how much it’s costing us in time, money, and peace of mind, literally everything. Let’s get started.