87 datasets found
  1. Wikia census / Fandom census

    • kaggle.com
    zip
    Updated Oct 19, 2018
    Cite
    Abel Serrano Juste (2018). Wikia census / Fandom census [Dataset]. https://www.kaggle.com/abeserra/wikia-census
    Explore at:
    Available download formats: zip (87833068 bytes)
    Dataset updated
    Oct 19, 2018
    Authors
    Abel Serrano Juste
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Context

    A census of all the wikis hosted on Wikia (now renamed Fandom). The dataset covers more than 300 thousand wikis and records data such as language, topic, number of users, admins, articles, edits, and pages, the number of users with a certain number of contributions, the number of bots, etc.

    A study of this data was presented at the OpenSym 2018 conference. The Jupyter notebook code for that study is available under the "Kernels" section.

    Content

    There are several data files:
    - wikia_stats.csv: general data about each wiki.
    - wikia_stats_users.csv: general data about each wiki + number of human registered users, categorized according to the number of edits in the last 30 days (Users_N).
    - wikia_stats_users_birthdate.csv: all the data above plus the estimated date of birth.

    If you just want the whole dataset for the Wikia census, use the wikia_stats_users_birthdate.csv file.
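As a minimal sketch of how one of these CSV files might be explored, the snippet below counts rows per value of one column using only the Python standard library. The header names (`url`, `language`, `articles`) and the sample rows are hypothetical stand-ins; the real column names should be checked in the downloaded file.

```python
import csv
import io
from collections import Counter


def count_by(csv_file, column):
    """Count rows per distinct value of `column` in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Tiny in-memory stand-in for wikia_stats_users_birthdate.csv.
# The header names and rows below are hypothetical, not taken from the real file.
sample = io.StringIO(
    "url,language,articles\n"
    "a.wikia.com,en,120\n"
    "b.wikia.com,es,15\n"
    "c.wikia.com,en,3\n"
)
language_counts = count_by(sample, "language")
print(language_counts.most_common())
```

With the real file, the same function could be called on a handle opened with `open("wikia_stats_users_birthdate.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.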

    The other two .txt files contain (name, url) pairs: the raw index crawled from the Wikia Sitemap, and the corresponding curated index with only the working wikis.

    The data for this second version was collected in October 2018. The first version was collected in February 2018.

    The data was collected using the scripts located here: https://github.com/Grasia/wiki-scripts

    Wikia does not clearly state the license of this data: it is publicly available on their website, but their license policy does not establish anything about it.

    Acknowledgements

    All the data is available thanks to FANDOM, the company supporting Wikia, and thanks to all the contributors to the wikis.

    Inspiration

    We want to find the patterns that characterize a healthy and sustainable online community.

    Wikia is a huge ecosystem of such communities, where small, medium, and big communities, young and old, coexist, so it is a perfect scenario for studying online collaboration.

    License

    This data is released under the Creative Commons Attribution-Share Alike License 3.0 (Unported) (CC-BY-SA). Please attribute FANDOM (The company behind Wikia) and me (Abel Serrano Juste) when using this data.

  2. City Profile Census Data

    • hub.tumidata.org
    csv, url
    Updated Jun 4, 2024
    Cite
    TUMI (2024). City Profile Census Data [Dataset]. https://hub.tumidata.org/dataset/city_profile_census_data_ahmedebad
    Explore at:
    Available download formats: csv (2921), url
    Dataset updated
    Jun 4, 2024
    Dataset provided by
    Tumi Inc. http://www.tumi.com/
    Description

    City Profile Census Data
    This dataset falls under the category Planning & Policy Planning.
    It contains the following data: Information on the population
    This dataset was scouted on 2022-02-24 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing. The data can be accessed using the following URL / API Endpoint: https://smartcities.data.gov.in/resources/city-profile-census-data

  3. 2020 Census Data

    • caliper.com
    Updated Dec 5, 2023
    + more versions
    Cite
    Caliper Corporation (2023). 2020 Census Data [Dataset]. https://www.caliper.com/mapping-software-data/2020-census-data.htm
    Explore at:
    Available download formats: cdf, shp, kml, kmz, geojson
    Dataset updated
    Dec 5, 2023
    Dataset authored and provided by
    Caliper Corporation http://www.caliper.com/
    License

    https://www.caliper.com/license/maptitude-license-agreement.htm

    Time period covered
    2020
    Area covered
    United States
    Description

    2020 Census Tract data for use with GIS mapping software, databases, and web applications are from Caliper Corporation. The data are available for Maptitude or in formats such as shapefile, KML, KMZ, and GeoJSON.

  4. Wiki-based Knowledge about Demographics and Outstanding Members

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 14, 2023
    Cite
    Hiba Arnaout; Simon Razniewski; Gerhard Weikum; Jeff Z. Pan (2023). Wiki-based Knowledge about Demographics and Outstanding Members [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7410436
    Explore at:
    Dataset updated
    Jan 14, 2023
    Dataset provided by
    Max Planck Institute for Informatics
    The University of Edinburgh
    Authors
    Hiba Arnaout; Simon Razniewski; Gerhard Weikum; Jeff Z. Pan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These datasets contain statements about demographic factors and outstanding members from Wiki-based knowledge (i.e., Wikipedia and Wikidata).

    Group-centric dataset (sample of what it is about):

    Demographic factors of winners of the Nobel Prize in Physics include: male, physicist, American, university teacher, and researcher. Outstanding members in this group include Maria Curie (who isn't male but female) and Wilhelm Röntgen (who isn't a citizen of the U.S. but of Germany).

    Subject-centric dataset (sample of what it is about):

    Fun trivia about Max Planck includes: unlike 93% of winners of the Liebig Medal (an award by the Society of German Chemists), Planck was not a chemist, but a physicist.

    This data can also be browsed at: https://wikiknowledge.onrender.com/demographics/

  5. Data from: Census 2020

    • hub.tumidata.org
    url
    Updated Oct 31, 2025
    Cite
    TUMI (2025). Census 2020 [Dataset]. https://hub.tumidata.org/dataset/census_2020_guadalajara
    Explore at:
    Available download formats: url
    Dataset updated
    Oct 31, 2025
    Dataset provided by
    Tumi Inc. http://www.tumi.com/
    Description

    Census 2020
    This dataset falls under the category Planning & Policy Planning.
    It contains the following data: It is a very detailed street-by-street census but does not allow downloading.
    This dataset was scouted on 2022-02-11 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing. The data can be accessed using the following URL / API Endpoint: https://geomatica.guadalajara.gob.mx/apps/censo_2020/index.html See URL for data access and license information.

  6. Electoral-College

    • kaggle.com
    zip
    Updated Nov 22, 2020
    Cite
    Som Jadhav (2020). Electoral-College [Dataset]. https://www.kaggle.com/somjadhav/electoralcollege
    Explore at:
    Available download formats: zip (48451 bytes)
    Dataset updated
    Nov 22, 2020
    Authors
    Som Jadhav
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Electoral College is used to elect the President of the United States. Rather than having a national vote count to determine the winner, candidates compete at the state level to win the electoral votes of each state. Votes are allocated on a winner-take-all basis (except in Maine and Nebraska, where votes are allocated by congressional district). This institution is no stranger to controversy and there have been efforts to reform/abolish this system (most notably NPVIC).

    Content

    This dataset not only contains data about each state's electoral votes, but also statewide winners in each election, population estimates for the last 10 years, and decennial census data going back to 1960.

    Acknowledgements

    Thank you to Wikipedia, Kaggle, and Census.gov for providing the data included.

  7. Data from: Wiki-based Communities of Interest: Demographics and Outliers

    • zenodo.org
    bin
    Updated Jan 15, 2023
    Cite
    Hiba Arnaout; Simon Razniewski; Jeff Z. Pan (2023). Wiki-based Communities of Interest: Demographics and Outliers [Dataset]. http://doi.org/10.5281/zenodo.7537200
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 15, 2023
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Hiba Arnaout; Simon Razniewski; Jeff Z. Pan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These datasets contain statements about demographics and outliers of Wiki-based Communities of Interest.

    Group-centric dataset (sample):

    {
      "title": "winners of Priestley Medal",
      "recorded_members": 83,
      "topics": ["STEM.Chemistry"],
      "demographics": [
        "occupation-chemist",
        "gender-male",
        "citizen-U.S."
      ],
      "outliers": [
        {
          "reason": "NOT(chemist) unlike 82 recorded members",
          "members": [
            "Francis Garvan (lawyer, art collector)"
          ]
        },
        {
          "reason": "NOT(male) unlike 80 recorded members",
          "members": [
            "Mary L. Good (female)",
            "Darleane Hoffman (female)",
            "Jacqueline Barton (female)"
          ]
        }
      ]
    }

    Subject-centric dataset (sample):

    {
      "subject": "Serena Williams",
      "statements": [
        {
          "statement": "NOT(sport-basketball) but (tennis) unlike 4 recorded winners of Best Female Athlete ESPY Award.",
          "score": 0.36
        },
        {
          "statement": "NOT(occupation-politician) but (tennis player, businessperson, autobiographer) unlike 20 recorded winners of Michigan Women's Hall of Fame.",
          "score": 0.17
        }
      ]
    }
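A minimal sketch of how a subject-centric record like the sample above could be read and filtered in Python. It assumes each record is a JSON object shaped like the sample; the on-disk layout of the published file (listed only as "bin") is not described here. Treating a higher `score` as more noteworthy is also an assumption, and `top_statements` is a hypothetical helper name.

```python
import json

# Record copied from the subject-centric sample in the description.
record_text = """
{
  "subject": "Serena Williams",
  "statements": [
    {"statement": "NOT(sport-basketball) but (tennis) unlike 4 recorded winners of Best Female Athlete ESPY Award.",
     "score": 0.36},
    {"statement": "NOT(occupation-politician) but (tennis player, businessperson, autobiographer) unlike 20 recorded winners of Michigan Women's Hall of Fame.",
     "score": 0.17}
  ]
}
"""


def top_statements(record, min_score=0.0):
    """Return (score, statement) pairs at or above min_score, highest score first."""
    pairs = [
        (s["score"], s["statement"])
        for s in record["statements"]
        if s["score"] >= min_score
    ]
    return sorted(pairs, reverse=True)


record = json.loads(record_text)
ranked = top_statements(record, min_score=0.2)
for score, statement in ranked:
    print(f"{score:.2f}  {statement}")
```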

    This data can also be browsed at: https://wikiknowledge.onrender.com/demographics/

  8. MHA Population Census Dataset

    • kaggle.com
    Updated May 3, 2023
    Cite
    Prasad Patil (2023). MHA Population Census Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/mha-population-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 3, 2023
    Dataset provided by
    Kaggle
    Authors
    Prasad Patil
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    According to Wikipedia, Maharashtra is the second-most populous state in India and the second-most populous country subdivision globally. The state is divided into 36 districts. The information in this dataset is sourced from the government website of Maharashtra State. It covers 3 census years: 1991, 2001 and 2011.

    Content

    The dataset has 29 columns and 1.32 lakh (about 132,000) rows.

    Column headers are self-explanatory. Below is the list of columns - District, Taluka, Town/Village, No. of households, Total population, Total male population, Total female population, Total 0 to 6-year children, Male 0 to 6-year children, Female 0 to 6-year children, Total SC population, Male SC population, Female SC population, Total ST population, Male ST population, Female ST population, Total literates, Male literates, Female literates, Total illiterates, Male illiterates, Female illiterates, Total main workers, Male main workers, Female main workers, Total non-workers, Male non-workers, Female non-workers

    Abbreviations: SC = Scheduled Caste; ST = Scheduled Tribes.
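The listed columns are enough to derive common indicators per town or village. The sketch below computes a sex ratio (females per 1,000 males) and a literacy rate over the population aged 7 and above, which is the convention used by the Census of India. The dictionary keys follow the column names as written in the list above and may differ from the actual file headers; the example row is made up for illustration.

```python
def village_indicators(row):
    """Derive sex ratio and literacy rate from one row of census counts."""
    population = int(row["Total population"])
    children_0_6 = int(row["Total 0 to 6-year children"])
    males = int(row["Total male population"])
    females = int(row["Total female population"])
    literates = int(row["Total literates"])

    # Literacy is measured against the population aged 7 and above.
    aged_7_plus = population - children_0_6
    return {
        "sex_ratio": 1000 * females / males if males else None,
        "literacy_rate": 100 * literates / aged_7_plus if aged_7_plus else None,
    }


# Hypothetical row; the values are not taken from the dataset.
example_row = {
    "Total population": "1000",
    "Total male population": "520",
    "Total female population": "480",
    "Total 0 to 6-year children": "100",
    "Total literates": "720",
}
indicators = village_indicators(example_row)
print(indicators)
```

Rows with zero males or no population aged 7 and above yield `None` instead of raising a division error, which matters because some settlement rows may be empty.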

  9. Census Information By Radio

    • hub.tumidata.org
    csv, geojson, url +1
    Updated Jun 4, 2024
    Cite
    TUMI (2024). Census Information By Radio [Dataset]. https://hub.tumidata.org/dataset/census_information_by_radio_buenos_aires
    Explore at:
    Available download formats: csv (2492762), csv (2398445), zip (2426003), url, geojson (6337981), zip (393212)
    Dataset updated
    Jun 4, 2024
    Dataset provided by
    Tumi Inc. http://www.tumi.com/
    Description

    Census Information By Radio
    This dataset falls under the category Traffic Generating Parameters Population.
    It contains the following data: Census information of the City, disaggregated by radius.
    This dataset was scouted on 2022-02-20 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing. The data can be accessed using the following URL / API Endpoint: https://data.buenosaires.gob.ar/dataset/informacion-censal-por-radio

  10. List of Countries and their Population

    • kaggle.com
    Updated Apr 12, 2025
    Cite
    Anah Chukwujekwu (2025). List of Countries and their Population [Dataset]. https://www.kaggle.com/datasets/anahchukwujekwu/list-of-countries-and-their-population/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 12, 2025
    Dataset provided by
    Kaggle http://kaggle.com/
    Authors
    Anah Chukwujekwu
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    🌍 Countries and Dependencies by Population (2025)

    This dataset provides a comprehensive list of countries and dependent territories worldwide, along with their most recent population estimates. The data is sourced from the Wikipedia page List of countries and dependencies by population, which compiles figures from national statistical offices and the United Nations Population Division.

    📄 Dataset Overview

    • Country/Territory Name: Includes sovereign states, dependent territories, and regions with limited recognition.
    • Population: Latest available estimates, primarily from national censuses or UN projections.
    • Percentage of World Population: Each country's population as a percentage of the global total.
    • Date of Estimate: The reference date for the population figure.
    • Notes: Additional information, such as the inclusion or exclusion of certain regions.

    🧠 Potential Use Cases

    • Analyzing global population distribution and trends.
    • Creating visualizations like choropleth maps.
    • Normalizing other datasets by population for per capita analysis.
    • Educational purposes in demographics and geography.

    📌 Notes

    • The dataset includes territories and regions with limited recognition to provide a complete global perspective.
    • Population figures are based on the most recent estimates available as of 2025.
    • Data may be subject to revisions as new census information becomes available.
  11. Americorps US Census Place Shape Data

    • datalumos.org
    delimited
    Updated Mar 5, 2025
    Cite
    Americorps (2025). Americorps US Census Place Shape Data [Dataset]. http://doi.org/10.3886/E221708V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    Mar 5, 2025
    Dataset provided by
    AmeriCorps http://www.americorps.gov/
    Authors
    Americorps
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Boundary Shapes for the US Census 'Places' 2021

  12. Census Blocks

    • caliper.com
    Updated Apr 22, 2025
    Cite
    Caliper Corporation (2025). Census Blocks [Dataset]. https://www.caliper.com/mapping-software-data/census-blocks.htm
    Explore at:
    Available download formats: cdf, shp, kml, kmz, geojson
    Dataset updated
    Apr 22, 2025
    Dataset authored and provided by
    Caliper Corporation http://www.caliper.com/
    License

    https://www.caliper.com/license/maptitude-license-agreement.htm

    Time period covered
    2025
    Area covered
    United States
    Description

    Census Blocks data for use with GIS mapping software, databases, and web applications are from Caliper Corporation and contain block boundaries with associated 2020 Census demographic data.

  13. Census Block Groups

    • caliper.com
    cdf, dwg, dxf, gdb +9
    Updated Apr 22, 2025
    Cite
    Caliper Corporation (2025). Census Block Groups [Dataset]. https://www.caliper.com/mapping-software-data/census-block-groups.htm
    Explore at:
    Available download formats: shp, kmz, kml, geojson, cdf, postgis, postgresql, gdb, ntf, sdo, sql server mssql, dxf, dwg
    Dataset updated
    Apr 22, 2025
    Dataset authored and provided by
    Caliper Corporation http://www.caliper.com/
    License

    https://www.caliper.com/license/maptitude-license-agreement.htm

    Time period covered
    2025
    Area covered
    United States
    Description

    Census Block Groups data for use with GIS mapping software, databases, and web applications are from Caliper Corporation and contain block group boundaries with associated Census and American Community Survey demographic data.

  14. US Census Place Shape Boundary 2021

    • datasets.ai
    • data.americorps.gov
    • +1more
    23, 40, 55, 8
    Updated Nov 15, 2023
    + more versions
    Cite
    AmeriCorps (2023). US Census Place Shape Boundary 2021 [Dataset]. https://datasets.ai/datasets/us-census-place-shape-boundary-2021
    Explore at:
    Available download formats: 55, 23, 8, 40
    Dataset updated
    Nov 15, 2023
    Dataset authored and provided by
    AmeriCorps http://www.americorps.gov/
    Area covered
    United States
    Description

    Boundary Shapes for the US Census 'Places' 2021

  15. National Population And Housing Census - Cnpv - 2018

    • hub.tumidata.org
    url
    Updated Oct 31, 2025
    Cite
    TUMI (2025). National Population And Housing Census - Cnpv - 2018 [Dataset]. https://hub.tumidata.org/dataset/national_population_and_housing_census_cnpv_2018_barranquilla
    Explore at:
    Available download formats: url
    Dataset updated
    Oct 31, 2025
    Dataset provided by
    Tumi Inc. http://www.tumi.com/
    Description

    National Population And Housing Census - Cnpv - 2018
    This dataset falls under the category Traffic Generating Parameters Population.
    It contains the following data: The population and housing census conducted in 2018 consisted of counting and characterizing the people residing in Colombia, as well as the dwellings and households in the national territory. Through the census, the country obtains first-hand data on the number of inhabitants, their distribution in the territory and their living conditions.
    This dataset was scouted on 2022-02-05 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing. The data can be accessed using the following URL / API Endpoint: https://microdatos.dane.gov.co/index.php/catalog/643/get_microdata See URL for data access and license information.

  16. NM Census Tract Population Estimates, 2015

    • hub.arcgis.com
    Updated Feb 8, 2017
    Cite
    Larry Spear's GIS Research Projects (2017). NM Census Tract Population Estimates, 2015 [Dataset]. https://hub.arcgis.com/maps/lspe::nm-census-tract-population-estimates-2015/about
    Explore at:
    Dataset updated
    Feb 8, 2017
    Dataset authored and provided by
    Larry Spear's GIS Research Projects
    Area covered
    Description

    Population estimates for 2015 New Mexico Census Tracts from ESRI Demographics and UNM, GPS (Geospatial and Population Studies). The dataset compares the relative difference (see https://en.wikipedia.org/wiki/Relative_change_and_difference) between the two estimates for each census tract. They are very similar overall numerically (Pearson's correlation 0.9921849), with an ESRI population total of 2,105,287 persons and a GPS population total of 2,099,848 persons. However, there are some notable differences for specific census tracts. This comparison is provided so that researchers who use both estimates can gain a better understanding of areas where they are similar or different. GPS does not provide estimates at the census block or block group level, the building blocks of census tracts. Fortunately, ESRI provides these estimates for block groups at yearly intervals. For researchers who focus on urban areas and use block group estimates, knowing these differences at the census tract level is also very useful.

    Note: Recent GPS estimates were obtained from the NM IBIS website as an Excel file and converted to an ESRI file geodatabase for comparison using ArcGIS Desktop.

    Additional Note: The GPS total of 2,099,848 was derived from the Excel census tract file downloaded from IBIS on 9/1/2016. Since then, GPS has released 2015 population estimates in geodatabase format (downloaded on 11/10/2016), and the census tract total is 2,099,852 persons. Both are slightly different from the county total of 2,099,856 persons.

    See ongoing research projects for some example applications.
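The two measures mentioned in this description can be sketched in a few lines of Python. The description does not say which estimate serves as the reference for the relative difference, so using the GPS value as the reference is an assumption here. The statewide totals are the ones quoted above; the per-tract values are hypothetical and only illustrate the calculation.

```python
import math


def relative_difference(value, reference):
    """Signed relative difference of `value` with respect to `reference`."""
    return (value - reference) / abs(reference)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Statewide totals quoted in the description.
esri_total = 2_105_287
gps_total = 2_099_848
total_rel_diff = relative_difference(esri_total, gps_total)

# Hypothetical per-tract estimates, for illustration only.
esri_tracts = [4100, 2950, 5300, 1800]
gps_tracts = [4000, 3000, 5250, 1900]
tract_rel_diffs = [relative_difference(e, g) for e, g in zip(esri_tracts, gps_tracts)]
r = pearson(esri_tracts, gps_tracts)

print(f"statewide relative difference: {total_rel_diff:.4%}")
print(f"tract-level Pearson correlation: {r:.4f}")
```

A high correlation across tracts can coexist with large relative differences in individual tracts, which is why the description reports both.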

  17. Labour force census (01-01-1985 - 01-01-1986)

    • data.europa.eu
    Updated Dec 11, 2024
    Cite
    Centraal_Bureau_voor_de_Statistiek (2024). Labour force census (01-01-1985 - 01-01-1986) [Dataset]. https://data.europa.eu/data/datasets/cbs-microdata-0b01e41080022527?locale=en
    Explore at:
    Dataset updated
    Dec 11, 2024
    Dataset provided by
    Centraal Bureau voor de Statistiek
    Authors
    Centraal_Bureau_voor_de_Statistiek
    Description

    Labour force census 1985

    More information on how to access the data:

    https://www.cbs.nl/en-en/our-services/custom-and-microdata/microdata-self-research

    Population

    Persons

  18. AmeriCorps Participant Demographics Data

    • catalog.data.gov
    • data.americorps.gov
    Updated Mar 13, 2025
    + more versions
    Cite
    AmeriCorps (2025). AmeriCorps Participant Demographics Data [Dataset]. https://catalog.data.gov/dataset/americorps-participant-demographics-data
    Explore at:
    Dataset updated
    Mar 13, 2025
    Dataset provided by
    AmeriCorps http://www.americorps.gov/
    Description

    This dataset provides comparisons of demographic group prevalence in AmeriCorps Member/Volunteer populations to that of the greater U.S. population. The odds ratio analysis was completed by the Office of the Chief Data Officer. Population estimates were obtained from U.S. Census Bureau data reported in American Community Survey 5-Year tables DP05 (total U.S. populations) and S1701 (U.S. populations below poverty line), and socioeconomic status-related microdata maintained by IPUMS USA. See the attached document 'AmeriCorps Demographic Analysis Procedure.pdf' for full technical documentation of the analysis.

  19. Data Demographics, 2010 Census

    • hub.tumidata.org
    • fortaleza.tumidata.org
    url
    Updated Jun 4, 2024
    Cite
    TUMI (2024). Data Demographics, 2010 Census [Dataset]. https://hub.tumidata.org/dataset/data_demographics_2010_census_fortaleza
    Explore at:
    Available download formats: url
    Dataset updated
    Jun 4, 2024
    Dataset provided by
    Tumi Inc. http://www.tumi.com/
    Description

    Data Demographics, 2010 Census
    This dataset falls under the category Planning & Policy Planning.
    It contains the following data: DATA DEMOGRAPHICS, 2010 CENSUS
    This dataset was scouted on 2022-09-20 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing. The data can be accessed using the following URL / API Endpoint: https://public.tableau.com/app/profile/secretaria.de.desenvolvimento.economico.sde/viz/DADOSDEMOGRFICOSDEFORTALEZACENSO2010-IBGE/PainelDemografia See URL for data access and license information.

  20. U.S. Voting by Census Block Groups

    • search.dataone.org
    Updated Oct 29, 2025
    Cite
    Bryan, Michael (2025). U.S. Voting by Census Block Groups [Dataset]. http://doi.org/10.7910/DVN/NKNWBX
    Explore at:
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Bryan, Michael
    Area covered
    United States
    Description

    PROBLEM AND OPPORTUNITY

    In the United States, voting is largely a private matter. A registered voter is given a randomized ballot form or machine to prevent linkage between their voting choices and their identity. This disconnect supports confidence in the election process, but it creates obstacles to an election's analysis. A common solution is to field exit polls, interviewing voters immediately after they leave their polling location. This method is rife with bias, however, and functionally limited in the direct demographic data it collects.

    For the 2020 general election, though, most states published their election results for each voting location. These publications were additionally supported by the geographical areas assigned to each location, the voting precincts. As a result, geographic processing can now be applied to project precinct election results onto Census block groups. While precincts have few demographic traits directly, their geographies have characteristics that make them projectable onto U.S. Census geographies. Both state voting precincts and U.S. Census block groups:
    • are exclusive, and do not overlap
    • are adjacent, fully covering their corresponding state and potentially county
    • have roughly the same size in area, population and voter presence

    Analytically, a projection of local demographics does not allow conclusions about voters themselves. However, the dataset does allow statements related to the geographies that yield voting behavior. One could say, for example, that an area dominated by a particular voting pattern would have mean traits of age, race, income or household structure.

    The dataset that results from this programming provides voting results allocated by Census block groups. The block group identifier can be joined to Census Decennial and American Community Survey demographic estimates.

    DATA SOURCES

    The state election results and geographies have been compiled by the Voting and Election Science team on Harvard's dataverse. State voting precincts lie within state and county boundaries. The Census Bureau, on the other hand, publishes its estimates across a variety of geographic definitions including a hierarchy of states, counties, census tracts and block groups. Their definitions can be found here. The geometric shapefiles for each block group are available here. The lowest level of this geography changes often and can obsolesce before the next census survey (Decennial or American Community Survey programs). The second to lowest census level, block groups, has the benefit of both granularity and stability, however. The 2020 Decennial survey details US demographics into 217,740 block groups with between a few hundred and a few thousand people.

    DATASET STRUCTURE

    The dataset's columns include:
    • BLOCKGROUP_GEOID: 12 digit primary key. Census GEOID of the block group row. This code concatenates: 2 digit state, 3 digit county within state, 6 digit Census Tract identifier, 1 digit Census Block Group identifier within tract
    • STATE: State abbreviation, redundant with 2 digit state FIPS code above
    • REP: Votes for Republican party candidate for president
    • DEM: Votes for Democratic party candidate for president
    • LIB: Votes for Libertarian party candidate for president
    • OTH: Votes for presidential candidates other than Republican, Democratic or Libertarian
    • AREA: Square kilometers of area associated with this block group
    • GAP: Total area of the block group, net of area attributed to voting precincts
    • PRECINCTS: Number of voting precincts that intersect this block group

    ASSUMPTIONS, NOTES AND CONCERNS:
    • Votes are attributed based upon the proportion of the precinct's area that intersects the corresponding block group. Alternative methods are left to the analyst's initiative.
    • 50 states and the District of Columbia are in scope as those U.S. possessions voting in the general election for the U.S. Presidency.
    • Three states did not report their results at the precinct level: South Dakota, Kentucky and West Virginia. A dummy block group is added for each of these states to maintain national totals. These states represent 2.1% of all votes cast.
    • Counties are commonly coded using FIPS codes. However, each election result file may have the county field named differently. Also, three states do not share county definitions - Delaware, Massachusetts, Alaska and the District of Columbia.
    • Block groups may be used to capture geographies that do not have population, like bodies of water. As a result, block groups without intersecting voting precincts are not uncommon.
    • In the U.S., elections are administered at a state level with the Federal Elections Commission compiling state totals against the Electoral College weights. The states have liberty, though, to define and change their own voting precincts https://en.wikipedia.org/wiki/Electoral_precinct.

    The Census Bureau... Visit https://dataone.org/datasets/sha256%3A05707c1dc04a814129f751937a6ea56b08413546b18b351a85bc96da16a7f8b5 for complete metadata about this dataset.
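Two pieces of this description lend themselves to a short illustration: the 12-digit layout of BLOCKGROUP_GEOID and the area-proportional attribution of precinct votes. The following is a minimal sketch under stated assumptions, not the dataset author's code. The function names, the example GEOIDs and the areas are hypothetical, and the intersection areas are assumed to have been computed beforehand with a GIS tool.

```python
from typing import Dict, NamedTuple


class BlockGroupId(NamedTuple):
    state: str        # 2-digit state FIPS code
    county: str       # 3-digit county code within the state
    tract: str        # 6-digit census tract identifier
    block_group: str  # 1-digit block group identifier within the tract


def parse_blockgroup_geoid(geoid: str) -> BlockGroupId:
    """Split a 12-digit BLOCKGROUP_GEOID into its four components."""
    if len(geoid) != 12 or not (geoid.isascii() and geoid.isdigit()):
        raise ValueError(f"expected 12 digits, got {geoid!r}")
    return BlockGroupId(geoid[0:2], geoid[2:5], geoid[5:11], geoid[11])


def allocate_votes(votes: float, precinct_area: float,
                   overlap_areas: Dict[str, float]) -> Dict[str, float]:
    """Share one precinct's votes among block groups by intersecting area.

    overlap_areas maps a block group GEOID to the area of its intersection
    with the precinct, in the same units as precinct_area.
    """
    return {
        geoid: votes * area / precinct_area
        for geoid, area in overlap_areas.items()
    }


# Hypothetical GEOIDs and areas, for illustration only.
parts = parse_blockgroup_geoid("350010001071")
shares = allocate_votes(1000, 4.0, {"350010001071": 1.0, "350010001072": 3.0})
print(parts)
print(shares)
```

GEOIDs are kept as strings so that leading zeros in state, county and tract codes survive; reading them as integers would silently drop those zeros and break joins against Census tables.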
