Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
A census of all the wikis hosted on Wikia (now renamed Fandom). The dataset covers more than 300 thousand wikis and includes, for each one: language, topic, number of users, admins, articles, edits, pages, number of users with a certain number of contributions, number of bots, etc.
A study of this data was presented at the OpenSym 2018 conference. You can find the Jupyter notebook code for that study under the "Kernels" section.
There are several data files:
- wikia_stats.csv: general data about each wiki.
- wikia_stats_users.csv: general data about each wiki + number of human registered users, categorized according to the number of edits in the last 30 days (Users_N).
- wikia_stats_users_birthdate.csv: all the data above plus the estimated date of birth.
If you are just looking for the whole dataset corresponding to the Wikia census, go for the wikia_stats_users_birthdate.csv file.
The other two .txt files contain pairs of (name, url): the raw index crawled from the Wikia Sitemap, and the corresponding curated index with only the working wikis.
This second version of the data was collected in October 2018; the first version was collected in February 2018.
The data was collected using the scripts located here: https://github.com/Grasia/wiki-scripts
The license of the data is not clearly stated by Wikia: the data is publicly available on their website, but their license policy does not address it.
All the data is possible thanks to FANDOM, the company supporting Wikia, and thanks to all the contributors to the wikis.
We want to find the patterns that characterize a healthy and sustainable online community.
Wikia is a huge ecosystem of these communities, where small, medium and big as well as young and old communities coexist, so it is a perfect scenario to study online collaboration.
This data is released under the Creative Commons Attribution-Share Alike License 3.0 (Unported) (CC-BY-SA). Please attribute FANDOM (The company behind Wikia) and me (Abel Serrano Juste) when using this data.
City Profile Census Data
This dataset falls under the category Planning & Policy Planning.
It contains the following data: Information on the population
This dataset was scouted on 2022-02-24 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: https://smartcities.data.gov.in/resources/city-profile-census-data
https://www.caliper.com/license/maptitude-license-agreement.htm
2020 Census Tract data for use with GIS mapping software, databases, and web applications are from Caliper Corporation. Available for Maptitude or in any format such as shapefile, KML, KMZ, GeoJSON.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets contain statements about demographic factors and outstanding members from Wiki-based knowledge (i.e., Wikipedia and Wikidata).
Group-centric dataset (sample of what it is about):
Demographic factors of winners of Nobel Prize in Physics include: male, physicist, american, university teacher, and researcher. Outstanding members in this group include Maria Curie (who isn't male but female) and Wilhelm Röntgen (who isn't a citizen of the U.S. but Germany).
Subject-centric dataset (sample of what it is about):
Fun trivia about Max Planck includes: unlike 93% of winners of Liebig Medal (an award by Society of German Chemists), Planck was not a chemist, but a physicist.
This data can also be browsed at: https://wikiknowledge.onrender.com/demographics/
Census 2020
This dataset falls under the category Planning & Policy Planning.
It contains the following data: It is a very detailed street-by-street census but does not allow downloading.
This dataset was scouted on 2022-02-11 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: https://geomatica.guadalajara.gob.mx/apps/censo_2020/index.html See URL for data access and license information.
https://creativecommons.org/publicdomain/zero/1.0/
The Electoral College is used to elect the President of the United States. Rather than having a national vote count to determine the winner, candidates compete at the state level to win the electoral votes of each state. Votes are allocated on a winner-take-all basis (except in Maine and Nebraska, where votes are allocated by congressional district). This institution is no stranger to controversy and there have been efforts to reform or abolish this system (most notably the National Popular Vote Interstate Compact, NPVIC).
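As a rough illustration of the winner-take-all rule described above, the sketch below awards each state's electoral votes to its statewide popular-vote winner. The state names, candidate names, vote counts and the function name are hypothetical and are not taken from the dataset. The district-level allocation used in Maine and Nebraska, and tie handling, are deliberately left out.

```python
from collections import Counter


def tally_electoral_votes(state_results, electoral_votes):
    """Award each state's electoral votes to its statewide popular-vote winner."""
    totals = Counter()
    for state, votes_by_candidate in state_results.items():
        # Winner-take-all: the candidate with the most votes takes every elector.
        winner = max(votes_by_candidate, key=votes_by_candidate.get)
        totals[winner] += electoral_votes[state]
    return dict(totals)


# Hypothetical inputs, for illustration only.
state_results = {
    "State A": {"Candidate X": 510, "Candidate Y": 490},
    "State B": {"Candidate X": 300, "Candidate Y": 700},
    "State C": {"Candidate X": 520, "Candidate Y": 480},
}
electoral_votes = {"State A": 10, "State B": 6, "State C": 3}

electoral_totals = tally_electoral_votes(state_results, electoral_votes)

# National popular vote, summed across states, for comparison.
popular_totals = Counter()
for votes_by_candidate in state_results.values():
    popular_totals.update(votes_by_candidate)

print(electoral_totals)
print(dict(popular_totals))
```

In this made-up example Candidate Y receives more votes overall while Candidate X collects more electoral votes, which is the kind of outcome behind the reform efforts mentioned above.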
This dataset not only contains data about each state's electoral votes, but also statewide winners in each election, population estimates for the last 10 years, and decennial census data going back to 1960.
Thank you to Wikipedia, Kaggle, and Census.gov for providing the data included.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets contain statements about demographics and outliers of Wiki-based Communities of Interest.
Group-centric dataset (sample):
{
  "title": "winners of Priestley Medal",
  "recorded_members": 83,
  "topics": ["STEM.Chemistry"],
  "demographics": [
    "occupation-chemist",
    "gender-male",
    "citizen-U.S."
  ],
  "outliers": [
    {
      "reason": "NOT(chemist) unlike 82 recorded members",
      "members": [
        "Francis Garvan (lawyer, art collector)"
      ]
    },
    {
      "reason": "NOT(male) unlike 80 recorded members",
      "members": [
        "Mary L. Good (female)",
        "Darleane Hoffman (female)",
        "Jacqueline Barton (female)"
      ]
    }
  ]
}
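The group-centric record above can be read with standard JSON tooling. The following is a minimal sketch, assuming every "reason" string follows the "NOT(trait) unlike N recorded members" pattern seen in the sample. That pattern, the trimmed copy of the record, and the helper name `outlier_summary` are assumptions made for illustration, not a documented schema.

```python
import json
import re

# Trimmed copy of the sample record shown above (only the fields used here).
SAMPLE = """
{
  "title": "winners of Priestley Medal",
  "recorded_members": 83,
  "outliers": [
    {"reason": "NOT(chemist) unlike 82 recorded members",
     "members": ["Francis Garvan (lawyer, art collector)"]},
    {"reason": "NOT(male) unlike 80 recorded members",
     "members": ["Mary L. Good (female)",
                 "Darleane Hoffman (female)",
                 "Jacqueline Barton (female)"]}
  ]
}
"""

# Assumed pattern of the "reason" field, inferred from the sample only.
REASON_PATTERN = re.compile(r"NOT\((.+?)\) unlike (\d+) recorded members")


def outlier_summary(record):
    """Return (trait, outlier_count, majority_count) for each outlier entry."""
    rows = []
    for entry in record["outliers"]:
        match = REASON_PATTERN.match(entry["reason"])
        if match is None:
            continue  # reason string does not follow the assumed pattern
        rows.append((match.group(1), len(entry["members"]), int(match.group(2))))
    return rows


record = json.loads(SAMPLE)
summary = outlier_summary(record)
for trait, outliers, majority in summary:
    print(f"{trait}: {outliers} outlier(s) versus {majority} members")
```

In this sample the outlier count and the majority count add up to `recorded_members` for both entries; whether that holds across the whole dataset is not stated in the description.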
Subject-centric dataset (sample):
{
  "subject": "Serena Williams",
  "statements": [
    {
      "statement": "NOT(sport-basketball) but (tennis) unlike 4 recorded winners of Best Female Athlete ESPY Award.",
      "score": 0.36
    },
    {
      "statement": "NOT(occupation-politician) but (tennis player, businessperson, autobiographer) unlike 20 recorded winners of Michigan Women's Hall of Fame.",
      "score": 0.17
    }
  ]
}
This data can also be browsed at: https://wikiknowledge.onrender.com/demographics/
https://creativecommons.org/publicdomain/zero/1.0/
According to Wikipedia, Maharashtra is the second-most populous state in India and the second-most populous country subdivision globally. The state is divided into 36 districts. Information for the dataset is sourced from the government website of Maharashtra State.
It contains information on 3 Census years: 1991, 2001 and 2011.
The dataset has 29 columns and 1.32 lakh (about 132,000) rows.
Column headers are self-explanatory. Below is the list of columns - District, Taluka, Town/Village, No. of households, Total population, Total male population, Total female population, Total 0 to 6-year children, Male 0 to 6-year children, Female 0 to 6-year children, Total SC population, Male SC population, Female SC population, Total ST population, Male ST population, Female ST population, Total literates, Male literates, Female literates, Total illiterates, Male illiterates, Female illiterates, Total main workers, Male main workers, Female main workers, Total non-workers, Male non-workers, Female non-workers
Abbreviations: SC - Scheduled Caste; ST - Scheduled Tribe
Census Information By Radio
This dataset falls under the category Traffic Generating Parameters Population.
It contains the following data: Census information for the city, disaggregated by census radius (radio censal).
This dataset was scouted on 2022-02-20 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: https://data.buenosaires.gob.ar/dataset/informacion-censal-por-radio
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides a comprehensive list of countries and dependent territories worldwide, along with their most recent population estimates. The data is sourced from the Wikipedia page "List of countries and dependencies by population", which compiles figures from national statistical offices and the United Nations Population Division.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Boundary Shapes for the US Census 'Places' 2021
https://www.caliper.com/license/maptitude-license-agreement.htm
Census Blocks data for use with GIS mapping software, databases, and web applications are from Caliper Corporation and contain block boundaries with associated 2020 Census demographic data.
https://www.caliper.com/license/maptitude-license-agreement.htm
Census Block Groups data for use with GIS mapping software, databases, and web applications are from Caliper Corporation and contain block group boundaries with associated Census and American Community Survey demographic data.
Boundary Shapes for the US Census 'Places' 2021
National Population And Housing Census - CNPV - 2018
This dataset falls under the category Traffic Generating Parameters Population.
It contains the following data: The population and housing census conducted in 2018 consisted of counting and characterizing the people residing in Colombia, as well as the dwellings and households in the national territory. Through the census, the country obtains first-hand data on the number of inhabitants, their distribution in the territory and their living conditions.
This dataset was scouted on 2022-02-05 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: https://microdatos.dane.gov.co/index.php/catalog/643/get_microdata See URL for data access and license information.
Population estimates for 2015 New Mexico Census Tracts from ESRI Demographics and UNM GPS (Geospatial and Population Studies). Compares the relative difference (see https://en.wikipedia.org/wiki/Relative_change_and_difference) between the two estimates for each census tract. They are very similar overall numerically (Pearson's correlation 0.9921849), with an ESRI population total of 2,105,287 persons and a GPS population total of 2,099,848 persons. However, there are some notable differences for specific census tracts. This comparison is provided so that researchers who use both estimates can gain a better understanding of areas where they are similar or different. GPS does not provide estimates at the census block or block group level, the building blocks of census tracts. Fortunately, ESRI provides these estimates for block groups at yearly intervals. For researchers who focus on urban areas and use block group estimates, knowing these differences at the census tract level is also very useful.
Note: Recent GPS estimates were obtained from the NM IBIS website as an Excel file and converted to an ESRI file geodatabase for comparison using ArcGIS Desktop.
Additional Note: The GPS total of 2,099,848 was derived from the Excel census tract file downloaded from IBIS on 9/1/2016. Since then GPS has released 2015 population estimates in geodatabase format (downloaded on 11/10/2016) and the census tract total is 2,099,852 persons. Both are slightly different from the county total of 2,099,856 persons.
See ongoing research projects for some example applications.
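To make the comparison concrete, here is a minimal sketch of a relative difference calculation applied to the two statewide totals quoted above. The description does not say which estimate serves as the reference, so using the GPS total as the denominator is an assumption, as is the function name.

```python
def relative_difference(estimate, reference):
    """Relative difference of `estimate` with respect to `reference`."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (estimate - reference) / reference


# Statewide totals quoted in the description.
esri_total = 2_105_287
gps_total = 2_099_848

# Assumption: GPS is treated as the reference value.
diff = relative_difference(esri_total, gps_total)
print(f"ESRI vs. GPS statewide total: {diff:.2%}")
```

At the statewide level the two totals differ by roughly a quarter of a percent. The same function could be applied tract by tract to find the areas where the estimates diverge more strongly.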
Labour force census 1985
More information on how to access the data:
https://www.cbs.nl/en-en/our-services/custom-and-microdata/microdata-self-research
Persons
This dataset provides comparisons of demographic group prevalence in AmeriCorps Member/Volunteer populations to that of the greater U.S. population. The odds ratio analysis was completed by the Office of the Chief Data Officer. Population estimates were obtained from U.S. Census Bureau data reported in American Community Survey 5-Year tables DP05 (total U.S. populations) and S1701 (U.S. populations below poverty line), and socioeconomic status-related microdata maintained by IPUMS USA. See the attached document 'AmeriCorps Demographic Analysis Procedure.pdf' for full technical documentation of the analysis.
Data Demographics, 2010 Census
This dataset falls under the category Planning & Policy Planning.
It contains the following data: DATA DEMOGRAPHICS, 2010 CENSUS
This dataset was scouted on 2022-09-20 as part of a data sourcing project conducted by TUMI. License information might be outdated: Check original source for current licensing.
The data can be accessed using the following URL / API Endpoint: https://public.tableau.com/app/profile/secretaria.de.desenvolvimento.economico.sde/viz/DADOSDEMOGRFICOSDEFORTALEZACENSO2010-IBGE/PainelDemografia See URL for data access and license information.
PROBLEM AND OPPORTUNITY
In the United States, voting is largely a private matter. A registered voter is given a randomized ballot form or machine to prevent linkage between their voting choices and their identity. This disconnect supports confidence in the election process, but it creates obstacles to an election's analysis. A common solution is to field exit polls, interviewing voters immediately after they leave their polling location. This method is rife with bias, however, and functionally limited in the direct demographic data collected.
For the 2020 general election, though, most states published their election results for each voting location. These publications were additionally supported by the geographical areas assigned to each location, the voting precincts. As a result, geographic processing can now be applied to project precinct election results onto Census block groups. While precincts have few demographic traits directly, their geographies have characteristics that make them projectable onto U.S. Census geographies. Both state voting precincts and U.S. Census block groups:
- are exclusive, and do not overlap
- are adjacent, fully covering their corresponding state and potentially county
- have roughly the same size in area, population and voter presence
Analytically, a projection of local demographics does not allow conclusions about voters themselves. However, the dataset does allow statements related to the geographies that yield voting behavior. One could say, for example, that an area dominated by a particular voting pattern would have mean traits of age, race, income or household structure.
The dataset that results from this programming provides voting results allocated by Census block groups. The block group identifier can be joined to Census Decennial and American Community Survey demographic estimates.
DATA SOURCES
The state election results and geographies have been compiled by the Voting and Election Science Team on Harvard's Dataverse.
State voting precincts lie within state and county boundaries. The Census Bureau, on the other hand, publishes its estimates across a variety of geographic definitions including a hierarchy of states, counties, census tracts and block groups. Their definitions can be found here. The geometric shapefiles for each block group are available here. The lowest level of this geography changes often and can obsolesce before the next census survey (Decennial or American Community Survey programs). The second to lowest census level, block groups, has the benefit of both granularity and stability, however. The 2020 Decennial survey details US demographics into 217,740 block groups with between a few hundred and a few thousand people.
Dataset Structure
The dataset's columns include:
- BLOCKGROUP_GEOID: 12 digit primary key. Census GEOID of the block group row. This code concatenates: 2 digit state, 3 digit county within state, 6 digit Census Tract identifier, 1 digit Census Block Group identifier within tract
- STATE: State abbreviation, redundant with the 2 digit state FIPS code above
- REP: Votes for Republican party candidate for president
- DEM: Votes for Democratic party candidate for president
- LIB: Votes for Libertarian party candidate for president
- OTH: Votes for presidential candidates other than Republican, Democratic or Libertarian
- AREA: Square kilometers of area associated with this block group
- GAP: Total area of the block group, net of area attributed to voting precincts
- PRECINCTS: Number of voting precincts that intersect this block group
ASSUMPTIONS, NOTES AND CONCERNS:
- Votes are attributed based upon the proportion of the precinct's area that intersects the corresponding block group. Alternative methods are left to the analyst's initiative.
- 50 states and the District of Columbia are in scope as those U.S. possessions voting in the general election for the U.S. Presidency.
- Three states did not report their results at the precinct level: South Dakota, Kentucky and West Virginia. A dummy block group is added for each of these states to maintain national totals. These states represent 2.1% of all votes cast.
- Counties are commonly coded using FIPS codes. However, each election result file may have the county field named differently. Also, Delaware, Massachusetts, Alaska and the District of Columbia do not share county definitions.
- Block groups may be used to capture geographies that do not have population, like bodies of water. As a result, block groups without intersecting voting precincts are not uncommon.
- In the U.S., elections are administered at a state level with the Federal Election Commission compiling state totals against the Electoral College weights. The states have liberty, though, to define and change their own voting precincts: https://en.wikipedia.org/wiki/Electoral_precinct. The Census Bureau...
Visit https://dataone.org/datasets/sha256%3A05707c1dc04a814129f751937a6ea56b08413546b18b351a85bc96da16a7f8b5 for complete metadata about this dataset.
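Two mechanics from the description above lend themselves to a short sketch: splitting the 12 digit BLOCKGROUP_GEOID into its components, and attributing a precinct's votes to block groups in proportion to intersecting area. The function names, the sample GEOID, the block group labels and the area figures are hypothetical. The real processing presumably relies on GIS tooling to compute the intersections, which is not shown here.

```python
def split_blockgroup_geoid(geoid):
    """Split a 12 digit block group GEOID into state, county, tract, block group."""
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError("expected a 12 digit string")
    return {
        "state": geoid[:2],        # 2 digit state FIPS code
        "county": geoid[2:5],      # 3 digit county within state
        "tract": geoid[5:11],      # 6 digit census tract identifier
        "block_group": geoid[11],  # 1 digit block group within tract
    }


def allocate_precinct_votes(precinct_votes, precinct_area, intersect_areas):
    """Attribute votes by the share of the precinct's area in each block group.

    intersect_areas maps a block group id to the area of its intersection
    with the precinct, in the same unit as precinct_area.
    """
    if precinct_area <= 0:
        raise ValueError("precinct_area must be positive")
    return {
        block_group: precinct_votes * area / precinct_area
        for block_group, area in intersect_areas.items()
    }


# Illustrative GEOID, not taken from the dataset.
parts = split_blockgroup_geoid("350010001011")

# Hypothetical precinct: 1,000 votes over 4.0 sq km, split across two block groups.
shares = allocate_precinct_votes(1000, 4.0, {"bg_1": 1.0, "bg_2": 3.0})

print(parts)
print(shares)
```

Keeping the GEOID as a string rather than an integer preserves leading zeros, which matters for states whose FIPS code starts with 0.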