https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains world population data for every country. It consists of 234 rows and 17 columns. I have analyzed the data to extract the information listed below.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a compilation of processed data on citation and references for research papers including their author, institution and open access info for a selected sample of academics analysed using Microsoft Academic Graph (MAG) data and CORE. The data for this dataset was collected during December 2019 to January 2020.

Six countries (Austria, Brazil, Germany, India, Portugal, United Kingdom and United States) were the focus of the six questions which make up this dataset. There is one csv file per country and per question (36 files in total). More details about the creation of this dataset are available on the public ON-MERRIT D3.1 deliverable report.

The dataset is a combination of two different data sources, one part is a dataset created on analysing promotion policies across the target countries, while the second part is a set of data points available to understand the publishing behaviour. To facilitate the analysis the dataset is organised in the following seven folders:

PRT
The dataset with the file name "PRT_policies.csv" contains the related information as this was extracted from promotion, review and tenure (PRT) policies.

Q1: What % of papers coming from a university are Open Access?
- Dataset Name format: oa_status_countryname_papers.csv
- Dataset Contents: Open Access (OA) status of all papers of all the universities listed in Times Higher Education World University Rankings (THEWUR) for the given country. A paper is marked OA if there is at least an OA link available. OA links are collected using the CORE Discovery API.
- Important considerations about this dataset:
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to.
  - The service we used to recognise if a paper is OA, CORE Discovery, does not contain entries for all paperids in MAG. This implies that some of the records in the dataset extracted will not have either a true or false value for the _is_OA_ field.
  - Only those records marked as true for _is_OA_ field can be said to be OA. Others with false or no value for is_OA field are unknown status (i.e. not necessarily closed access).

Q2: How are papers, published by the selected universities, distributed across the three scientific disciplines of our choice?
- Dataset Name format: fsid_countryname_papers.csv
- Dataset Contents: For the given country, all papers for all the universities listed in THEWUR with the information of fieldofstudy they belong to.
- Important considerations about this dataset:
  - MAG can associate a paper to multiple fieldofstudyid. If a paper belongs to more than one of our fieldofstudyid, separate records were created for the paper with each of those _fieldofstudyid_s.
  - MAG assigns fieldofstudyid to every paper with a score. We preserve only those records whose score is more than 0.5 for any fieldofstudyid it belongs to.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Papers with authorship from multiple universities are counted once towards each of the universities concerned.

Q3: What is the gender distribution in authorship of papers published by the universities?
- Dataset Name format: author_gender_countryname_papers.csv
- Dataset Contents: All papers with their author names for all the universities listed in THEWUR.
- Important considerations about this dataset:
  - When there are multiple collaborators (authors) for the same paper, this dataset makes sure that only the records for collaborators from within selected universities are preserved.
  - An external script was executed to determine the gender of the authors. The script is available here.

Q4: Distribution of staff seniority (= number of years from their first publication until the last publication) in the given university.
- Dataset Name format: author_ids_countryname_papers.csv
- Dataset Contents: For a given country, all papers for authors with their publication year for all the universities listed in THEWUR.
- Important considerations about this work:
  - When there are multiple collaborators (authors) for the same paper, this dataset makes sure that only the records for collaborators from within selected universities are preserved.
  - Calculating staff seniority can be achieved in various ways. The most straightforward option is to calculate it as _academic_age = MAX(year) - MIN(year)_ for each authorid.

Q5: Citation counts (incoming) for OA vs Non-OA papers published by the university.
- Dataset Name format: cc_oa_countryname_papers.csv
- Dataset Contents: OA status and OA links for all papers of all the universities listed in THEWUR and for each of those papers, count of incoming citations available in MAG.
- Important considerations about this dataset:
  - CORE Discovery was used to establish the OA status of papers.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to.
  - Only those records marked as true for _is_OA_ field can be said to be OA. Others with false or no value for is_OA field are unknown status (i.e. not necessarily closed access).

Q6: Count of OA vs Non-OA references (outgoing) for all papers published by universities.
- Dataset Name format: rc_oa_countryname_-papers.csv
- Dataset Contents: Counts of all OA and unknown papers referenced by all papers published by all the universities listed in THEWUR.
- Important considerations about this dataset:
  - CORE Discovery was used to establish the OA status of papers being referenced.
  - Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Papers with authorship from multiple universities are counted once towards each of the universities concerned.

Additional files:
- _fieldsofstudy_mag_.csv: this file contains a dump of fieldsofstudy table of MAG mapping each of the ids to their actual field of study name.
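
The seniority measure described under Q4 (academic_age = MAX(year) - MIN(year) for each authorid) can be sketched in a few lines. This is only an illustration: the function name and the sample rows are hypothetical, and real rows would have to be read from the author_ids_countryname_papers.csv files, whose exact column layout is not given here.

```python
from collections import defaultdict

def academic_age(records):
    """Return {authorid: MAX(year) - MIN(year)} from (authorid, year) pairs."""
    years_by_author = defaultdict(list)
    for author_id, year in records:
        years_by_author[author_id].append(year)
    # An author with a single publication year gets an academic age of 0.
    return {a: max(ys) - min(ys) for a, ys in years_by_author.items()}

# Hypothetical (authorid, publication year) rows, not taken from the dataset.
sample_records = [("a1", 2005), ("a1", 2019), ("a2", 2012), ("a1", 2010), ("a2", 2012)]
ages = academic_age(sample_records)
```

Because only collaborators from the selected universities are preserved, an age computed this way reflects the publication span visible in this dataset rather than an author's full career.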
Which country has the most Facebook users?
There are more than 378 million Facebook users in India alone, making it the leading country in terms of Facebook audience size. To put this into context, if India’s Facebook audience were a country then it would be ranked third in terms of largest population worldwide. Apart from India, there are several other markets with more than 100 million Facebook users each: The United States, Indonesia, and Brazil with 193.8 million, 119.05 million, and 112.55 million Facebook users respectively.
Facebook – the most used social media
Meta, the company that was previously called Facebook, owns four of the most popular social media platforms worldwide: WhatsApp, Facebook Messenger, Facebook, and Instagram. As of the third quarter of 2021, there were around 3.5 billion cumulative monthly users of the company’s products worldwide. With around 2.9 billion monthly active users, Facebook is the most popular social media platform worldwide. With an audience of this scale, it is no surprise that the vast majority of Facebook’s revenue is generated through advertising.
Facebook usage by device
As of July 2021, it was found that 98.5 percent of active users accessed their Facebook account from mobile devices. In fact, almost 81.8 percent of Facebook audiences worldwide access the platform only via mobile phone. Facebook is not only available through mobile browser as the company has published several mobile apps for users to access their products and services. As of the third quarter 2021, the four core Meta products were leading the ranking of most downloaded mobile apps worldwide, with WhatsApp amassing approximately six billion downloads.
Cost of Living - Country Rankings Dataset
The "Cost of Living - Country Rankings Dataset" provides comprehensive information on the cost of living in various countries around the world. Understanding the cost of living is crucial for individuals, businesses, and policymakers alike, as it impacts decisions related to travel, relocation, investment, and economic analysis. This dataset is intended to serve as a valuable resource for researchers, data analysts, and anyone interested in exploring and comparing the cost of living across different nations.
This dataset comprises four primary columns:
1. Countries: This column contains the names of various countries included in the dataset. Each country is identified by its official name.
2. Cost of Living: The "Cost of Living" column represents the cost of living index or score for each country. This index is typically calculated by considering various factors, such as housing, food, transportation, healthcare, and other essential expenses. A higher index value indicates a higher cost of living in that particular country, while a lower value suggests a more affordable cost of living.
3. 2017 Global Rank: This column provides the global ranking of each country's cost of living in the year 2017. The ranking is based on the cost of living index mentioned earlier. A lower rank indicates a lower cost of living relative to other countries, while a higher rank suggests a higher cost of living position.
4. Available Data: The "Available Data" column indicates whether or not data for a specific country and year is available.
This dataset is designed to support various data analysis and visualization tasks. Users can explore trends in the cost of living, identify countries with high or low cost of living, and analyze how rankings have changed over time. Researchers can use this dataset to conduct in-depth studies on the factors influencing the cost of living in different regions and the economic implications of such variations.
Please note that the dataset includes information for the year 2017, and users are encouraged to consider this when interpreting the data, as economic conditions and the cost of living may have changed since then. Additionally, this dataset aims to provide a snapshot of cost of living rankings for countries in 2017 and may not cover every country in the world.
Link: https://www.theglobaleconomy.com/rankings/cost_of_living_wb/
Disclaimer: The accuracy and completeness of the data provided in this dataset are subject to the source from which it was obtained. Users are advised to cross-reference this data with authoritative sources and exercise discretion when making decisions based on it. The dataset creator and Kaggle assume no responsibility for any actions taken based on the information provided herein.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1000 or seats of adm div (ca 80.000)
Sources and Contributions
Sources: GeoNames is aggregating over hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows to view the data and quickly fix error and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
Enrichment: add country name
The Global Data Regulation Diagnostic provides a comprehensive assessment of the quality of the data governance environment. Diagnostic results show that countries have put in greater effort in adopting enabler regulatory practices than in safeguard regulatory practices. However, the regulatory development of both enablers and safeguards remains at an intermediate stage: 47 percent of enabler good practices and 41 percent of good safeguard practices are adopted across countries. Under the enabler and safeguard pillars, the diagnostic covers dimensions of e-commerce/e-transactions, enablers for public intent data, enablers for private intent data, safeguards for personal and nonpersonal data, cybersecurity and cybercrime, as well as cross-border data flows. Across all these dimensions, no income group demonstrates advanced regulatory frameworks, indicating significant room for further improvement of the data governance environment.
The Global Data Regulation Diagnostic is the first comprehensive assessment of laws and regulations on data governance. It covers enabler and safeguard regulatory practices in 80 countries providing indicators to assess and compare their performance. This Global Data Regulation Diagnostic develops objective and standardized indicators to measure the regulatory environment for the data economy across countries. The indicators aim to serve as a diagnostic tool so countries can assess and compare their performance vis-à-vis other countries. Understanding the gap with global regulatory good practices is a necessary first step for governments when identifying and prioritizing reforms.
80 countries
Country
Observation data/ratings [obs]
The diagnostic is based on a detailed assessment of domestic laws, regulations, and administrative requirements in 80 countries selected to ensure a balanced coverage across income groups, regions, and different levels of digital technology development. Data are further verified through a detailed desk research of legal texts, reflecting the regulatory status of each country as of June 1, 2020.
Mail Questionnaire [mail]
The questionnaire comprises 37 questions designed to determine if a country has adopted good regulatory practice on data governance. The responses are then scored and assigned a normative interpretation. Related questions fall into seven clusters so that when the scores are averaged, each cluster provides an overall sense of how it performs in its corresponding regulatory and legal dimensions. These seven dimensions are: (1) E-commerce/e-transaction; (2) Enablers for public intent data; (3) Enablers for private intent data; (4) Safeguards for personal data; (5) Safeguards for nonpersonal data; (6) Cybersecurity and cybercrime; (7) Cross-border data transfers.
100%
https://creativecommons.org/publicdomain/zero/1.0/
| Columns | Description |
|---|---|
| Rank | Country ranks from 1996 to 2021 |
| Country | Country names |
| Regions | Country regions |
| Documents | Published documents |
| Citable Documents | Citable documents, which include articles, reviews and conference papers |
| Citations | Whole-period citations to documents published from 1996 to 2021 |
| Self-Citations | Whole-period country self-citations to documents published from 1996 to 2021 |
| Citations per Documents | Average citations to documents published from 1996 to 2021 |
| H Index | The country's number of articles (h) that have received at least h citations |
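
The H Index definition in the table (the number of articles h that have each received at least h citations) can be made concrete with a short sketch. The function name and the citation counts are illustrative assumptions; the dataset itself ships the precomputed value per country rather than per-article counts.

```python
def h_index(citations):
    """Largest h such that at least h items have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` items all have at least `rank` citations
        else:
            break
    return h

# Hypothetical per-article citation counts.
example_h = h_index([10, 8, 5, 4, 3])
```

With these five counts, four articles have at least four citations each, while there are not five articles with at least five, so the sketch yields an h of 4.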
https://creativecommons.org/publicdomain/zero/1.0/
The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank
This dataset contains both national and regional debt statistics captured by over 200 economic indicators. Time series data is available for those indicators from 1970 to 2015 for reporting countries.
For more information, see the World Bank website.
Fork this kernel to get started with this dataset.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_intl_debt
https://cloud.google.com/bigquery/public-data/world-bank-international-debt
Citation: The World Bank: International Debt Statistics
Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by @till_indeman from Unsplash.
What countries have the largest outstanding debt?
https://cloud.google.com/bigquery/images/outstanding-debt.png
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This data set has been generated using data from the Gapminder website, which focuses on gathering and sharing statistics and other information about social, economic and environmental development at local, national and global levels.
This particular data set describes the values of several parameters (see the list below) between 1998 and 2018 for a total of 175 countries, having a total of 3675 rows. The parameters included in the data set and the column name of the dataframe are as follows:
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This article introduces the most comprehensive dataset on de jure central bank independence (CBI), including yearly data from 182 countries between 1970 and 2012. The dataset identifies statutory reforms affecting CBI, their direction, and the attributes necessary to build the Cukierman, Webb and Neyapti index. Previous datasets focused on developed countries, and included non-representative samples of developing countries. This dataset’s substantially broader coverage has important implications. First, it challenges the conventional wisdom about central bank reforms in the world, revealing CBI increases and restrictions in decades and regions previously considered barely affected by reforms. Second, the inclusion of almost 100 countries usually overlooked in previous studies suggests that the sample selection may have substantially affected results. Simple analyses show that the associations between CBI and inflation, unemployment or growth are very sensitive to sample selection. Finally, the dataset identifies numerous CBI decreases (restrictions), whereas previous datasets mostly look at CBI increases. These data’s coverage not only allows researchers to test competing explanations of the determinants and effects of CBI in a global sample, but it also provides a useful instrument for cross-national studies in diverse fields, such as liberalization, diffusion, political institutions, democratization, or responses to financial crises.
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic), which are presented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages were calculated and the change (increase) in indicators such as profitability and profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. The dataset is flexible and can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the data in the dataset are not ready-made numbers but formulas, when values in the original table at the beginning of the dataset are added or changed, most of the subsequent tables will be automatically recalculated and the graphs will be updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. 
The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in risk assessment tables and obtaining automatically calculated consequences (changes) on the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified in the process and following the results of the second wave of the pandemic to check the reliability of pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of a pandemic and COVID-19 crisis for international entrepreneurship.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data is from:
https://simplemaps.com/data/world-cities
We're proud to offer a simple, accurate and up-to-date database of the world's cities and towns. We've built it from the ground up using authoritative sources such as the NGIA, US Geological Survey, US Census Bureau, and NASA.
Our database is:
By Harish Kumar Garg [source]
This dataset is about the number of Indian students studying abroad in different countries and the detailed information about different nations where Indian students are present. The data has been compiled from the Ministry Of External Affairs to answer a question from a Member of Parliament regarding how many students from India are studying in foreign countries and in which countries. This dataset includes two fields, Country Name and Number of Indians Studying Abroad as of Mar 2017, giving a unique opportunity to track student mobility across various nations around the world. With this valuable data about student mobility, we can gain insights into how educational opportunities for Indian students have increased over time as well as look at trends in international education throughout different regions. From comparison among countries with similar academic opportunities to tracking regional popularity among study destinations, this dataset provides important context for studying student migration patterns. We invite everyone to explore this data further and use it to draw meaningful conclusions!
How to use this dataset?
The data has two columns – Country Name and Number of Indians studying there as of March 2017. It also includes a third column, Percentage, which gives an indication about the proportion of Indian students enrolled in each country relative to total number enrolled abroad globally.
To get started with your exploration, you can visualize the data against parameters such as geographical region or language spoken, which may clarify the reasons behind students' choices. You can also group countries by the research opportunities available, cost considerations and so on, to better understand what motivates Indians to pursue further studies outside India.
Additionally, you can use this dataset to benchmark against other regional or international peer groups, or to aggregate regional and global reports, with the aim of informing decisions and policies on outreach and support, for example when targeting foreign universities and colleges with educational promotion activities designed to attract more prospective students from India.
- Using this dataset, educational institutions in India can set up international exchange programs with universities in other countries to facilitate and support Indian students studying abroad.
- Higher Education Institutions can also understand the current trend of Indian students seeking opportunities to study abroad and use this data to build specialized short-term courses in collaboration with universities from different countries that cater to the needs of students who are interested in moving abroad permanently or even temporarily for higher studies.
- Policy makers could use this data to assess the current trends and develop policies that aim at incentivizing international exposure among young professionals by commissioning fellowships or scholarships with an aim of exposing them to different problem sets around the world, thereby making their profile more attractive while they look for better job opportunities globally.
If you use this dataset in your research, please credit the original authors. Data Source
Unknown License - Please check the dataset description for more information.
File: final_data.csv

| Column name | Description |
|:--|:--|
| Country | Name of the country where Indian students are studying. (String) |
| No of Indian Students | Number of Indian students studying in the country. (Integer) |
| Percentage | Percentage of Indian students studying in the country compared to the total number of Indian students studying abroad. (Float) |
Round 1 of the Afrobarometer survey was conducted from July 1999 through June 2001 in 12 African countries, to solicit public opinion on democracy, governance, markets, and national identity. The full 12 country dataset released was pieced together out of different projects: Round 1 of the Afrobarometer survey, the old Southern African Democracy Barometer, and similar surveys done in West and East Africa.
The 7 country dataset is a subset of the Round 1 survey dataset, and consists of a combined dataset for the 7 Southern African countries surveyed with other African countries in Round 1, 1999-2000 (Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia and Zimbabwe). It is a useful dataset because, in contrast to the full 12 country Round 1 dataset, all countries in this dataset were surveyed with the identical questionnaire.
Botswana Lesotho Malawi Namibia South Africa Zambia Zimbabwe
Basic units of analysis that the study investigates include: individuals and groups
Sample survey data [ssd]
A new sample has to be drawn for each round of Afrobarometer surveys. Whereas the standard sample size for Round 3 surveys will be 1200 cases, a larger sample size will be required in societies that are extremely heterogeneous (such as South Africa and Nigeria), where the sample size will be increased to 2400. Other adaptations may be necessary within some countries to account for the varying quality of the census data or the availability of census maps.
The sample is designed as a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of selection for interview. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible. A randomly selected sample of 1200 cases allows inferences to national adult populations with a margin of sampling error of no more than plus or minus 2.5 percent with a confidence level of 95 percent. If the sample size is increased to 2400, the confidence interval shrinks to plus or minus 2 percent.
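
As a rough illustration of how sample size drives the margin of sampling error, the textbook formula for a proportion under simple random sampling is z * sqrt(p(1 - p) / n). The sketch below assumes p = 0.5 (the worst case) and z = 1.96 for 95 percent confidence; both are assumptions of this example, and the formula ignores the clustering and stratification of the actual design, so it is only an approximation.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

moe_1200 = margin_of_error(1200)  # roughly 0.028, i.e. about 2.8 percentage points
moe_2400 = margin_of_error(2400)  # roughly 0.020, i.e. about 2.0 percentage points
```

Doubling the sample from 1200 to 2400 shrinks the margin by a factor of about 1.41 (the square root of 2), not by half, which is why the larger sample is reserved for especially heterogeneous societies.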
Sample Universe
The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.
What to do about areas experiencing political unrest? On the one hand we want to include them because they are politically important. On the other hand, we want to avoid stretching out the fieldwork over many months while we wait for the situation to settle down. It was agreed at the 2002 Cape Town Planning Workshop that it is difficult to come up with a general rule that will fit all imaginable circumstances. We will therefore make judgments on a case-by-case basis on whether or not to proceed with fieldwork or to exclude or substitute areas of conflict. National Partners are requested to consult Core Partners on any major delays, exclusions or substitutions of this sort.
Sample Design
The sample design is a clustered, stratified, multi-stage, area probability sample.
To repeat the main sampling principle, the objective of the design is to give every sample element (i.e. adult citizen) an equal and known chance of being chosen for inclusion in the sample. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible.
In a series of stages, geographically defined sampling units of decreasing size are selected. To ensure that the sample is representative, the probability of selection at various stages is adjusted as follows:
The sample is stratified by key social characteristics in the population such as sub-national area (e.g. region/province) and residential locality (urban or rural). The area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample. And the urban/rural stratification is a means to make sure that these localities are represented in their correct proportions. Wherever possible, and always in the first stage of sampling, random sampling is conducted with probability proportionate to population size (PPPS). The purpose is to guarantee that larger (i.e., more populated) geographical units have a proportionally greater probability of being chosen into the sample. The sampling design has four stages
A first-stage to stratify and randomly select primary sampling units;
A second-stage to randomly select sampling start-points;
A third stage to randomly choose households;
A final-stage involving the random selection of individual respondents
We shall deal with each of these stages in turn.
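
Selection with probability proportionate to population size can be sketched with the common systematic method: lay the units end to end on a cumulative population scale, pick a random start, and step through at a fixed interval. This is one standard way to implement PPPS, not necessarily the procedure Afrobarometer partners use; the function name and the enumeration-area figures are hypothetical.

```python
import random

def pps_systematic(units, n_select, rng):
    """Systematic PPS selection from a list of (unit_id, population) pairs."""
    total = sum(pop for _, pop in units)
    interval = total / n_select
    start = rng.random() * interval  # random start in [0, interval)
    targets = [start + i * interval for i in range(n_select)]
    selected, cumulative, idx = [], 0, 0
    for unit_id, pop in units:
        cumulative += pop
        # A unit is hit once for every target falling inside its population range,
        # so units larger than the interval can be selected more than once.
        while idx < n_select and targets[idx] < cumulative:
            selected.append(unit_id)
            idx += 1
    # Guard against floating-point rounding leaving the last target unassigned.
    while idx < n_select:
        selected.append(units[-1][0])
        idx += 1
    return selected

# Hypothetical enumeration areas with their populations.
example_units = [("EA1", 500), ("EA2", 1500), ("EA3", 1000), ("EA4", 3000)]
chosen = pps_systematic(example_units, 3, random.Random(0))
```

In this toy frame EA4 holds half the population, so it is certain to be drawn at least once, which shows how larger units receive a proportionally greater chance of selection.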
STAGE ONE: Selection of Primary Sampling Units (PSUs)
The primary sampling units (PSU's) are the smallest, well-defined geographic units for which reliable population data are available. In most countries, these will be Census Enumeration Areas (or EAs). Most national census data and maps are broken down to the EA level. In the text that follows we will use the acronyms PSU and EA interchangeably because, when census data are employed, they refer to the same unit.
We strongly recommend that NIs use official national census data as the sampling frame for Afrobarometer surveys. Where recent or reliable census data are not available, NIs are asked to inform the relevant Core Partner before they substitute any other demographic data. Where the census is out of date, NIs should consult a demographer to obtain the best possible estimates of population growth rates. These should be applied to the outdated census data in order to make projections of population figures for the year of the survey. It is important to bear in mind that population growth rates vary by area (region) and (especially) between rural and urban localities. Therefore, any projected census data should include adjustments to take such variations into account.
Indeed, we urge NIs to establish collegial working relationships with professionals in the national census bureau, not only to obtain the most recent census data, projections, and maps, but also to gain access to sampling expertise. NIs may even commission a census statistician to draw the sample to Afrobarometer specifications, provided that provision for this service has been made in the survey budget.
Regardless of who draws the sample, the NIs should thoroughly acquaint themselves with the strengths and weaknesses of the available census data and the availability and quality of EA maps. The country and methodology reports should cite the exact census data used, its known shortcomings, if any, and any projections made from the data. At a minimum, the NI must know the size of the population and the urban/rural population divide in each region in order to specify how to distribute population and PSUs in the first stage of sampling. National investigators should obtain this written data before they attempt to stratify the sample.
Once this data is obtained, the sample population (either 1200 or 2400) should be stratified, first by area (region/province) and then by residential locality (urban or rural). In each case, the proportion of the sample in each locality in each region should be the same as its proportion in the national population as indicated by the updated census figures.
Having stratified the sample, it is then possible to determine how many PSUs should be selected for the country as a whole, for each region, and for each urban or rural locality.
The total number of PSUs to be selected for the whole country is determined by calculating the maximum degree of clustering of interviews one can accept in any PSU. Because PSUs (which are usually geographically small EAs) tend to be socially homogeneous, we do not want to select too many people in any one place. Thus, the Afrobarometer has established a standard of no more than 8 interviews per PSU. For a sample size of 1200, the sample must therefore contain 150 PSUs/EAs (1200 divided by 8). For a sample size of 2400, there must be 300 PSUs/EAs.
These PSUs should then be allocated proportionally to the urban and rural localities within each regional stratum of the sample. Let's take a couple of examples from a country with a sample size of 1200. If the urban locality of Region X in this country constitutes 10 percent of the current national population, then the sample for this stratum should be 15 PSUs (calculated as 10 percent of 150 PSUs). If the rural population of Region Y constitutes 4 percent of the current national population, then the sample for this stratum should be 6 PSUs (4 percent of 150 PSUs).
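The allocation arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not part of the sampling manual: the function names and the dictionary of stratum shares are assumptions made for the example, and simple rounding is used, so a real allocation would still need to reconcile rounded counts so that they sum to the target number of PSUs.

```python
def total_psus(sample_size, interviews_per_psu=8):
    """Number of PSUs needed when each PSU holds exactly the interview cap."""
    if sample_size % interviews_per_psu != 0:
        raise ValueError("sample size must be a multiple of the per-PSU cap")
    return sample_size // interviews_per_psu


def allocate_psus(sample_size, stratum_shares, interviews_per_psu=8):
    """Allocate PSUs to strata in proportion to their population share.

    stratum_shares maps a (hypothetical) stratum name to its share of the
    national population, expressed as a fraction between 0 and 1.
    """
    n_psus = total_psus(sample_size, interviews_per_psu)
    # Simplification: plain rounding; totals may need manual reconciliation.
    return {name: round(share * n_psus) for name, share in stratum_shares.items()}


# The two strata from the worked example in the text.
shares = {"Region X urban": 0.10, "Region Y rural": 0.04}
allocation = allocate_psus(1200, shares)
print(total_psus(1200), total_psus(2400))
print(allocation)
```

With the figures from the text, the sketch reproduces 150 and 300 PSUs for the two sample sizes, and 15 and 6 PSUs for the two example strata.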
The next step is to select particular PSUs/EAs using random methods. Using the above example of the rural localities in Region Y, let us say that you need to pick 6 sample EAs out of a census list that contains a total of 240 rural EAs in Region Y. But which 6? If the EAs created by the national census bureau are of equal or roughly equal population size, then selection is relatively straightforward. Just number all EAs consecutively, then make six selections using a table of random numbers. This procedure, known as simple random sampling (SRS), will
https://www.gnu.org/licenses/gpl-3.0.html
This dataset comprises 204 entries and 38 attributes, providing a comprehensive analysis of key economic and social indicators across various countries. It includes a diverse range of metrics, allowing for in-depth exploration of global trends related to GDP, education, health, and environmental factors.
Key Features:
Applications and Uses:
Research and Analysis: Ideal for researchers studying the correlation between economic performance and social indicators. This dataset can help identify trends and patterns relevant to global development.
Policy Development: Policymakers can utilize this data to inform decisions on education, healthcare, and environmental policies, aiming to improve national outcomes.
Machine Learning and Data Science: Data scientists can apply machine learning techniques to predict economic trends, analyze social impacts, or classify countries based on various indicators.
Educational Purposes: Suitable for students and educators in fields like economics, sociology, and environmental science for practical data analysis exercises.
Visualization Projects: Perfect for creating compelling visualizations that illustrate relationships between different metrics, aiding in public understanding and engagement.
By leveraging this dataset, users can uncover insights into how different factors influence a country's development, making it a valuable resource for diverse applications across various fields.
Ten Most Populous Countries, 2020 to 2050
https://pacific-data.sprep.org/dataset/data-portal-license-agreements/resource/de2a56f5-a565-481a-8589-406dc40b5588
A recently published paper, titled “Coastal proximity of populations in 22 Pacific Island Countries and Territories”, details the methodology used to undertake the analysis and presents the findings.
Purpose
* This analysis aims to estimate populations settled in coastal areas in 22 Pacific Island Countries and Territories (PICTs) using the data currently available. In addition to the coastal population estimates, the study compares the results obtained from national population datasets (census) with those derived from global population grids.
* The accuracy and reliability of the results derived from national and global datasets have been evaluated to identify the most suitable options for estimating the size and location of coastal populations in the region.
A collaborative project between the Pacific Community (SPC), WorldFish and the University of Wollongong has produced the first detailed population estimates of people living close to the coast in the 22 PICTs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for SOCIAL SECURITY RATE reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The OSDG Community Dataset (OSDG-CD) is a public dataset of thousands of text excerpts, which were validated by over 1,400 OSDG Community Platform (OSDG-CP) citizen scientists from over 140 countries, with respect to the Sustainable Development Goals (SDGs).
Dataset Information
In support of the global effort to achieve the Sustainable Development Goals (SDGs), OSDG is realising a series of SDG-labelled text datasets. The OSDG Community Dataset (OSDG-CD) is the direct result of the work of more than 1,400 volunteers from over 130 countries who have contributed to our understanding of SDGs via the OSDG Community Platform (OSDG-CP). The dataset contains tens of thousands of text excerpts (henceforth: texts) which were validated by the Community volunteers with respect to SDGs. The data can be used to derive insights into the nature of SDGs using either ontology-based or machine learning approaches.
📘 The file contains 43,021 (+390) text excerpts and a total of 310,328 (+3,733) assigned labels.
To learn more about the project, please visit the OSDG website and the official GitHub page. Explore a detailed overview of the OSDG methodology in our recent paper "OSDG 2.0: a multilingual tool for classifying text data by UN Sustainable Development Goals (SDGs)".
Source Data
The dataset consists of paragraph-length text excerpts derived from publicly available documents, including reports, policy documents and publication abstracts. A significant number of documents (more than 3,000) originate from UN-related sources such as SDG-Pathfinder and SDG Library. These sources often contain documents that already have SDG labels associated with them. Each text consists of 3 to 6 sentences and is about 90 words long on average.
Methodology
All the texts are evaluated by volunteers on the OSDG-CP. The platform is an ambitious attempt to bring together researchers, subject-matter experts and SDG advocates from all around the world to create a large and accurate source of textual information on the SDGs. The Community volunteers use the platform to participate in labelling exercises where they validate each text's relevance to SDGs based on their background knowledge.
In each exercise, the volunteer is shown a text together with an SDG label associated with it – this usually comes from the source – and asked to either accept or reject the suggested label.
There are 3 types of exercises:
All volunteers start with the mandatory introductory exercise that consists of 10 pre-selected texts. Each volunteer must complete this exercise before they can access the 2 other exercise types. Upon completion, the volunteer reviews the exercise by comparing their answers with the answers of the rest of the Community using aggregated statistics we provide, i.e., the share of those who accepted and rejected the suggested SDG label for each of the 10 texts. This helps the volunteer to get a feel for the platform.
SDG-specific exercises where the volunteer validates texts with respect to a single SDG, e.g., SDG 1 No Poverty.
All SDGs exercise where the volunteer validates a random sequence of texts where each text can have any SDG as its associated label.
After finishing the introductory exercise, the volunteer is free to select either SDG-specific or All SDGs exercises. Each exercise, regardless of its type, consists of 100 texts. Once the exercise is finished, the volunteer can either label more texts or exit the platform. The volunteer can also finish an exercise early; all progress is still saved and recorded.
To ensure quality, each text is validated by up to 9 different volunteers and all texts included in the public release of the data have been validated by at least 3 different volunteers.
It is worth keeping in mind that all exercises present the volunteers with a binary decision problem, i.e., either accept or reject a suggested label. The volunteers are never asked to select one or more SDGs that a certain text might relate to. The rationale behind this set-up is that asking a volunteer to select from 17 SDGs is extremely inefficient. Currently, all texts are validated against only one associated SDG label.
Column Description
doi - Digital Object Identifier of the original document
text_id - unique text identifier
text - text excerpt from the document
sdg - the SDG the text is validated against
labels_negative - the number of volunteers who rejected the suggested SDG label
labels_positive - the number of volunteers who accepted the suggested SDG label
agreement - agreement score based on the formula \( agreement = \frac{|labels_{positive} - labels_{negative}|}{labels_{positive} + labels_{negative}} \)
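As a rough illustration of how the agreement column relates to the two label counts, the formula can be written as a small Python function. This is a sketch under assumptions: the function names are hypothetical, the example counts are made up for illustration and are not taken from the dataset, and treating a text as accepted when positive votes outnumber negative ones is an assumed rule, not one stated in the description.

```python
def agreement(labels_positive, labels_negative):
    """Agreement score: |positive - negative| / (positive + negative)."""
    total = labels_positive + labels_negative
    if total == 0:
        raise ValueError("a text needs at least one label to compute agreement")
    return abs(labels_positive - labels_negative) / total


def majority_accepts(labels_positive, labels_negative):
    """Assumed rule: the suggested SDG label is kept when accepts outnumber rejects."""
    return labels_positive > labels_negative


# Made-up (labels_positive, labels_negative) pairs, for illustration only.
examples = [(3, 0), (5, 4), (2, 2)]
for pos, neg in examples:
    print(pos, neg, round(agreement(pos, neg), 3), majority_accepts(pos, neg))
```

A score of 1 means every volunteer gave the same answer, while a score of 0 means accepts and rejects were evenly split, so the score alone does not say whether the label was accepted or rejected.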
Further Information
Do not hesitate to share with us your outputs, be it a research paper, a machine learning model, a blog post, or just an interesting observation. All queries can be directed to community@osdg.ai.
The United Nations Energy Statistics Database (UNSTAT) is a comprehensive collection of international energy and demographic statistics prepared by the United Nations Statistics Division. The 2004 version represents the latest in the series of annual compilations which commenced under the title World Energy Supplies in Selected Years, 1929-1950. Supplementary series of monthly and quarterly data on production of energy may be found in the Monthly Bulletin of Statistics. The database contains comprehensive energy statistics for more than 215 countries or areas for production, trade and intermediate and final consumption (end-use) for primary and secondary conventional, non-conventional and new and renewable sources of energy. Mid-year population estimates are included to enable the computation of per capita data. Annual questionnaires sent to national statistical offices serve as the primary source of information. Supplementary data are also compiled from national, regional and international statistical publications. The Statistics Division prepares estimates where official data are incomplete or inconsistent. The database is updated on a continuous basis as new information and revisions are received. This metadata file represents the population statistics during the expressed time. For more information about the country site codes, click this link to the United Nations "Standard country or area codes for statistical use": https://unstats.un.org/unsd/methodology/m49/overview/