Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In demographics, the world population is the total number of humans currently living; it was estimated to have reached 7,800,000,000 people as of March 2020. It took over 2 million years of human history for the world's population to reach 1 billion, and only 200 more years to reach 7 billion. The world population has experienced continuous growth following the Great Famine of 1315–1317 and the end of the Black Death in 1350, when it was near 370 million. The highest global population growth rates, with increases of over 1.8% per year, occurred between 1955 and 1975, peaking at 2.1% between 1965 and 1970.[7] The growth rate declined to 1.2% between 2010 and 2015 and is projected to decline further over the course of the 21st century. However, the global population is still increasing[8] and is projected to reach about 10 billion in 2050 and more than 11 billion in 2100.
The annual population growth rate for year t is the exponential rate of growth of the midyear population from year t-1 to year t, expressed as a percentage. Population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship.
Total population growth rates are calculated on the assumption that the rate of growth is constant between two points in time. The growth rate is computed using the exponential growth formula: r = ln(pn/p0)/n, where r is the exponential rate of growth, ln() is the natural logarithm, pn is the end-period population, p0 is the beginning-period population, and n is the number of years in between. Note that this is not the geometric growth rate used to compute compound growth over discrete periods. For information on the total population from which the growth rates are calculated, see total population (SP.POP.TOTL).
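As a quick worked example, here is a minimal sketch of this formula in Python; the population figures are approximate round numbers, chosen only to reproduce the roughly 1.2% rate cited above for 2010-2015.

```python
import math

def exponential_growth_rate(p0: float, pn: float, n: float) -> float:
    """Exponential growth rate r = ln(pn/p0)/n, as defined above."""
    return math.log(pn / p0) / n

# Approximate world population: ~6.92 billion (2010) to ~7.35 billion (2015).
r = exponential_growth_rate(6.92e9, 7.35e9, 5)
print(f"exponential rate: {r * 100:.2f}% per year")  # ~1.21%

# For contrast, the geometric (compound) growth rate over discrete periods,
# which the text notes is NOT what this indicator uses:
g = (7.35e9 / 6.92e9) ** (1 / 5) - 1
print(f"geometric rate:   {g * 100:.2f}% per year")
```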
Derived from total population. Population sources: (1) United Nations Population Division, World Population Prospects: 2019 Revision; (2) census reports and other statistical publications from national statistical offices; (3) Eurostat: Demographic Statistics; (4) United Nations Statistical Division, Population and Vital Statistics Report (various years); (5) U.S. Census Bureau: International Database; and (6) Secretariat of the Pacific Community: Statistics and Demography Programme.
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. The 2015 data alone records nearly three quarters of a trillion emotional snapshots and more than 1.5 billion location references, while its total archives span more than 215 years, making it one of the largest open-access spatio-temporal datasets in existence and pushing the boundaries of "big data" study of global human society. Its Global Knowledge Graph connects the world's people, organizations, locations, themes, counts, images, and emotions into a single holistic network over the entire planet. How can you query, explore, model, visualize, interact with, and even forecast this vast archive of human society?
GDELT 2.0's event database offers a wealth of features, including events reported in articles published in 65 live-translated languages, measurements of 2,300 emotions and themes, high-resolution views of the non-Western world, relevant imagery, videos, and social media embeds, quotes, names, amounts, and more.
You may find these code books helpful:
GDELT Global Knowledge Graph Codebook V2.1 (PDF)
GDELT Event Codebook V2.0 (PDF)
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.github_repos.[TABLENAME]. Fork the starter kernel to learn how to safely manage analyzing large BigQuery datasets.
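A minimal sketch of such a query, assuming the google-cloud-bigquery package and authenticated Google Cloud credentials; `sample_files` stands in for a concrete table name under the path given above, and the dry run is one way to safely manage scan costs before executing:

```python
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT repo_name, COUNT(*) AS n_files
    FROM `bigquery-public-data.github_repos.sample_files`
    GROUP BY repo_name
    ORDER BY n_files DESC
    LIMIT 10
"""

# A dry run reports how many bytes the query would scan without executing it,
# which is a cheap guard against accidentally expensive queries.
dry = client.query(query, job_config=bigquery.QueryJobConfig(dry_run=True))
print(f"Would scan {dry.total_bytes_processed / 1e9:.2f} GB")

# Execute for real once the scan size looks acceptable.
for row in client.query(query).result():
    print(row.repo_name, row.n_files)
```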
You may redistribute, rehost, republish, and mirror any of the GDELT datasets in any form. However, any use or redistribution of the data must include a citation to the GDELT Project and a link to the website (https://www.gdeltproject.org/).
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
Welcome to the Human Capital Project (HCP), a global effort to accelerate more and better investments in people for greater equity and economic growth. In a post-COVID-19-pandemic world, it is even more important to understand why countries should invest in human capital (HC) and protect hard-won gains from being eroded. Find out why the World Bank, countries, and partners are coming together to close the massive HC gap in the world. Check out the Human Capital Network Fact Sheet, the updated 2020 Human Capital Index, our videos, visualizations, frequently asked questions, and more. You too can help those around the world to #InvestinPeople.
The World Bank Group announced the Human Capital Project in 2017 and launched the Human Capital Index in October 2018 at the Annual Meetings held in Bali, Indonesia. As of November 2023, 92 economies have joined the Human Capital Project, which Gabriel Demombynes manages. Financed by the Bill & Melinda Gates Foundation and the Canadian government, the Human Capital Umbrella Program allows donors and partners to support human capital outcomes across the life cycle through a ‘whole-of-government’ approach. The program has informed 19 World Bank operations totaling $4.6 billion. The Human Capital Project has three pillars: Measurement and Indicators; Analytics and Operations; and Convening and Global Thought Leadership. The Human Capital Index (HCI) quantifies the contribution of health and education to the productivity of the next generation of workers. Ranging between 0 and 1, the index takes the value 1 only if a child born today can expect to achieve full health (defined as no stunting and 100 percent adult survival) and to complete her full education potential (defined as 14 years of high-quality schooling by age 18).
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, driven by increased demand during the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

Storage capacity also growing

Only a small percentage of this newly created data is kept, though: just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
Semantic subject indexing is the process of annotating documents with terms that describe what the document is about. It is often used in digital libraries to increase the findability of documents. Annotations are usually created by human experts from the domain, who select appropriate terms from a pre-specified set of available labels. In order to keep up with the vast amount of new publications, (semi-)automatic tools are being developed that assist the experts by suggesting terms for annotation. Unfortunately, due to legal restrictions, these tools often cannot use the full text or even the abstract of a publication. It is therefore desirable to explore techniques that work with the publications' metadata only. To some extent, it is already possible to achieve performance competitive with the full text by merely using titles. Yet the performance of automatic subject indexing methods is still far from the level of human annotators. Semantic subject indexing can be framed as a multi-label classification problem, where entry (i,j) of an indicator matrix is set to one if label j has been assigned to document i, and to zero otherwise. A major challenge is that the label space is usually very large (up to almost 30,000 labels), that the labels follow a power law, and that they are subject to concept drift (cf. Toepfer and Seifert).
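As a concrete illustration of this framing, the sketch below builds such an indicator matrix with scikit-learn's MultiLabelBinarizer; the documents and labels are invented toy examples, not drawn from the datasets described next.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Toy annotations: each document is represented by its set of subject labels.
docs_labels = [
    {"Monetary policy", "Inflation"},   # document 0
    {"Inflation", "Labour market"},     # document 1
    {"Labour market"},                  # document 2
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(docs_labels)  # Y[i, j] == 1 iff label j is assigned to doc i

print(mlb.classes_)  # column order: ['Inflation' 'Labour market' 'Monetary policy']
print(Y)
# [[1 0 1]
#  [1 1 0]
#  [0 1 0]]
```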
Here, we provide two large-scale datasets from the domain of economics and business studies (EconBiz) and biomedicine (PubMed) used in our recent study, which each come with the title and respective annotated labels. Do you find valuable insights in the data that can help understand the problem of semantic subject indexing better? Can you come up with clever ideas that push the state-of-the-art in automatic semantic subject indexing? We are excited to see what the collective power of data scientists can achieve on this task!
We compiled two English datasets from two digital libraries, EconBiz and PubMed.
EconBiz
The EconBiz dataset was compiled from a metadata export provided by ZBW - Leibniz Information Centre for Economics in July 2017. We only retained those publications that were flagged as being in English and that were annotated with STW labels. Afterwards, we removed duplicates by checking for the same title and labels. In total, approximately 1,064,000 publications remain. The annotations were selected by human annotators from the Standard Thesaurus Wirtschaft (STW), which contains approximately 6,000 labels.
PubMed
The PubMed dataset was compiled from the training set of the 5th BioASQ challenge on large-scale semantic subject indexing of biomedical articles, all of which were in English. Again, we removed duplicates by checking for the same title and labels. In total, approximately 12.8 million publications remain. The labels are so-called MeSH terms; in our data, approximately 28,000 of them are used.
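Both datasets were deduplicated on the (title, labels) pair. A minimal pandas sketch of that step (the column names and rows are hypothetical; label lists are frozen into sorted tuples so they can be hashed):

```python
import pandas as pd

df = pd.DataFrame({
    "title":  ["A study of X", "A study of X", "A study of X"],
    "labels": [["econ", "trade"], ["trade", "econ"], ["econ"]],
})

# Lists are unhashable, so key on a sorted tuple of the labels.
deduped = (
    df.assign(_key=df["labels"].map(lambda ls: tuple(sorted(ls))))
      .drop_duplicates(subset=["title", "_key"])
      .drop(columns="_key")
)
print(deduped)  # rows 0 and 1 collapse; row 2 has a different label set
```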
Fields

Both datasets share the same set of fields:
We would like to thank ZBW - Information Centre for Economics for providing the EconBiz dataset, and in particular Tamara Pianos and Tobias Rebholz.
We would also like to thank the team behind the BioASQ challenge, from which we compiled the PubMed dataset. This organization is dedicated to advancing the state of the art in large-scale semantic indexing. It is currently running the 6th iteration of its challenge, which you should definitely check out!
The PubMed dataset has been gathered by BioASQ following the terms from the U.S. National Library of Medicine regarding public use and redistribution of the data.
This dataset provides:
- a non-redundant, high-quality catalog of 2.9 million genes
- 392 Metagenomic Species Pangenomes (MSPs)

It can be used to analyze shotgun sequencing data of the human skin microbiota.

How to use this dataset

Create a gene abundance table by aligning reads from each sample against the catalog. For this purpose, you can use Meteor or NGLess. Then, normalize raw counts by gene length. Taxonomic profiling: the abundance of each species can be estimated as the average abundance of its 100 first core genes. To reduce the false positive rate, only consider that a species is present if at least 10/100 marker genes are detected. (A toy sketch of these quantification steps appears after the Methods below.)

Methods

Data sources

This dataset was built using the following data sources:
- 118 isolate-derived genomes from the HMRGD
- 246 isolate-derived genomes from the Skin Microbial Genome Collection (SMGC)
- 1,407 skin metagenome assemblies from the Skin Microbial Genome Collection (SMGC)

Non-redundant gene catalog

After filtering out short contigs (<1,500 bp), genes were predicted with Prodigal on genomes (mode: single) and metagenome assemblies (mode: meta). Complete genes (partial=00) were pooled and clustered with cd-hit-est (parameters: -c 0.95 -aS 0.90 -G 0 -d 0 -M 0 -T 0), choosing those from the longest contigs as representatives.

Functional annotation

KO assignments were obtained with KofamScan using the KEGG 107 database.

MSPs recovery

Reads from the 1,120 skin metagenomes available in the BioProject PRJNA46333 were aligned against the non-redundant gene catalog with the Meteor software suite to produce a raw gene abundance table (2.9M genes quantified in 1,120 samples). Then, co-abundant genes were binned into 392 Metagenomic Species Pangenomes (MSPs, i.e. clusters of co-abundant genes that likely belong to the same microbial species) using MSPminer.

MSPs taxonomic annotation

Taxonomic annotation was performed by aligning all core and accessory genes against representative genomes of the GTDB database (release r214) using blastn (version 2.7.1, task = megablast, word_size = 16). A species-level assignment was given if >50% of the genes matched the representative genome of a given species, with a mean nucleotide identity ≥95% and mean gene length coverage ≥90%. The remaining MSPs were assigned to a higher taxonomic level (genus to superkingdom) if more than 50% of their genes had the same annotation.

Construction of the phylogenetic tree

40 universal phylogenetic marker genes were extracted from the MSPs (or the corresponding genome if available) with fetchMGs. Then, the markers were separately aligned with MUSCLE. The 40 alignments were merged and trimmed with trimAl (parameters: -automated1). Finally, the phylogenetic tree was computed with FastTreeMP (parameters: -gamma -pseudo -spr -mlacc 3 -slownni).
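Here is a toy sketch of the quantification steps above in Python with pandas; the counts, gene lengths, and core-gene list are randomly generated stand-ins for the output of a real Meteor/NGLess alignment, so only the logic (length normalization, 100-core-gene averaging, 10/100 detection filter) carries over.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-ins for real inputs: raw read counts (genes x samples) and gene lengths (bp).
n_genes, n_samples = 300, 4
counts = pd.DataFrame(
    rng.poisson(5, size=(n_genes, n_samples)),
    index=[f"gene_{i}" for i in range(n_genes)],
    columns=[f"sample_{j}" for j in range(n_samples)],
)
gene_length = pd.Series(rng.integers(300, 3000, n_genes), index=counts.index)

# 1) Normalize raw counts by gene length (here: reads per kilobase).
abundance = counts.div(gene_length / 1_000, axis=0)

# 2) Species abundance: mean abundance of the MSP's 100 first core genes,
#    reported only if at least 10 of those 100 marker genes are detected.
core_genes = counts.index[:100]            # toy "100 first core genes" of one MSP
core = abundance.loc[core_genes]
detected = (core > 0).sum()                # detected marker genes per sample
species_abundance = core.mean().where(detected >= 10, other=0.0)
print(species_abundance)
```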
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If you use the dataset, cite the paper: https://doi.org/10.1016/j.eswa.2022.117541
This is the most comprehensive dataset to date on climate change and human opinions on Twitter. It has the longest temporal coverage of any such dataset, spanning over 13 years; includes over 15 million tweets distributed across the world; and provides the geolocation of most tweets. Seven dimensions of information are tied to each tweet, namely geolocation, user gender, climate change stance and sentiment, aggressiveness, deviation from historic temperature, and topic modeling, accompanied by information on environmental disaster events. These dimensions were produced by testing and evaluating a plethora of state-of-the-art machine learning algorithms and methods, both supervised and unsupervised, including BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, TextBlob, Flair, and LDA.
The following columns are in the dataset:
➡ created_at: The timestamp of the tweet.
➡ id: The unique ID of the tweet.
➡ lng: The longitude at which the tweet was written.
➡ lat: The latitude at which the tweet was written.
➡ topic: Categorization of the tweet into one of ten topics, namely: seriousness of gas emissions, importance of human intervention, global stance, significance of pollution awareness events, weather extremes, impact of resource overconsumption, Donald Trump versus science, ideological positions on global warming, politics, and undefined.
➡ sentiment: A score on a continuous scale from -1 to 1, with values closer to 1 indicating positive sentiment, values closer to -1 indicating negative sentiment, and values close to 0 indicating neutral or no sentiment.
➡ stance: Whether the tweet supports the belief in man-made climate change (believer), rejects it (denier), or neither supports nor rejects it (neutral).
➡ gender: Whether the user that made the tweet is male, female, or undefined.
➡ temperature_avg: The temperature deviation in Celsius, relative to the January 1951-December 1980 average, at the time and place the tweet was written.
➡ aggressiveness: Whether or not the tweet contains aggressive language.
Since Twitter forbids making the text of tweets public, you need to retrieve it yourself through a process called hydration. Tools such as Twarc or Hydrator can be used to hydrate tweets, as sketched below.
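A minimal hydration sketch using Twarc's Python API (assuming twarc v2; the bearer token and tweet IDs are placeholders you must supply yourself):

```python
from twarc import Twarc2  # pip install twarc

# Hydration exchanges tweet IDs for full tweet objects via the Twitter/X API.
client = Twarc2(bearer_token="YOUR_BEARER_TOKEN")  # placeholder credential

tweet_ids = ["1234567890123456789", "1234567890123456790"]  # placeholder IDs

# tweet_lookup batches IDs (up to 100 per request) and yields API response pages.
for page in client.tweet_lookup(tweet_ids):
    for tweet in page.get("data", []):
        print(tweet["id"], tweet["text"])
```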
Success.ai’s B2B Contact Data for Human Resources Professionals Worldwide empowers businesses to connect with HR leaders across the globe. With access to over 170 million verified professional profiles, this dataset includes critical contact information for key HR decision-makers in various industries. Whether you’re targeting HR directors, talent acquisition specialists, or employee relations managers, Success.ai ensures accurate and effective outreach.
Why Choose Success.ai’s HR Professionals Data?
- Data accuracy is backed by AI validation to ensure 99% reliability.
- Global Reach Across HR Functions: includes profiles of HR directors, recruiters, payroll specialists, and training managers, covering regions such as North America, Europe, Asia-Pacific, South America, and the Middle East.
- Continuously Updated Datasets: real-time updates provide the latest information about HR professionals in decision-making roles.
- Ethical and Compliant: adheres to GDPR, CCPA, and other global privacy regulations for ethical use of data.
Data Highlights:
- 170M+ Verified Professional Profiles: includes HR professionals from diverse industries.
- 50M Work Emails: verified and AI-validated for seamless communication.
- 30M Company Profiles: rich insights to support detailed targeting.
- 700M Global Professional Profiles: enriched data for broad business objectives.
Key Features of the Dataset:
Strategic Use Cases:
- Build relationships with professionals managing recruitment, payroll, or employee engagement.
- Corporate Training and Development: reach training managers to promote learning solutions, workshops, and skill-building programs, and showcase personalized employee development initiatives.
- Targeted Marketing Campaigns: design campaigns to promote HR-focused tools, resources, or consultancy services, leveraging verified contact data for higher engagement and conversions.
- HR Tech Solutions: present HR software, automation tools, or cloud solutions to relevant decision-makers, targeting professionals managing HR digital transformation.
Why Choose Success.ai?
APIs for Enhanced Functionality
Leverage B2B Contact Data for Human Resources Professionals Worldwide to connect with HR leaders and decision-makers in your target market. Success.ai offers verified work emails, phone numbers, and continuously updated profiles to ensure effective outreach and impactful communication.
With AI-validated accuracy and a Best Price Guarantee, Success.ai provides the ultimate solution for accessing and engaging global HR professionals. Contact us now to elevate your business strategy with precise and reliable data!
No one beats us on price. Period.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA) and recent results in theoretical computer science, we present a novel algorithm that, applied to genome-wide data, selects small subsets of SNPs (PCA-correlated SNPs) to reproduce the structure found by PCA on the complete dataset, without use of ancestry information. Evaluating our method on a previously described dataset (10,805 SNPs, 11 populations), we demonstrate that a very small set of PCA-correlated SNPs can be effectively employed to assign individuals to particular continents or populations, using a simple clustering algorithm. We validate our methods on the HapMap populations and achieve perfect intercontinental differentiation with 14 PCA-correlated SNPs. The Chinese and Japanese populations can be easily differentiated using fewer than 100 PCA-correlated SNPs ascertained after evaluating 1.7 million SNPs from HapMap. We show that, in general, structure-informative SNPs are not portable across geographic regions. However, we manage to identify a general set of 50 PCA-correlated SNPs that effectively assigns individuals to one of nine different populations. Compared to analysis with the informativeness measure, our methods, although unsupervised, achieved similar results. We proceed to demonstrate that our algorithm can be effectively used for the analysis of admixed populations without having to trace the origin of individuals. Analyzing a Puerto Rican dataset (192 individuals, 7,257 SNPs), we show that PCA-correlated SNPs can be used to successfully predict structure and ancestry proportions. We subsequently validate these SNPs for structure identification in an independent Puerto Rican dataset. The algorithm that we introduce runs in seconds and can be easily applied to large genome-wide datasets, facilitating the identification of population substructure, stratification assessment in multi-stage whole-genome association studies, and the study of demographic history in human populations.
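As an illustration of the idea, the sketch below scores SNPs by their squared weight in the top right singular vectors of the centered genotype matrix (a leverage-score-style criterion) and keeps the highest-scoring ones; the genotype data is synthetic and this is a simplified reading of the selection step, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic genotypes (individuals x SNPs, coded 0/1/2) with fake structure:
# the first 20 SNPs differ between two groups of 50 individuals each.
n, m = 100, 1000
X = rng.binomial(2, 0.3, size=(n, m)).astype(float)
X[:50, :20] = np.minimum(X[:50, :20] + rng.binomial(2, 0.4, size=(50, 20)), 2)
X -= X.mean(axis=0)  # center each SNP

# Top-k principal components via SVD of the centered matrix.
k = 2
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Score SNP j by its squared weight in the top-k right singular vectors.
scores = (Vt[:k] ** 2).sum(axis=0)
top_snps = np.argsort(scores)[::-1][:20]
print("Selected SNPs:", sorted(top_snps.tolist()))  # mostly among the first 20
```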