100+ datasets found

World Population Statistics - 2023
kaggle.com
Updated Jan 9, 2024
Cite
Bhavik Jikadara (2024). World Population Statistics - 2023 [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/world-population-statistics-2023
Dataset updated
Jan 9, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Bhavik Jikadara
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
The current US Census Bureau world population estimate in June 2019 shows that the current global population is 7,577,130,400 people on Earth, which far exceeds the world population of 7.2 billion in 2015. Our estimate based on UN data shows the world's population surpassing 7.7 billion.

China is the most populous country in the world with a population exceeding 1.4 billion. It is one of just two countries with a population of more than 1 billion, with India being the second. As of 2018, India has a population of over 1.355 billion people, and its population growth is expected to continue through at least 2050. By the year 2030, India is expected to become the most populous country in the world. This is because India’s population will grow, while China is projected to see a loss in population.

The following 11 countries that are the most populous in the world each have populations exceeding 100 million. These include the United States, Indonesia, Brazil, Pakistan, Nigeria, Bangladesh, Russia, Mexico, Japan, Ethiopia, and the Philippines. Of these nations, all are expected to continue to grow except Russia and Japan, which will see their populations drop by 2030 before falling again significantly by 2050.

Many other nations have populations of at least one million, while there are also countries that have just thousands. The smallest population in the world can be found in Vatican City, where only 801 people reside.

In 2018, the world’s population growth rate was 1.12%. Every five years since the 1970s, the population growth rate has continued to fall. The world’s population is expected to continue to grow larger but at a much slower pace. By 2030, the population will exceed 8 billion. In 2040, this number will grow to more than 9 billion. In 2055, the number will rise to over 10 billion, and another billion people won’t be added until near the end of the century. The current annual population growth estimates from the United Nations are in the millions - estimating that over 80 million new lives are added yearly.

This population growth will be significantly impacted by nine specific countries which are situated to contribute to the population growth more quickly than other nations. These nations include the Democratic Republic of the Congo, Ethiopia, India, Indonesia, Nigeria, Pakistan, Uganda, the United Republic of Tanzania, and the United States of America. Particularly of interest, India is on track to overtake China's position as the most populous country by 2030. Additionally, multiple nations within Africa are expected to double their populations before fertility rates begin to slow entirely.

Content

In this Dataset, we have Historical Population data for every Country/Territory in the world by different parameters like Area Size of the Country/Territory, Name of the Continent, Name of the Capital, Density, Population Growth Rate, Ranking based on Population, World Population Percentage, etc.

Dataset Glossary (Column-Wise):

Rank: Rank by Population.

CCA3: 3-letter Country/Territories code.

Country/Territories: Name of the Country/Territories.

Capital: Name of the Capital.

Continent: Name of the Continent.

2022 Population: Population of the Country/Territories in the year 2022.

2020 Population: Population of the Country/Territories in the year 2020.

2015 Population: Population of the Country/Territories in the year 2015.

2010 Population: Population of the Country/Territories in the year 2010.

2000 Population: Population of the Country/Territories in the year 2000.

1990 Population: Population of the Country/Territories in the year 1990.

1980 Population: Population of the Country/Territories in the year 1980.

1970 Population: Population of the Country/Territories in the year 1970.

Area (km²): Area size of the Country/Territories in square kilometers.

Density (per km²): Population Density per square kilometer.

Growth Rate: Population Growth Rate by Country/Territories.

World Population Percentage: The population percentage by each Country/Territories.
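
The derived columns in the glossary can be recomputed from the raw ones. The following is a minimal sketch, assuming that density is population divided by area and that the percentage column is each row's share of the summed population. The two rows are made-up placeholders, not values from the dataset, and the function name is hypothetical.

```python
# Hypothetical rows; the numbers are placeholders, not values from the dataset.
rows = [
    {"CCA3": "AAA", "2022 Population": 1_000_000, "Area (km²)": 50_000},
    {"CCA3": "BBB", "2022 Population": 3_000_000, "Area (km²)": 20_000},
]

def add_derived_columns(rows):
    """Return copies of the rows with density and share-of-total columns added."""
    total = sum(r["2022 Population"] for r in rows)
    enriched_rows = []
    for r in rows:
        enriched = dict(r)
        # Assumed definition: people per square kilometer.
        enriched["Density (per km²)"] = r["2022 Population"] / r["Area (km²)"]
        # Assumed definition: share of the total over the rows supplied.
        enriched["World Population Percentage"] = 100 * r["2022 Population"] / total
        enriched_rows.append(enriched)
    return enriched_rows

derived = add_derived_columns(rows)
```

Note that the share here is relative to the rows passed in, so it only matches a world percentage when every country/territory is included.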
United States Population
tradingeconomics.com
es.tradingeconomics.com
csv, excel, json, xml
Updated Dec 15, 2024
Cite
TRADING ECONOMICS (2024). United States Population [Dataset]. https://tradingeconomics.com/united-states/population
Available download formats: excel, xml, csv, json
Dataset updated
Dec 15, 2024
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 31, 1900 - Dec 31, 2024
Area covered
United States
Description
The total population in the United States was estimated at 341.2 million people in 2024, according to the latest census figures and projections from Trading Economics. This dataset provides - United States Population - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Development Economics Data Group - Severely food insecure people (million)...
gimi9.com
Cite
Development Economics Data Group - Severely food insecure people (million) (3-year average) (FAO FS) | gimi9.com [Dataset]. https://gimi9.com/dataset/worldbank_fao_fs_210071/
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Estimated number of people living in households classified as severely food insecure. It is calculated by multiplying the estimated percentage of people affected by severe food insecurity (I_2.5) by the total population.
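
The calculation described above is a single multiplication. A minimal sketch follows, assuming the prevalence is given in percent and the population in millions; the function name and the input values are hypothetical and are not taken from the dataset.

```python
def severely_food_insecure_millions(prevalence_percent, total_population_millions):
    """Multiply the severe food insecurity prevalence by the total population."""
    return prevalence_percent / 100 * total_population_millions

# Placeholder inputs: 10 percent prevalence in a population of 50 million.
estimate = severely_food_insecure_millions(10.0, 50.0)
```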

Data from: Russian Troll Tweets

kaggle.com

Updated Aug 1, 2018

Cite

FiveThirtyEight (2018). Russian Troll Tweets [Dataset]. https://www.kaggle.com/fivethirtyeight/russian-troll-tweets/tasks

Dataset updated

Aug 1, 2018

Dataset provided by

Kaggle

Authors

FiveThirtyEight

License

https://creativecommons.org/publicdomain/zero/1.0/

Area covered

Russia

Description

3 million Russian troll tweets

This data was used in the FiveThirtyEight story Why We’re Sharing 3 Million Russian Troll Tweets.

This directory contains data on nearly 3 million tweets sent from Twitter handles connected to the Internet Research Agency, a Russian "troll factory" and a defendant in an indictment filed by the Justice Department in February 2018, as part of special counsel Robert Mueller's Russia investigation. The tweets in this database were sent between February 2012 and May 2018, with the vast majority posted from 2015 through 2017.

FiveThirtyEight obtained the data from Clemson University researchers Darren Linvill, an associate professor of communication, and Patrick Warren, an associate professor of economics, on July 25, 2018. They gathered the data using custom searches on a tool called Social Studio, owned by Salesforce and contracted for use by Clemson's Social Media Listening Center.

The basis for the Twitter handles included in this data is the November 2017 and June 2018 lists of Internet Research Agency-connected handles that Twitter provided to Congress. This data set contains every tweet sent from each of the 2,752 handles on the November 2017 list since May 10, 2015. For the 946 handles newly added on the June 2018 list, this data contains every tweet since June 19, 2015. (For certain handles, the data extends even earlier than these ranges. Some of the listed handles did not tweet during these ranges.) The researchers believe that this includes the overwhelming majority of these handles’ activity. The researchers also removed 19 handles that remained on the June 2018 list but that they deemed very unlikely to be IRA trolls.

In total, the nine CSV files include 2,973,371 tweets from 2,848 Twitter handles. Also, as always, caveat emptor -- in this case, tweet-reader beware: In addition to their own content, some of the tweets contain active links, which may lead to adult content or worse.

The Clemson researchers used this data in a working paper, Troll Factories: The Internet Research Agency and State-Sponsored Agenda Building, which is currently under review at an academic journal. The authors’ analysis in this paper was done on the data file provided here, limiting the date window to June 19, 2015, to Dec. 31, 2017.

The files have the following columns:

Header	Definition
`external_author_id`	An author account ID from Twitter
`author`	The handle sending the tweet
`content`	The text of the tweet
`region`	A region classification, as determined by Social Studio
`language`	The language of the tweet
`publish_date`	The date and time the tweet was sent
`harvested_date`	The date and time the tweet was collected by Social Studio
`following`	The number of accounts the handle was following at the time of the tweet
`followers`	The number of followers the handle had at the time of the tweet
`updates`	The number of “update actions” on the account that authored the tweet, including tweets, retweets and likes
`post_type`	Indicates if the tweet was a retweet or a quote-tweet
`account_type`	Specific account theme, as coded by Linvill and Warren
`retweet`	A binary indicator of whether or not the tweet is a retweet
`account_category`	General account theme, as coded by Linvill and Warren
`new_june_2018`	A binary indicator of whether the handle was newly listed in June 2018
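
The columns above lend themselves to simple tallies. The following is a minimal sketch using only the Python standard library; the three sample rows, the category values, and the date format are hypothetical placeholders standing in for one of the CSV files, and only the column names come from the table.

```python
import csv
import io
from collections import Counter

# Hypothetical sample with a subset of the listed columns; values are placeholders.
sample = io.StringIO(
    "author,content,publish_date,account_category,retweet\n"
    "handle_a,example text,1/1/2016 10:00,CategoryX,0\n"
    "handle_b,another example,1/2/2016 11:30,CategoryX,1\n"
    "handle_c,third example,1/3/2016 12:45,CategoryY,1\n"
)

def summarize(csv_file):
    """Count tweets per account_category and total the binary retweet flag."""
    tweets_per_category = Counter()
    retweets = 0
    for row in csv.DictReader(csv_file):
        tweets_per_category[row["account_category"]] += 1
        retweets += int(row["retweet"])
    return tweets_per_category, retweets

per_category, retweet_count = summarize(sample)
```

To run this against a real file, an open file handle could be passed to `summarize` in place of the in-memory sample.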

If you use this data and find anything interesting, please let us know. Send your projects to oliver.roeder@fivethirtyeight.com or @ollie.

The Clemson researchers wish to acknowledge the assistance of the Clemson University Social Media Listening Center and Brandon Boatwright of the University of Tennessee, Knoxville.

Number of global social network users 2017-2028

statista.com
es.statista.com

Cite

Stacy Jo Dixon, Number of global social network users 2017-2028 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

How many people use social media?

Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.

Who uses social media?

Social networking is one of the most popular digital activities worldwide and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as lesser developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.

How much time do people spend on social media?

Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. Internet users in Latin America had the highest average time spent per day on social media.

What are the most popular social media platforms?

Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.

Input Digital Datasets for the Soil-Water Balance Groundwater Recharge Model...
catalog.data.gov
data.usgs.gov
Updated Aug 15, 2025
Cite
U.S. Geological Survey (2025). Input Digital Datasets for the Soil-Water Balance Groundwater Recharge Model of the Upper Colorado River Basin [Dataset]. https://catalog.data.gov/dataset/input-digital-datasets-for-the-soil-water-balance-groundwater-recharge-model-of-the-upper-
Dataset updated
Aug 15, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Colorado River
Description
The Colorado River and its tributaries supply water to more than 35 million people in the United States and 3 million people in Mexico, irrigating more than 4.5 million acres of farmland, and generating about 12 billion kilowatt hours of hydroelectric power annually. Planning for the sustainable management of the Colorado River in future climates requires an understanding of the Upper Colorado River Basin groundwater system. The Upper Colorado River Basin, encompassing more than 110,000 square miles (mi²), contains the headwaters of the Colorado River and is an important source of snowmelt runoff to the River. Groundwater discharge also is an important source of water in the River and its tributaries, with estimates ranging from 21 to 58 percent of streamflow in the upper basin. A study by Castle and others (2014) using remotely sensed gravity observations from the NASA Gravity Recovery and Climate Experiment (GRACE) mission found that UCRB groundwater was depleted by more than 17 million acre-feet (ft) from December 2004 to November 2013. Understanding groundwater-budget components, including groundwater recharge, is important to sustainably manage both groundwater and surface-water supplies in the Colorado River Basin.

Countries with the most Facebook users 2024

statista.com
es.statista.com

Cite

Stacy Jo Dixon, Countries with the most Facebook users 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

Which country has the most Facebook users?

There are more than 378 million Facebook users in India alone, making it the leading country in terms of Facebook audience size. To put this into context, if India’s Facebook audience were a country then it would be ranked third in terms of largest population worldwide. Apart from India, there are several other markets with more than 100 million Facebook users each: The United States, Indonesia, and Brazil with 193.8 million, 119.05 million, and 112.55 million Facebook users respectively.

Facebook – the most used social media

Meta, the company that was previously called Facebook, owns four of the most popular social media platforms worldwide: WhatsApp, Facebook Messenger, Facebook, and Instagram. As of the third quarter of 2021, there were around 3.5 billion cumulative monthly users of the company’s products worldwide. With around 2.9 billion monthly active users, Facebook is the most popular social media worldwide. With an audience of this scale, it is no surprise that the vast majority of Facebook’s revenue is generated through advertising.

Facebook usage by device

As of July 2021, it was found that 98.5 percent of active users accessed their Facebook account from mobile devices. In fact, almost 81.8 percent of Facebook audiences worldwide access the platform only via mobile phone. Facebook is not only available through mobile browser as the company has published several mobile apps for users to access their products and services. As of the third quarter 2021, the four core Meta products were leading the ranking of most downloaded mobile apps worldwide, with WhatsApp amassing approximately six billion downloads.

North Carolina Age Cohorts Dataset: Children, Working Adults, and Seniors in...
neilsberg.com
csv, json
Updated Feb 22, 2025
Cite
Neilsberg Research (2025). North Carolina Age Cohorts Dataset: Children, Working Adults, and Seniors in North Carolina - Population and Percentage Analysis // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/4b987530-f122-11ef-8c1b-3860777c1fe6/
Available download formats: json, csv
Dataset updated
Feb 22, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
North Carolina
Variables measured
Population Over 65 Years, Population Under 18 Years, Population Between 18 and 64 Years, Percent of Total Population for Age Groups
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the age cohorts. We divided the age cohorts into three buckets: children (under 18 years), working population (between 18 and 64 years), and senior population (65 years and over). For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the North Carolina population by age cohorts (Children: Under 18 years; Working population: 18-64 years; Senior population: 65 years or more). It lists the population in each age cohort group along with its percentage relative to the total population of North Carolina. The dataset can be utilized to understand the population distribution across children, working population and senior population for dependency ratio, housing requirements, ageing, migration patterns etc.

Key observations

The largest age group was 18 to 64 years with a population of 6.47 million (61.17% of the total population). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Age cohorts:

Under 18 years

18 to 64 years

65 years and over

Variables / Data Columns

Age Group: This column displays the age cohort for the North Carolina population analysis. Total expected values are 3 groups (Children, Working Population and Senior Population).

Population: The population for the age cohort in North Carolina is shown in the following column.

Percent of Total Population: The population as a percent of the total population of North Carolina is shown in the following column.
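
The percent column, and the dependency ratio mentioned under Context, can be derived from the cohort counts. The sketch below is illustrative only: the counts are made-up placeholders, not ACS estimates, and the dependency ratio is assumed here to mean dependents (children plus seniors) per 100 people in the working population, using this dataset's cohort boundaries.

```python
# Placeholder counts, not ACS estimates.
cohorts = {
    "Under 18 years": 200,
    "18 to 64 years": 600,
    "65 years and over": 200,
}

def percent_of_total(cohorts):
    """Return each cohort's share of the total population, in percent."""
    total = sum(cohorts.values())
    return {name: 100 * count / total for name, count in cohorts.items()}

def dependency_ratio(cohorts):
    """Assumed definition: dependents per 100 people aged 18 to 64."""
    dependents = cohorts["Under 18 years"] + cohorts["65 years and over"]
    return 100 * dependents / cohorts["18 to 64 years"]

shares = percent_of_total(cohorts)
ratio = dependency_ratio(cohorts)
```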

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for North Carolina Population by Age.
Max Foundation Bangladesh Healthy Village Tracker
kaggle.com
Updated Dec 14, 2021
Cite
Remco Geervliet (2021). Max Foundation Bangladesh Healthy Village Tracker [Dataset]. https://www.kaggle.com/remcogeervliet/max-foundation-bangladesh-healthy-village-tracker/discussion
Dataset updated
Dec 14, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Remco Geervliet
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Bangladesh
Description
Max Foundation

Max Foundation is a Netherlands-based NGO that works towards a healthy start for every child in the most effective and long-lasting way. Over the past 15 years, our teams in Bangladesh and Ethiopia have reached almost 3 million people, supporting communities in reducing stunting and undernutrition by gaining better access to clean water, sanitation and hygiene, as well as healthy diets and care for mother and child.

Maximising our impact and cost efficiency are at the core of our work, which makes quantifying and analysing our programmes crucial. We therefore collect a lot of information on the communities we work with; to understand them better and see where and how we can improve as an organisation.

This dataset is one of many we are making publicly available because we believe that data in the development sector should be open: not as a goal in itself, but as a way to help the sector be more effective and create more impact.

Content

These data are collected quarterly at the village-level, in aggregate. In Max Foundation's Healthy Village Approach, our team has created several indicators to track how villages are progressing on WASH (water, sanitation and hygiene), nutrition, and SRHR (sexual and reproductive health and rights) & Baby WASH.

Privacy and links to our other data

All of Max Foundation's data are collected and processed according to GDPR standards and explicit informed consent is given by all respondents. They are also clearly informed that choosing not to participate in data collection will in no way affect their eligibility for, or receiving of, products or services from Max Foundation.

Furthermore, we enforce strong privacy protections on our open data to minimise the risk of these data being used to cause harm or re-identify individuals.

Concretely this means:

- Villages are masked by random numbers. However, to ensure it is still possible to compare our data sets, these random numbers are consistent across all datasets. This means that village '1' in this data is the same as village '1' in all of our other Bangladesh datasets, unless stated otherwise. Higher level administrative units can be deduced from matching the village numbers to the bd_ loc_XX datasets in the Max Foundation Bangladesh 2018 WASH Census dataset.
- Population counts have been bucketed. The values represent the mid-point of a given bucket. For example, for the number of households in the village, which is bucketed by 20 households, the value 50 represents 40-60 households. The values have also been censored at the upper end, and some at the lower end as well. The column descriptions specify any transformations done to the data.
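
The bucketing described above can be illustrated with a short sketch. This is not Max Foundation's actual procedure: the function name is hypothetical, buckets are assumed to be half-open (40 up to but not including 60), and the optional upper cap stands in for the censoring, whose real thresholds are given only in the column descriptions.

```python
def bucket_midpoint(count, width=20, upper_cap=None):
    """Map a raw count to the mid-point of its bucket.

    Assumes half-open buckets [k*width, (k+1)*width). If upper_cap is set,
    larger counts are censored to it before bucketing (hypothetical rule).
    """
    if upper_cap is not None:
        count = min(count, upper_cap)
    lower_edge = (count // width) * width
    return lower_edge + width // 2

# With a width of 20, any count from 40 to 59 is reported as 50.
reported = [bucket_midpoint(n) for n in (40, 45, 59)]
```

When linking datasets, a reported value of 50 should therefore be read as a range of roughly 40 to 60 households rather than an exact count.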

A final note to anyone trying to link Max Foundation's various datasets; as data is self-reported, sometimes by individuals other times by whole communities, there may be differences in for instance the number of households or the number of stunted children in a given village in this dataset versus in another. Some differences can be explained by differences in definitions (a household is a concept that is often hard to define and its interpretation may vary from person to person), and others by a lack of information on the part of a respondent. We therefore encourage you to look at these differences and see which value makes the most sense for the specific analysis you are conducting.

Acknowledgements

These data could have not been collected without the generous support from the Embassy of the Kingdom of the Netherlands in Dhaka and numerous other donors who have supported us over the years. Special thanks to our Bangladesh team for their excellent work in guiding the data collection process.

What you can do for our communities

We invite you to share any interesting insights you have derived from the data with us. From visualising our impact, to uncovering which parts of our programmes are most strongly related with reducing stunting, to making new connections we may have not even considered; we are eager to hear how we can be more effective in what we do and how we do it.

More detailed data insights are available from our internal data, such as the linking of households between datasets. Please note that we would be happy to share more detailed data with researchers, students and many others once proper agreements are in place.

As we value impact above all else, we are happy to work with anyone who can help us to improve this. We are constantly adapting our approach based on internal and external findings, and invite you to join us on this journey. Together we can ensure that every child has a healthy start.
FSboard
kaggle.com
zip
Updated Feb 25, 2025
Cite
Google Research (2025). FSboard [Dataset]. https://www.kaggle.com/datasets/googleai/fsboard/code
Available download formats: zip (0 bytes)
Dataset updated
Feb 25, 2025
Dataset provided by
Google (http://google.com/)
Authors
Google Research
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary

FSboard is an American Sign Language fingerspelling dataset situated in a mobile text entry use case, collected from 147 paid and consenting Deaf signers using Pixel 4A selfie cameras in a variety of environments. At >3 million characters in length and >250 hours in duration, FSboard is the largest fingerspelling recognition dataset to date by a factor of >10x.

We previously hosted a Kaggle competition using MediaPipe Holistic landmarks for the FSboard data; this release now includes the underlying RGB videos and val/test sets.

See our paper for a more complete exposition of the dataset: FSboard: Over 3 million characters of ASL fingerspelling collected via smartphones

The dataset consists of several categories of synthetically generated phrases (examples in the table below, not real PII) recorded as video clips of ASL fingerspelling (example frames in the figure below, faces blurred here but not in the dataset).

Directory	Category	Example
"dmk"	MacKenzie phrases	prevailing wind from the east
"daun"	URLs	/dfinance/list.asp?id=418/
	Addresses	9841 gritt hill
	Phone Numbers	166-893-6320
	Names	mohammed kim

[Figure: example frames of ASL fingerspelling video clips, with faces blurred]

Responsible Use

While facial expressions are an essential component of sign language and are therefore included in the dataset, we ask that you blur the signers’ faces when publicizing examples. You should not attempt to reidentify the signers or use their likenesses to generate and publish other content (deepfakes). Please be culturally respectful of the Deaf/Hard of Hearing community in your use of the dataset and do not exaggerate the significance of improving ASL fingerspelling performance, which is only one small component of American Sign Language.

Landmarks

Landmarks were extracted using MediaPipe Holistic. They are provided as tf.train.SequenceExample entries in TFRecordio files. There is also a script which converts these TFRecordio files to Parquet files in a similar format to the one used in the previous Kaggle Competition. Since each entry in the Parquet file represents a single landmark frame, the script also produces a supplemental csv file with video level information.

Sensitive Content Filtering

The synthetic URLs generated in the dataset were created by recombining parts from real URLs. As such, the full breadth of content available on the internet is represented. It is important not to infantilize the Deaf community, and therefore important to ensure that any applications in this space are able to produce arbitrary output. Imagine the frustration when your keyboard r*****s to produce certain ducking words. However, it's also important to ensure that an application doesn't easily produce offensive unintended content. In an effort to facilitate people making sane decisions with this data, we've run a sensitive content filter and keyword searches on the phrases used and manually reviewed the result to produce a boolean tag "sensitiveContent" which is available in the json files. Please ensure that the Deaf community is involved in the creation of any applications targeted to them.
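
As a rough illustration of how the "sensitiveContent" tag might be used, the sketch below separates tagged and untagged entries. Only the tag name comes from the description above; the surrounding JSON layout, the "phrase" key, and the function name are assumptions made for the example, so the actual files may be structured differently.

```python
import json

# Hypothetical metadata layout; only the "sensitiveContent" key is from the text.
raw = """
[
  {"phrase": "prevailing wind from the east", "sensitiveContent": false},
  {"phrase": "placeholder flagged phrase", "sensitiveContent": true}
]
"""

def split_by_sensitivity(entries):
    """Return (unflagged, flagged) lists based on the boolean tag."""
    # Entries without the tag are treated as unflagged here; that default
    # is a choice made for this sketch, not something the dataset specifies.
    unflagged = [e for e in entries if not e.get("sensitiveContent", False)]
    flagged = [e for e in entries if e.get("sensitiveContent", False)]
    return unflagged, flagged

unflagged, flagged = split_by_sensitivity(json.loads(raw))
```

Keeping both lists, rather than discarding flagged entries, leaves the filtering decision to the application, in line with the point about supporting arbitrary output.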

Attribution

If you use FSboard in your work, please cite:

@misc{georg2024fsboard3millioncharacters,
  title={FSboard: Over 3 million characters of ASL fingerspelling collected via smartphones},
  author={Manfred Georg and Garrett Tanzer and Saad Hassan and Maximus Shengelia and Esha Uboweja and Sam Sepah and Sean Forbes and Thad Starner},
  year={2024},
  eprint={2407.15806},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2407.15806},
}
GitHub Activity Data
console.cloud.google.com
Updated Jun 23, 2022
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:GitHub&inv=1&invt=Ab41nA (2022). GitHub Activity Data [Dataset]. https://console.cloud.google.com/marketplace/product/github/github-repos
Dataset updated
Jun 23, 2022
Dataset provided by
GitHub (https://github.com/)
Google (http://google.com/)
Description
GitHub is how people build software and is home to the largest community of open source developers in the world, with over 12 million people contributing to 31 million projects on GitHub since 2008. This 3TB+ dataset comprises the largest released source of GitHub activity to date. It contains a full snapshot of the content of more than 2.8 million open source GitHub repositories including more than 145 million unique commits, over 2 billion different file paths, and the contents of the latest revision for 163 million files, all of which are searchable with regular expressions. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
H-1B Visa Petitions 2011-2016
kaggle.com
Updated Feb 28, 2017
Cite
Sharan Naribole (2017). H-1B Visa Petitions 2011-2016 [Dataset]. https://www.kaggle.com/nsharan/h-1b-visa/data
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 28, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sharan Naribole
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
Context

H-1B visas are a category of employment-based, non-immigrant visas for temporary foreign workers in the United States. For a foreign national to apply for an H-1B visa, a US employer must offer them a job and submit a petition for an H-1B visa to the US immigration department. This is also the most common visa status applied for and held by international students once they complete college or higher education and begin working in a full-time position.

The following articles contain more information about the H-1B visa process:

What is H1B LCA ? Why file it ? Salary, Processing times – DOL

H1B Application Process: Step by Step Guide

Content

This dataset contains five years' worth of H-1B petition data, with approximately 3 million records overall. The columns in the dataset include case status, employer name, worksite coordinates, job title, prevailing wage, occupation code, and year filed.

For more information on individual columns, refer to the column metadata. A detailed description of the underlying raw dataset is available in an official data dictionary.

Acknowledgements

The Office of Foreign Labor Certification (OFLC) generates program data, including data about H-1B visas. The disclosure data is updated annually and is available online.

The raw data available is messy and not immediately suitable for analysis. A set of data transformations was performed to make the data more accessible for quick exploration. To learn more, refer to this blog post and to the complementary R Notebook.

Inspiration

Is the number of petitions with Data Engineer job title increasing over time?

Which part of the US has the most Hardware Engineer jobs?

Which industry has the largest number of Data Scientist positions?

Which employers file the most petitions each year?
Euro Area Population
tradingeconomics.com
pt.tradingeconomics.com
csv, excel, json, xml
Updated Oct 10, 2012
TRADING ECONOMICS (2012). Euro Area Population [Dataset]. https://tradingeconomics.com/euro-area/population
Explore at:
Available download formats: xml, excel, json, csv
Dataset updated
Oct 10, 2012
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Dec 31, 1960 - Dec 31, 2025
Area covered
Euro Area
Description
The total population in the Euro Area was estimated at 351.4 million people in 2025, according to the latest census figures and projections from Trading Economics. This dataset provides the latest reported value for Euro Area Population, plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Google Landmarks Dataset v2
github.com
opendatalab.com
Updated Sep 27, 2019
Google (2019). Google Landmarks Dataset v2 [Dataset]. https://github.com/cvdfoundation/google-landmark
Explore at:
Dataset updated
Sep 27, 2019
Dataset provided by
Google (http://google.com/)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks and technical solutions from top teams.
Wikilinks: A Large-scale Cross-Document Coreference Corpus Labeled via Links...
academictorrents.com
bittorrent
Updated Mar 4, 2017
Sameer Singh and Amarnag Subramanya and Fernando Pereira and Andrew McCallum (2017). Wikilinks: A Large-scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia (Original Dataset) [Dataset]. https://academictorrents.com/details/beefa2ec4161432cd1d9f693a88d3670aae68357
Explore at:
Available download formats: bittorrent (1837946933)
Dataset updated
Mar 4, 2017
Dataset authored and provided by
Sameer Singh and Amarnag Subramanya and Fernando Pereira and Andrew McCallum
License
https://academictorrents.com/nolicensespecified
Description
Cross-document coreference resolution is the task of grouping the entity mentions in a collection of documents into sets that each represent a distinct entity. It is central to knowledge base construction and also useful for joint inference with other NLP components. Obtaining large, organic labeled datasets for training and testing cross-document coreference has previously been difficult. We use a method for automatically gathering massive amounts of naturally-occurring cross-document reference data to create the Wikilinks dataset, comprising 40 million mentions over 3 million entities. Our method is based on finding hyperlinks to Wikipedia from a web crawl and using anchor text as mentions. In addition to providing large-scale labeled data without human effort, we are able to include many styles of text beyond newswire and many entity types beyond people. ### Introduction The Wikipedia links (WikiLinks) data consists of web pages that satisfy the following two constraints: a. conta
Facebook users worldwide 2017-2027
statista.com
es.statista.com
Stacy Jo Dixon, Facebook users worldwide 2017-2027 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Explore at:
Dataset provided by
Statista (http://statista.com/)
Authors
Stacy Jo Dixon
Description
The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users, a new peak, in 2027. Notably, the number of Facebook users has increased continuously over the past years. User figures, shown here for the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Stable_Diffusion_3_Recaption
huggingface.co
Updated Apr 23, 2025
Gabriel Mongaras (2025). Stable_Diffusion_3_Recaption [Dataset]. https://huggingface.co/datasets/gmongaras/Stable_Diffusion_3_Recaption
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 23, 2025
Authors
Gabriel Mongaras
License
https://choosealicense.com/licenses/openrail/
Description
This dataset is the one specified in the stable diffusion 3 paper which is composed of the ImageNet dataset and the CC12M dataset.

I used the ImageNet 2012 train/val data and captioned it as specified in the paper: "a photo of a 〈class name〉" (note all ids are 999,999,999). CC12M is a dataset with 12 million images created in 2021. Unfortunately, the downloader provided by Google has many broken links and the download takes forever. However, some people in the community publicized the dataset.… See the full description on the dataset page: https://huggingface.co/datasets/gmongaras/Stable_Diffusion_3_Recaption.
Data from: GeoNames
data.wu.ac.at
huggingface.co
zip
Updated Oct 10, 2013
Open Geospatial Data (2013). GeoNames [Dataset]. https://data.wu.ac.at/schema/datahub_io/MzE1MTQ4YWYtZmQyOC00ZWJjLTg3MDEtZWVkMDExNTE3MDA0
Explore at:
Available download formats: zip
Dataset updated
Oct 10, 2013
Dataset provided by
Open Geospatial Consortium (https://www.ogc.org/)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The geonames.org geographical database is available for download free of charge under a Creative Commons attribution license. It contains over eight million geographical names and consists of 6.3 million unique features, of which 2.2 million are populated places and 1.8 million are alternate names. All features are categorized into one of nine feature classes and further subcategorized into one of 645 feature codes.

The data is accessible free of charge through a number of web services and a daily database export. Geonames.org is already serving over 3 million web service requests per day.

Geonames is integrating geographical data such as names of places in various languages, elevation, population and others from various sources. All lat/long coordinates are in WGS84 (World Geodetic System 1984). Users may manually edit, correct and add new names using a user friendly wiki interface.

TODO

This is a large dataset and there are a whole bunch of specially exported subsets of data at http://download.geonames.org/export/dump/ which it might be worth turning into separate datasets (or at least listing here in Resources).

Linked Data

Geonames locations are available as linked data, see dataset:geonames-semantic-web

Directory	Category	Example
"`dmk`"	MacKenzie phrases	prevailing wind from the east
"`daun`"	URLs	/dfinance/list.asp?id=418/
	Addresses	9841 gritt hill
	Phone Numbers	166-893-6320
	Names	mohammed kim

Average daily time spent on social media worldwide 2012-2024

statista.com
es.statista.com

Stacy Jo Dixon, Average daily time spent on social media worldwide 2012-2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

How much time do people spend on social media?

As of 2024, the average daily social media usage of internet users worldwide amounted to 143 minutes per day, down from 151 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of three hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just two hours and 16 minutes.

Global social media usage

Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.

Global impact of social media

Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics and heightened everyday distractions.

‘COVID vaccination vs. mortality ’ analyzed by Analyst-2

analyst-2.ai

Updated Aug 4, 2020

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘COVID vaccination vs. mortality ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-covid-vaccination-vs-mortality-cbd8/06c8ccd2/?iid=010-492&v=presentation

Explore at:

Dataset updated

Aug 4, 2020

Dataset authored and provided by

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)

License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

Analysis of ‘COVID vaccination vs. mortality ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sinakaraji/covid-vaccination-vs-death on 12 November 2021.

--- Dataset description provided by original source is as follows ---

Context

The COVID-19 outbreak has brought the whole planet to its knees. More than 4.5 million people have died as of the writing of this notebook, and the only acceptable way out of the disaster is to vaccinate all parts of society. Despite the fact that the benefits of vaccination have been proved to the world many times, anti-vaccine groups are springing up all over the world. This data set was generated to investigate the impact of coronavirus vaccinations on coronavirus mortality.

Content

country: country name.
iso_code: ISO code for each country.
date: date to which the data belong.
total_vaccinations: number of all doses of COVID vaccine used in that country.
people_vaccinated: number of people who got at least one shot of COVID vaccine.
people_fully_vaccinated: number of people who got full vaccine shots.
New_deaths: number of daily new deaths.
population: 2021 country population.
ratio: % of vaccinations in that country at that date = people_vaccinated/population * 100.
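
The ratio column defined above can be reproduced with a one-line calculation. The following is a minimal sketch; the function name `vaccination_ratio` and the input numbers are illustrative assumptions, not values taken from the dataset.

```python
def vaccination_ratio(people_vaccinated: float, population: float) -> float:
    """Percentage of the population with at least one dose.

    Implements the stated definition: people_vaccinated / population * 100.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return people_vaccinated / population * 100


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the arithmetic.
    print(vaccination_ratio(people_vaccinated=250, population=1000))
```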

Data Collection

This dataset is a combination of the following three datasets:

1.https://www.kaggle.com/gpreda/covid-world-vaccination-progress

2.https://covid19.who.int/WHO-COVID-19-global-data.csv

3.https://www.kaggle.com/rsrishav/world-population

you can find more detail about this dataset by reading this notebook:

https://www.kaggle.com/sinakaraji/simple-linear-regression-covid-vaccination

Countries in this dataset:


Afghanistan	Albania	Algeria	Andorra	Angola
Anguilla	Antigua and Barbuda	Argentina	Armenia	Aruba
Australia	Austria	Azerbaijan	Bahamas	Bahrain
Bangladesh	Barbados	Belarus	Belgium	Belize
Benin	Bermuda	Bhutan	Bolivia (Plurinational State of)	Brazil
Bosnia and Herzegovina	Botswana	Brunei Darussalam	Bulgaria	Burkina Faso
Cambodia	Cameroon	Canada	Cabo Verde	Cayman Islands
Central African Republic	Chad	Chile	China	Colombia
Comoros	Cook Islands	Costa Rica	Croatia	Cuba
Curaçao	Cyprus	Denmark	Djibouti	Dominica
Dominican Republic	Ecuador	Egypt	El Salvador	Equatorial Guinea
Estonia	Ethiopia	Falkland Islands (Malvinas)	Fiji	Finland
France	French Polynesia	Gabon	Gambia	Georgia
Germany	Ghana	Gibraltar	Greece	Greenland
Grenada	Guatemala	Guinea	Guinea-Bissau	Guyana
Haiti	Honduras	Hungary	Iceland	India
Indonesia	Iran (Islamic Republic of)	Iraq	Ireland	Isle of Man
Israel	Italy	Jamaica	Japan	Jordan
Kazakhstan	Kenya	Kiribati	Kuwait	Kyrgyzstan
Lao People's Democratic Republic	Latvia	Lebanon	Lesotho	Liberia
Libya	Liechtenstein	Lithuania	Luxembourg	Madagascar
Malawi	Malaysia	Maldives	Mali	Malta
Mauritania	Mauritius	Mexico	Republic of Moldova	Monaco
Mongolia	Montenegro	Montserrat	Morocco	Mozambique
Myanmar	Namibia	Nauru	Nepal	Netherlands
New Caledonia	New Zealand	Nicaragua	Niger	Nigeria
Niue	North Macedonia	Norway	Oman	Pakistan
occupied Palestinian territory, including east Jerusalem
Panama	Papua New Guinea	Paraguay	Peru	Philippines
Poland	Portugal	Qatar	Romania	Russian Federation
Rwanda	Saint Kitts and Nevis	Saint Lucia
Saint Vincent and the Grenadines	Samoa	San Marino	Sao Tome and Principe	Saudi Arabia
Senegal	Serbia	Seychelles	Sierra Leone	Singapore
Slovakia	Slovenia	Solomon Islands	Somalia	South Africa
Republic of Korea	South Sudan	Spain	Sri Lanka	Sudan
Suriname	Sweden	Switzerland	Syrian Arab Republic	Tajikistan
United Republic of Tanzania	Thailand	Togo	Tonga	Trinidad and Tobago
Tunisia	Turkey	Turkmenistan	Turks and Caicos Islands	Tuvalu
Uganda	Ukraine	United Arab Emirates	The United Kingdom	United States of America
Uruguay	Uzbekistan	Vanuatu	Venezuela (Bolivarian Republic of)	Viet Nam
Wallis and Futuna	Yemen	Zambia	Zimbabwe

--- Original source retains full ownership of the source dataset ---

Bhavik Jikadara (2024). World Population Statistics - 2023 [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/world-population-statistics-2023

World Population Statistics - 2023

Highlights From the 2023 World Population Data Sheet

Explore at:

Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Jan 9, 2024

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Bhavik Jikadara

License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Area covered

World

Description

The current US Census Bureau world population estimate in June 2019 shows that the current global population is 7,577,130,400 people on Earth, which far exceeds the world population of 7.2 billion in 2015. Our estimate based on UN data shows the world's population surpassing 7.7 billion.
China is the most populous country in the world with a population exceeding 1.4 billion. It is one of just two countries with a population of more than 1 billion, with India being the second. As of 2018, India has a population of over 1.355 billion people, and its population growth is expected to continue through at least 2050. By the year 2030, India is expected to become the most populous country in the world. This is because India’s population will grow, while China is projected to see a loss in population.
The following 11 countries that are the most populous in the world each have populations exceeding 100 million. These include the United States, Indonesia, Brazil, Pakistan, Nigeria, Bangladesh, Russia, Mexico, Japan, Ethiopia, and the Philippines. Of these nations, all are expected to continue to grow except Russia and Japan, which will see their populations drop by 2030 before falling again significantly by 2050.
Many other nations have populations of at least one million, while there are also countries that have just thousands. The smallest population in the world can be found in Vatican City, where only 801 people reside.
In 2018, the world’s population growth rate was 1.12%. Every five years since the 1970s, the population growth rate has continued to fall. The world’s population is expected to continue to grow larger but at a much slower pace. By 2030, the population will exceed 8 billion. In 2040, this number will grow to more than 9 billion. In 2055, the number will rise to over 10 billion, and another billion people won’t be added until near the end of the century. The current annual population growth estimates from the United Nations are in the millions - estimating that over 80 million new lives are added yearly.
This population growth will be significantly impacted by nine specific countries which are situated to contribute to the population growth more quickly than other nations. These nations include the Democratic Republic of the Congo, Ethiopia, India, Indonesia, Nigeria, Pakistan, Uganda, the United Republic of Tanzania, and the United States of America. Particularly of interest, India is on track to overtake China's position as the most populous country by 2030. Additionally, multiple nations within Africa are expected to double their populations before fertility rates begin to slow entirely.

Content

This dataset contains historical population data for every country/territory in the world, described by parameters such as area size of the country/territory, name of the continent, name of the capital, density, population growth rate, ranking based on population, world population percentage, etc.

Dataset Glossary (Column-Wise):
Rank: Rank by Population.
CCA3: Three-letter Country/Territories Code.
Country/Territories: Name of the Country/Territories.
Capital: Name of the Capital.
Continent: Name of the Continent.
2022 Population: Population of the Country/Territories in the year 2022.
2020 Population: Population of the Country/Territories in the year 2020.
2015 Population: Population of the Country/Territories in the year 2015.
2010 Population: Population of the Country/Territories in the year 2010.
2000 Population: Population of the Country/Territories in the year 2000.
1990 Population: Population of the Country/Territories in the year 1990.
1980 Population: Population of the Country/Territories in the year 1980.
1970 Population: Population of the Country/Territories in the year 1970.
Area (km²): Area size of the Country/Territories in square kilometers.
Density (per km²): Population Density per square kilometer.
Growth Rate: Population Growth Rate by Country/Territories.
World Population Percentage: The population percentage by each Country/Territories.
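
Two of the glossary columns can be derived from the others. The sketch below assumes that density is population divided by area and that the world population percentage is a territory's share of the summed 2022 populations; both are plausible readings of the glossary rather than definitions confirmed by the dataset. The two rows are made-up placeholders, not real figures.

```python
from typing import Dict, List

# Hypothetical rows using the glossary's column names; values are invented.
rows: List[Dict[str, float]] = [
    {"2022 Population": 3000, "Area (km²)": 100},
    {"2022 Population": 1000, "Area (km²)": 50},
]


def add_derived_columns(records: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Return copies of the records with density and world share added."""
    total = sum(r["2022 Population"] for r in records)
    if total <= 0:
        raise ValueError("total population must be positive")
    out = []
    for r in records:
        enriched = dict(r)
        # Assumed definition: people per square kilometer.
        enriched["Density (per km²)"] = r["2022 Population"] / r["Area (km²)"]
        # Assumed definition: share of the summed population, in percent.
        enriched["World Population Percentage"] = r["2022 Population"] / total * 100
        out.append(enriched)
    return out


if __name__ == "__main__":
    for row in add_derived_columns(rows):
        print(row)
```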
