86 datasets found

Yelp Dataset - Contains 1 million rows
kaggle.com
Updated Jan 29, 2022
Cite
Abdul Majid (2022). Yelp Dataset - Contains 1 million rows [Dataset]. https://www.kaggle.com/datasets/abdulmajid115/yelp-dataset-contains-1-million-rows
Dataset updated
Jan 29, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Abdul Majid
Description
Context

The data was acquired from the Yelp website.

Content

The data can help people find companies and organizations based on ratings and reviews, helping them choose or recommend the best services available.
United States Age Group Population Dataset: A Complete Breakdown of United...
neilsberg.com
csv, json
Updated Jul 24, 2024
Cite
Neilsberg Research (2024). United States Age Group Population Dataset: A Complete Breakdown of United States Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/aabf26b9-4983-11ef-ae5d-3860777c1fe6/
Available download formats: csv, json
Dataset updated
Jul 24, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Variables measured
Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each age group. Ages 0 to 84 were divided into 5-year buckets, and all ages 85 and over were aggregated into a single group. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the United States population distribution across 18 age groups. It lists the population in each age group along with that group's percentage of the total United States population. The dataset can be used to understand the age distribution of the United States population; for example, it can be used to identify the largest age group in the United States.

Key observations

The largest age group in the United States was 30 to 34 years, with a population of 22.71 million (6.86%), according to the ACS 2018-2022 5-Year Estimates. The smallest age group was 80 to 84 years, with a population of 6.25 million (1.89%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over
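The 5-year bucketing described under "Measurement technique" can be sketched as a small Python function. This is only an illustration of the grouping rule, not Neilsberg's own code; the function name `age_group` is hypothetical and the label strings simply mirror the list above.

```python
def age_group(age: int) -> str:
    """Map an age in whole years to one of the 18 age-group labels listed above."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 5:
        return "Under 5 years"
    if age >= 85:
        return "85 years and over"
    lower = (age // 5) * 5  # start of the 5-year bucket, e.g. 32 -> 30
    return f"{lower} to {lower + 4} years"


# Every age from 0 to 100 falls into one of exactly 18 groups.
groups = {age_group(a) for a in range(101)}
print(len(groups))  # 18
```

Ages 5 to 84 produce 16 five-year labels; together with "Under 5 years" and "85 years and over" that gives the 18 groups.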

Variables / Data Columns

Age Group: This column displays the age group in consideration

Population: The population for the specific age group in the United States is shown in this column.

% of Total Population: This column displays the population of each age group as a proportion of the total United States population. Please note that the percentages may not sum to exactly 100% due to rounding.
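A minimal Python sketch of how the two variables relate, assuming a CSV export whose headers are literally `Age Group` and `Population` (the actual file's headers may differ). It finds the largest and smallest groups and recomputes each group's share; the figures in the example are made up purely to show that rounded shares need not add up to 100.

```python
import csv
import io


def summarize_age_groups(csv_text: str):
    """Return the largest group, smallest group, and each group's % of the total.

    Assumes columns named 'Age Group' and 'Population' (hypothetical headers).
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    populations = {r["Age Group"]: int(r["Population"].replace(",", "")) for r in rows}
    total = sum(populations.values())  # groups cover all ages, so this is the total
    shares = {g: round(100 * p / total, 2) for g, p in populations.items()}
    largest = max(populations, key=populations.get)
    smallest = min(populations, key=populations.get)
    return largest, smallest, shares


# Made-up figures, not taken from the dataset.
sample = (
    "Age Group,Population\n"
    '"Under 5 years","1,000"\n'
    '"5 to 9 years","2,000"\n'
    '"10 to 14 years","4,000"\n'
    '"15 to 19 years","2,000"\n'
)
largest, smallest, shares = summarize_age_groups(sample)
print(largest, "|", smallest)          # 10 to 14 years | Under 5 years
print(round(sum(shares.values()), 2))  # 99.99
```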

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. Most Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is part of the main dataset for United States Population by Age. You can refer to the main dataset here
NYC Open Data
kaggle.com
zip
Updated Mar 20, 2019
Cite
NYC Open Data (2019). NYC Open Data [Dataset]. https://www.kaggle.com/datasets/nycopendata/new-york
Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2019
Dataset authored and provided by
NYC Open Data
License
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/

Content

Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:

Over 8 million 311 service requests from 2012-2016

More than 1 million motor vehicle collisions 2012-present

Citi Bike stations and 30 million Citi Bike trips 2013-present

Over 1 billion Yellow and Green Taxi rides from 2009-present

Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015

This dataset is deprecated and not being updated.

Fork this kernel to get started with this dataset.

Acknowledgements

https://opendata.cityofnewyork.us/

https://cloud.google.com/blog/big-data/2017/01/new-york-city-public-datasets-now-available-on-google-bigquery

This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warrant the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.

The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.

Banner photo by @bicadmedia from Unsplash.

Inspiration

On which New York City streets are you most likely to find a loud party?

Can you find the Virginia Pines in New York City?

Where was the only collision caused by an animal that injured a cyclist?

What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?

Image: https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png
US Broadband Usage Across Counties
kaggle.com
Updated Jan 6, 2023
Cite
The Devastator (2023). US Broadband Usage Across Counties [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-broadband-usage-across-counties-and-zip-codes
Dataset updated
Jan 6, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
Area covered
United States
Description
US Broadband Usage Across Counties

Utilizing Microsoft's Data to Estimate Access

By Amber Thomas [source]

About this dataset

This dataset provides an estimate of broadband usage in the United States, focusing on how many people have access to broadband and how many actually use it at broadband speeds. Using data Microsoft collects from its services, including package size and total download time, the dataset estimates the throughput speed of devices connecting to the internet across zip codes and counties.
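As a rough illustration of estimating throughput from package size and download time, the sketch below assumes throughput is simply bits transferred divided by elapsed seconds. Microsoft's actual estimation method is not described here and may apply further corrections; `throughput_mbps` is a hypothetical helper name.

```python
def throughput_mbps(package_bytes: int, download_seconds: float) -> float:
    """Estimate throughput in megabits per second from download size and duration."""
    if download_seconds <= 0:
        raise ValueError("download time must be positive")
    bits = package_bytes * 8
    return bits / download_seconds / 1_000_000  # 1 Mbps = 10^6 bits per second


# A 50 MB package downloaded in 10 seconds:
print(throughput_mbps(50_000_000, 10))  # 40.0
```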

According to Federal Communications Commission (FCC) estimates, 14.5 million people don't have access to any kind of broadband connection. This dataset aims to address the contrast between estimated availability and actual use by providing more accurate usage numbers, downscaled to the county and zip code levels. Who gets counted as having access matters greatly: it determines who is included in public funding opportunities dedicated to closing the digital divide. The implications can be huge, since millions of people across the country could remain invisible if these numbers are not accurately reported and properly used in decision-making.

For increased accuracy when estimating broadband usage in the United States, the dataset aggregates information about locations with fewer than 20 devices. This allows others to use it to develop solutions that improve internet access, or to accurately label problem areas where no real or reliable connectivity exists in communities large and small throughout the US mainland. Please review the license terms before using these data so that you adhere to the stipulations of Microsoft's Open Use of Data Agreement v1.0, for both professional and educational use.


How to use the dataset

This dataset provides broadband usage estimates in the United States by county and zip code. It is ideally suited for research into how broadband connects households, towns and cities. Understanding this information is vital for closing existing disparities in access to high-speed internet, and for devising strategies for making sure all Americans can stay connected in a digital world.

The dataset contains six columns:
- County: the name of the county for which usage statistics are provided.
- Zip Code (5-Digit): the 5-digit zip code from which usage data was collected, within that county or metropolitan area, micro area, or division within a state, as reported by the US Census Bureau in 2018[2].
- Population (Households): the estimated number of households, defined according to [3], based on data from the US Census Bureau American Community Survey 5-Year Estimates[4].
- Average Throughput (Mbps): the average download speed in Mbps, derived from data collected from anonymous devices connected through Microsoft services such as Windows Update, Office 365, and Xbox Live Core Services[5].
- Percent Fast (> 25 Mbps): the percentage of machines with throughput greater than 25 Mbps, calculated using [6].
- Percent Slow (< 3 Mbps): the percentage of machines with throughput less than 3 Mbps, calculated using [7].
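The three speed columns can be illustrated with a small aggregation over per-device measurements. This is a sketch under stated assumptions: the input is hypothetical `(zip_code, throughput_mbps)` pairs, one per device, and the percentages are plain fractions of devices above 25 Mbps or below 3 Mbps. The referenced methods [6] and [7] may differ.

```python
from collections import defaultdict

FAST_MBPS = 25.0  # "Percent Fast" cut-off from the column list
SLOW_MBPS = 3.0   # "Percent Slow" cut-off from the column list


def summarize_by_zip(measurements):
    """measurements: iterable of (zip_code, throughput_mbps), one entry per device.

    Returns zip_code -> {"average_mbps", "pct_fast", "pct_slow"}.
    """
    by_zip = defaultdict(list)
    for zip_code, mbps in measurements:
        by_zip[zip_code].append(mbps)
    summary = {}
    for zip_code, speeds in by_zip.items():
        n = len(speeds)
        summary[zip_code] = {
            "average_mbps": sum(speeds) / n,
            "pct_fast": 100 * sum(s > FAST_MBPS for s in speeds) / n,
            "pct_slow": 100 * sum(s < SLOW_MBPS for s in speeds) / n,
        }
    return summary


# Made-up device readings for two zip codes.
readings = [("98101", 50), ("98101", 2), ("98101", 10), ("98101", 30), ("59001", 1)]
print(summarize_by_zip(readings)["98101"])
# {'average_mbps': 23.0, 'pct_fast': 50.0, 'pct_slow': 25.0}
```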

Research Ideas

Targeting marketing campaigns based on broadband use. Companies can use the geographic and demographic data in this dataset to create targeted advertising campaigns that are tailored to individuals living in areas where broadband access is scarce or lacking.

Creating an educational platform for those without reliable access to broadband internet. By leveraging existing technologies such as satellite internet, media streaming services like Netflix, and platforms such as Khan Academy or EdX, those with limited access could gain access to new educational options from home.

Establishing public-private partnerships. Local governments and telecom providers need better data about gaps in service coverage and usage levels to make decisions about investing in new infrastructure that improves connectivity for rural communities.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

See the dataset description for more information.

Columns

File: broadband_data_2020October.csv
United States Employment Rate
tradingeconomics.com
pt.tradingeconomics.com
csv, excel, json, xml
Cite
TRADING ECONOMICS, United States Employment Rate [Dataset]. https://tradingeconomics.com/united-states/employment-rate
Available download formats: excel, xml, json, csv
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 31, 1948 - Aug 31, 2025
Area covered
United States
Description
Employment Rate in the United States remained unchanged at 59.60 percent in August. This dataset provides United States Employment Rate actual values, historical data, forecast, chart, statistics, economic calendar and news.

Instagram accounts with the most followers worldwide 2024

statista.com
de.statista.com

Cite

Stacy Jo Dixon, Instagram accounts with the most followers worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

Cristiano Ronaldo has one of the most popular Instagram accounts as of April 2024.

The Portuguese footballer is the most-followed person on the photo-sharing app, with 628 million followers. Instagram's own account ranked first with roughly 672 million followers.

How popular is Instagram?

Instagram is a photo-sharing social networking service that enables users to take pictures and edit them with filters. The platform allows users to post and share their images online and directly with their friends and followers on the social network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States, and experts projected this figure to surpass 127 million users in 2023.

Who uses Instagram?

Instagram audiences are predominantly young: recent data states that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveals that Instagram is also one of the most popular social media platforms among teens and one of the social networks with the biggest reach among teens in the United States.

Celebrity influencers on Instagram

Many celebrities and athletes are brand spokespeople and generate additional income with social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first again, as the average media value of one of his Instagram posts was 985,441 U.S. dollars.

Survey of Consumer Finances
federalreserve.gov
Updated Oct 18, 2023
Cite
Board of Governors of the Federal Reserve Board (2023). Survey of Consumer Finances [Dataset]. http://doi.org/10.17016/8799
Unique identifier
https://doi.org/10.17016/8799
Dataset updated
Oct 18, 2023
Dataset provided by
Federal Reserve Board of Governors
Federal Reserve System (http://www.federalreserve.gov/)
Authors
Board of Governors of the Federal Reserve Board
Time period covered
1962 - 2023
Description
The Survey of Consumer Finances (SCF) is normally a triennial cross-sectional survey of U.S. families. The survey data include information on families' balance sheets, pensions, income, and demographic characteristics.
High income tax filers in Canada
www150.statcan.gc.ca
open.canada.ca
Updated Oct 28, 2024
Cite
Government of Canada, Statistics Canada (2024). High income tax filers in Canada [Dataset]. http://doi.org/10.25318/1110005501-eng
Unique identifier
https://doi.org/10.25318/1110005501-eng
Dataset updated
Oct 28, 2024
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Government of Canada (http://www.gg.ca/)
Area covered
Canada
Description
This table presents income shares, thresholds, tax shares, and total counts of individual Canadian tax filers, with a focus on high-income individuals (95% income threshold, 99% threshold, etc.). Income thresholds are based on national threshold values, regardless of selected geography; for example, the number of Nova Scotians in the top 1% is calculated as the number of tax-filing Nova Scotians whose total income exceeded the 99% national income threshold. Different definitions of income are available in the table, namely market, total, and after-tax income, both with and without capital gains.
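The national-threshold rule can be sketched in Python: compute the threshold once over all Canadian filers, then count each province's filers against that same value. The nearest-rank percentile used here is an assumption (Statistics Canada's exact percentile method is not stated), and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def national_threshold(incomes, percentile):
    """Income at a national percentile, using the nearest-rank method (an assumption)."""
    ordered = sorted(incomes)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def count_above_national_threshold(filers, percentile=99):
    """filers: (province, total_income) pairs for every filer in Canada.

    The threshold is computed once over all filers, then each province's
    filers are counted against that same national value.
    """
    threshold = national_threshold([income for _, income in filers], percentile)
    counts = Counter(prov for prov, income in filers if income > threshold)
    return threshold, dict(counts)


# Toy data: 100 filers with incomes 1..100, alternating between two provinces.
filers = [("NS" if i % 2 == 0 else "ON", i) for i in range(1, 101)]
print(count_above_national_threshold(filers))      # (99, {'NS': 1})
print(count_above_national_threshold(filers, 95))  # (95, {'NS': 3, 'ON': 2})
```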
United States Personal Savings Rate
tradingeconomics.com
tr.tradingeconomics.com
csv, excel, json, xml
Cite
TRADING ECONOMICS, United States Personal Savings Rate [Dataset]. https://tradingeconomics.com/united-states/personal-savings
Available download formats: xml, excel, json, csv
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 31, 1959 - Aug 31, 2025
Area covered
United States
Description
Household Saving Rate in the United States decreased to 4.60 percent in August from 4.80 percent in July of 2025. This dataset provides United States Personal Savings Rate actual values, historical data, forecast, chart, statistics, economic calendar and news.
Reddit users in the United States 2019-2028
statista.com
Updated Jul 30, 2025
Cite
Statista Research Department (2025). Reddit users in the United States 2019-2028 [Dataset]. https://www.statista.com/topics/3196/social-media-usage-in-the-united-states/
Dataset updated
Jul 30, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Area covered
United States
Description
The number of Reddit users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 10.3 million users (+5.21 percent). After the ninth consecutive year of growth, the Reddit user base is estimated to reach 208.12 million users in 2028, a new peak. Notably, the number of Reddit users has increased continuously over the past years. User figures, shown here with regard to the platform Reddit, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts by the same person only once. Reddit users encompass both users who are logged in and those who are not. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Reddit users in countries like Mexico and Canada.
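The two forecast figures are consistent with each other: subtracting the 10.3 million increase from the 208.12 million 2028 figure gives the implied 2024 base, and the increase relative to that base reproduces the quoted growth rate. A quick arithmetic check using only the numbers in the description:

```python
increase_m = 10.3      # users added 2024-2028, in millions (from the text)
users_2028_m = 208.12  # forecast 2028 user base, in millions (from the text)

users_2024_m = users_2028_m - increase_m  # implied 2024 base
growth_pct = 100 * increase_m / users_2024_m

print(f"{users_2024_m:.2f} million in 2024, growth {growth_pct:.2f}%")
# 197.82 million in 2024, growth 5.21%
```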
Immigration system statistics data tables
gov.uk
Updated Aug 21, 2025
Cite
Home Office (2025). Immigration system statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/immigration-system-statistics-data-tables
Dataset updated
Aug 21, 2025
Dataset provided by
GOV.UK
Authors
Home Office
Description
List of the data tables as part of the Immigration system statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.

If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.

Accessible file formats

The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk.
Please tell us what format you need. It will help us if you say what assistive technology you use.

Related content

Immigration system statistics, year ending June 2025
Immigration system statistics quarterly release
Immigration system statistics user guide
Publishing detailed data tables in migration statistics
Policy and legislative changes affecting migration to the UK: timeline
Immigration statistics data archives

Passenger arrivals

Passenger arrivals summary tables, year ending June 2025 (ODS, 31.3 KB): https://assets.publishing.service.gov.uk/media/689efececc5ef8b4c5fc448c/passenger-arrivals-summary-jun-2025-tables.ods

‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.

Electronic travel authorisation

Electronic travel authorisation detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 57.1 KB): https://assets.publishing.service.gov.uk/media/689efd8307f2cc15c93572d8/electronic-travel-authorisation-datasets-jun-2025.xlsx
ETA_D01: Applications for electronic travel authorisations, by nationality
ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality

Entry clearance visas granted outside the UK

Entry clearance visas summary tables, year ending June 2025 (ODS, 56.1 KB): https://assets.publishing.service.gov.uk/media/68b08043b430435c669c17a2/visas-summary-jun-2025-tables.ods

Entry clearance visa applications and outcomes detailed datasets, year ending June 2025 (MS Excel Spreadsheet, 29.6 MB): https://assets.publishing.service.gov.uk/media/689efda51fedc616bb133a38/entry-clearance-visa-outcomes-datasets-jun-2025.xlsx
Vis_D01: Entry clearance visa applications, by nationality and visa type
Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome

Additional data relating to in country and overseas Visa applications can be fo
BEA, Foreign Direct Investment Position in the United States on a...
geocommons.com
Updated Apr 29, 2008
Cite
data (2008). BEA, Foreign Direct Investment Position in the United States on a Historical-Cost Basis, Global, 2005 [Dataset]. http://geocommons.com/search.html
Dataset updated
Apr 29, 2008
Dataset provided by
data
Description
This dataset graphically tracks Foreign Direct Investment (FDI) in the United States. It covers many types of investment, including manufacturing, trade, and finance. The data cover 2005 figures and show which markets foreign nations have invested in heavily. The data were collected from the Bureau of Economic Analysis (http://www.bea.gov/scb/pdf/2007/07%20July/0707_dip_article.pdf), and credit is given to Marilyn Ibarra and Jennifer Koncz, the authors of "Direct Investment Positions for 2006: Country and Industry Detail". The data were accessed on October 1, 2007. Statistics are quoted in millions.
Voter Registration by Census Tract
catalog.data.gov
data.kingcounty.gov
Updated Jun 29, 2025
Cite
data.kingcounty.gov (2025). Voter Registration by Census Tract [Dataset]. https://catalog.data.gov/dataset/voter-registration-by-census-tract
Dataset updated
Jun 29, 2025
Dataset provided by
data.kingcounty.gov
Description
This web map displays data from the voter registration database as the percent of registered voters by census tract in King County, Washington. The data for this web map is compiled from King County Elections voter registration data for the years 2013-2019. The total number of registered voters is based on the geo-location of the voter's registered address at the time of the general election for each year. The eligible voting population, age 18 and over, is based on the estimated population increase from the US Census Bureau and the Washington Office of Financial Management and was calculated as a projected 6 percent population increase for the years 2010-2013, 7 percent population increase for the years 2010-2014, 9 percent population increase for the years 2010-2015, 11 percent population increase for the years 2010-2016 & 2017, 14 percent population increase for the years 2010-2018 and 17 percent population increase for the years 2010-2019. The total population 18 and over in 2010 was 1,517,747 in King County, Washington. The percentage of registered voters represents the number of people who are registered to vote as compared to the eligible voting population, age 18 and over. The voter registration data by census tract was grouped into six percentage range estimates: 50% or below, 51-60%, 61-70%, 71-80%, 81-90% and 91% or above, with an overall 84 percent registration rate. In the map, the lighter colors represent a relatively low percentage range of voter registration and the darker colors represent a relatively high percentage range of voter registration. PDF maps of these data can be viewed at King County Elections downloadable voter registration maps. The 2019 General Election Voter Turnout layer is voter turnout data by historical precinct boundaries for the corresponding year. The data is grouped into six percentage ranges: 0-30%, 31-40%, 41-50%, 51-60%, 61-70%, and 71-100%.
The lighter colors represent lower turnout and the darker colors represent higher turnout. The King County Demographics Layer is census data for language, income, poverty, race and ethnicity at the census tract level and is based on the 2010-2014 American Community Survey 5 year Average provided by the United States Census Bureau. Since the data is based on a survey, they are considered to be estimates and should be used with that understanding. The demographic data sets were developed and are maintained by King County Staff to support the King County Equity and Social Justice program. Other data for this map is located in the King County GIS Spatial Data Catalog, where data is managed by the King County GIS Center, a multi-department enterprise GIS in King County, Washington. King County has nearly 1.3 million registered voters and is the largest jurisdiction in the United States to conduct all elections by mail. In the map you can view the percent of registered voters by census tract, compare registration within political districts, compare registration and demographic data, verify your voter registration or register to vote through a link to the VoteWA, Washington State Online Voter Registration web page.
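The eligible-population projection and the six registration ranges can be sketched as follows. The growth rates and the 2010 county total come from the description; applying the same rates to any tract's 2010 base, and rounding the rate to a whole percent before bucketing, are assumptions. The function names are hypothetical.

```python
# Projected growth of the 18+ population since 2010, as listed in the description
GROWTH_SINCE_2010 = {2013: 0.06, 2014: 0.07, 2015: 0.09, 2016: 0.11,
                     2017: 0.11, 2018: 0.14, 2019: 0.17}
POPULATION_18_PLUS_2010 = 1_517_747  # King County total, 2010


def eligible_population(base_2010, year):
    """Project the eligible (18+) population for a year from its 2010 value."""
    return base_2010 * (1 + GROWTH_SINCE_2010[year])


def registration_range(registered, eligible):
    """Registration rate in whole percent (rounding is an assumption) and its map range."""
    rate = round(100 * registered / eligible)
    for upper, label in [(50, "50% or below"), (60, "51-60%"), (70, "61-70%"),
                         (80, "71-80%"), (90, "81-90%")]:
        if rate <= upper:
            return rate, label
    return rate, "91% or above"


print(round(eligible_population(POPULATION_18_PLUS_2010, 2019)))  # 1775764
print(registration_range(840, 1000))  # (84, '81-90%')
```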
Spotify Million Playlist: Recsys Challenge 2018 Dataset
zenodo.org
Updated Apr 9, 2022
Cite
AIcrowd; AIcrowd (2022). Spotify Million Playlist: Recsys Challenge 2018 Dataset [Dataset]. http://doi.org/10.5281/zenodo.6425593
Unique identifier
https://doi.org/10.5281/zenodo.6425593
Dataset updated
Apr 9, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
AIcrowd; AIcrowd
Description
Spotify Million Playlist Dataset Challenge

Summary

The Spotify Million Playlist Dataset Challenge consists of a dataset and evaluation to enable research in music recommendations. It is a continuation of the RecSys Challenge 2018, which ran from January to July 2018. The dataset contains 1,000,000 playlists, including playlist titles and track titles, created by users on the Spotify platform between January 2010 and October 2017. The evaluation task is automatic playlist continuation: given a seed playlist title and/or initial set of tracks in a playlist, to predict the subsequent tracks in that playlist. This is an open-ended challenge intended to encourage research in music recommendations, and no prizes will be awarded (other than bragging rights).

Background

Playlists like Today’s Top Hits and RapCaviar have millions of loyal followers, while Discover Weekly and Daily Mix are just a couple of our personalized playlists made especially to match your unique musical tastes.

Our users love playlists too. In fact, the Digital Music Alliance, in their 2018 Annual Music Report, state that 54% of consumers say that playlists are replacing albums in their listening habits.

But our users don’t love just listening to playlists, they also love creating them. To date, over 4 billion playlists have been created and shared by Spotify users. People create playlists for all sorts of reasons: some playlists group together music categorically (e.g., by genre, artist, year, or city), by mood, theme, or occasion (e.g., romantic, sad, holiday), or for a particular purpose (e.g., focus, workout). Some playlists are even made to land a dream job, or to send a message to someone special.

The other thing we love here at Spotify is playlist research. By learning from the playlists that people create, we can learn all sorts of things about the deep relationship between people and music. Why do certain songs go together? What is the difference between “Beach Vibes” and “Forest Vibes”? And what words do people use to describe which playlists?

By learning more about the nature of playlists, we may also be able to suggest other tracks that a listener would enjoy in the context of a given playlist. This can make playlist creation easier, and ultimately help people find more of the music they love.

Dataset

To enable this type of research at scale, in 2018 we sponsored the RecSys Challenge 2018, which introduced the Million Playlist Dataset (MPD) to the research community. Sampled from the over 4 billion public playlists on Spotify, this dataset of 1 million playlists consists of over 2 million unique tracks by nearly 300,000 artists and represents the largest public dataset of music playlists in the world. The dataset includes public playlists created by US Spotify users between January 2010 and November 2017. The challenge ran from January to July 2018, and received 1,467 submissions from 410 teams. A summary of the challenge and the top scoring submissions was published in the ACM Transactions on Intelligent Systems and Technology.

In September 2020, we re-released the dataset as an open-ended challenge on AIcrowd.com. The dataset can now be downloaded by registered participants from the Resources page.

Each playlist in the MPD contains a playlist title, the track list (including track IDs and metadata), and other metadata fields (last edit time, number of playlist edits, and more). All data is anonymized to protect user privacy. Playlists are sampled with some randomization, are manually filtered for playlist quality and to remove offensive content, and have some dithering and fictitious tracks added to them. As such, the dataset is not representative of the true distribution of playlists on the Spotify platform, and must not be interpreted as such in any research or analysis performed on the dataset.
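To make the playlist structure described above concrete, here is a minimal sketch that summarizes one slice of playlist data. It assumes a JSON layout with a top-level `playlists` array whose entries carry a `name` and a `tracks` list with `track_uri` and `artist_uri` fields; these field names are an assumption, so check them against the files you actually download. The sample records and the `summarize_slice` helper are hypothetical.

```python
import json
from collections import Counter

# Hypothetical slice in an assumed MPD-style layout (not real MPD data).
SAMPLE_SLICE = """
{"playlists": [
  {"name": "Beach Vibes",
   "tracks": [{"track_uri": "spotify:track:A", "artist_uri": "spotify:artist:X"},
              {"track_uri": "spotify:track:B", "artist_uri": "spotify:artist:Y"}]},
  {"name": "Forest Vibes",
   "tracks": [{"track_uri": "spotify:track:B", "artist_uri": "spotify:artist:Y"},
              {"track_uri": "spotify:track:C", "artist_uri": "spotify:artist:X"}]}
]}
"""


def summarize_slice(raw: str) -> dict:
    """Count playlists, distinct tracks and artists, and the most frequent track."""
    data = json.loads(raw)
    playlists = data["playlists"]
    # How many playlists each track appears in (counting repeats within a playlist).
    track_counts = Counter(t["track_uri"] for p in playlists for t in p["tracks"])
    artists = {t["artist_uri"] for p in playlists for t in p["tracks"]}
    return {
        "playlists": len(playlists),
        "unique_tracks": len(track_counts),
        "unique_artists": len(artists),
        "most_common_track": track_counts.most_common(1)[0][0],
    }


print(summarize_slice(SAMPLE_SLICE))
```

Track co-occurrence counts like these are a common starting point for the "which songs go together" questions raised above.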

Dataset Contains

1,000 examples of each of the following 10 scenarios (10,000 playlists in total):

Title only (no tracks)
Title and first track
Title and first 5 tracks
First 5 tracks only
Title and first 10 tracks
First 10 tracks only
Title and first 25 tracks
Title and 25 random tracks
Title and first 100 tracks
Title and 100 random tracks
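Each scenario hides part of a playlist and reveals the rest as a seed. The sketch below shows one way to cut a full playlist into a seed and held-out tracks for each of the ten scenarios. The `Scenario` type and `make_seed` function are hypothetical, and keeping randomly sampled tracks in their original playlist order is a design assumption, not necessarily how the official challenge set was built.

```python
import random
from typing import List, NamedTuple, Optional, Tuple


class Scenario(NamedTuple):
    with_title: bool     # whether the seed keeps the playlist title
    num_tracks: int      # how many tracks the seed reveals
    random_tracks: bool  # True: random positions; False: the first tracks


# The ten scenarios listed above, in the same order.
SCENARIOS = [
    Scenario(True, 0, False),    # title only (no tracks)
    Scenario(True, 1, False),    # title and first track
    Scenario(True, 5, False),    # title and first 5 tracks
    Scenario(False, 5, False),   # first 5 tracks only
    Scenario(True, 10, False),   # title and first 10 tracks
    Scenario(False, 10, False),  # first 10 tracks only
    Scenario(True, 25, False),   # title and first 25 tracks
    Scenario(True, 25, True),    # title and 25 random tracks
    Scenario(True, 100, False),  # title and first 100 tracks
    Scenario(True, 100, True),   # title and 100 random tracks
]


def make_seed(title: str, tracks: List[str], scenario: Scenario,
              rng: random.Random) -> Tuple[Optional[str], List[str], List[str]]:
    """Split a full playlist into (seed title, seed tracks, held-out tracks)."""
    k = min(scenario.num_tracks, len(tracks))
    if scenario.random_tracks:
        # Sample positions, then sort them so the seed keeps playlist order.
        positions = sorted(rng.sample(range(len(tracks)), k))
    else:
        positions = list(range(k))
    chosen = set(positions)
    seed = [tracks[i] for i in positions]
    held_out = [t for i, t in enumerate(tracks) if i not in chosen]
    return (title if scenario.with_title else None), seed, held_out
```

For example, `make_seed("Workout", tracks, SCENARIOS[3], random.Random(0))` on a 30-track playlist returns no title, the first five tracks as the seed, and the remaining 25 tracks as the targets a recommender would try to recover.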

Download Link

Full Details: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge
Download Link: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge/dataset_files
UNEP, Total External Debt by Country, World, 2002-2004
geocommons.com
Updated Apr 29, 2008
data (2008). UNEP, Total External Debt by Country, World, 2002-2004 [Dataset]. http://geocommons.com/search.html
Dataset updated
Apr 29, 2008
Dataset provided by
data
UNEP
Description
Total external debt is debt owed to non-residents, repayable in foreign currency, goods, or services. Total external debt is the sum of public, publicly guaranteed, and private non-guaranteed long-term debt, use of IMF credit, and short-term debt. Short-term debt includes all debt having an original maturity of one year or less and interest in arrears on long-term debt. Data are in millions of current U.S. dollars. This dataset uses 0 to mean no value, whereas the original data source uses -9999. Data was found online at http://geodata.grid.unep.ch
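The definition above is an additive identity, and the dataset uses two different no-value codes. The sketch below, with hypothetical function names and made-up component values, shows the sum and one way to read published totals so that neither code is mistaken for a real figure.

```python
from typing import Optional

NO_VALUE_HERE = 0        # "no value" code in this dataset
NO_VALUE_SOURCE = -9999  # "no value" code in the original UNEP source


def read_total(value: float) -> Optional[float]:
    """Return a published total, or None if it carries either no-value code.

    Caveat: because 0 is the no-value code, a genuine total of zero
    cannot be told apart from a missing one.
    """
    if value in (NO_VALUE_HERE, NO_VALUE_SOURCE):
        return None
    return value


def total_external_debt(public_and_guaranteed_lt: float,
                        private_nonguaranteed_lt: float,
                        imf_credit: float,
                        short_term: float) -> float:
    """Sum the components of total external debt (million current US$).

    short_term is assumed to already include interest arrears on
    long-term debt, as in the definition above.
    """
    return public_and_guaranteed_lt + private_nonguaranteed_lt + imf_credit + short_term


print(total_external_debt(1200.0, 300.0, 50.0, 450.0))  # made-up components
```

Mapping both codes to `None` before averaging or plotting keeps countries without data from being drawn as debt-free.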
United States Corporate Profits
tradingeconomics.com
jp.tradingeconomics.com
+13 more
csv, excel, json, xml
Updated Sep 25, 2025
TRADING ECONOMICS (2025). United States Corporate Profits [Dataset]. https://tradingeconomics.com/united-states/corporate-profits
Available download formats: excel, xml, json, csv
Dataset updated
Sep 25, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Mar 31, 1947 - Jun 30, 2025
Area covered
United States
Description
Corporate profits in the United States increased to USD 3,259.41 billion in the second quarter of 2025 from USD 3,252.44 billion in the first quarter of 2025. This dataset provides the latest reported value for United States Corporate Profits, plus previous releases, the historical high and low, a short-term forecast and long-term prediction, an economic calendar, survey consensus, and news.
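As a quick check on the two quarterly figures quoted above, the quarter-over-quarter change works out to about USD 6.97 billion, or roughly 0.21%. The `pct_change` helper is illustrative only.

```python
def pct_change(new: float, old: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


q1_2025 = 3252.44  # USD billion, first quarter of 2025
q2_2025 = 3259.41  # USD billion, second quarter of 2025

change = q2_2025 - q1_2025
print(f"change: {change:.2f} USD billion ({pct_change(q2_2025, q1_2025):.2f}%)")
```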
World cities database
kaggle.com
Updated May 25, 2025
Juanma Hernández (2025). World cities database [Dataset]. http://doi.org/10.34740/kaggle/dsv/11944536
Explore at: Croissant (Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/11944536
Dataset updated
May 25, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Juanma Hernández
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data is from:

https://simplemaps.com/data/world-cities

We're proud to offer a simple, accurate and up-to-date database of the world's cities and towns. We've built it from the ground up using authoritative sources such as the NGIA, US Geological Survey, US Census Bureau, and NASA.

Our database is:

Up-to-date: It was last refreshed on May 11, 2025.

Comprehensive: Over 4 million unique cities and towns from every country in the world (about 48 thousand in the basic database).

Accurate: Cleaned and aggregated from official sources. Includes latitude and longitude coordinates.

Simple: A single CSV file, concise field names, only one entry per city.
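Because each row carries latitude and longitude, a common first use is computing great-circle distances between cities. The sketch below assumes column names `city`, `lat`, `lng`, and `country`, modelled on the simplemaps layout; confirm them against the header of the file you download. It uses the haversine formula with a mean Earth radius of 6,371 km, and the two sample rows use approximate city-centre coordinates.

```python
import csv
import io
import math

# Two rows in the assumed column layout; coordinates are approximate.
SAMPLE_CSV = """city,lat,lng,country
Paris,48.8566,2.3522,France
London,51.5074,-0.1278,United Kingdom
"""

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def load_cities(text: str) -> dict:
    """Map city name -> (lat, lng) in degrees."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["city"]: (float(row["lat"]), float(row["lng"])) for row in reader}


def haversine_km(a: tuple, b: tuple) -> float:
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


cities = load_cities(SAMPLE_CSV)
print(round(haversine_km(cities["Paris"], cities["London"])))
```

Keying on the city name is only safe for this toy sample; in the full file many names repeat across countries, so a unique ID column is a better key.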
COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE
data.ct.gov
catalog.data.gov
application/rdf+xml +5
Updated Jun 24, 2022
Department of Public Health (2022). COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE [Dataset]. https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-and-Deaths-by-Race-Ethnicity-ARCHIV/7rne-efic
Available download formats: xml, tsv, csv, application/rdf+xml, json, application/rss+xml
Dataset updated
Jun 24, 2022
Dataset authored and provided by
Department of Public Health
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information from June 21, 2022, to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6.

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information from June 16, 2022, to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22.

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information from June 16, 2022, to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada. To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.
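The town-level suppression rule can be sketched as a simple threshold. This sketch interprets "fewer than 5 cases or positive NAAT tests" as applying to each 7-day count separately; DPH may instead withhold a town's whole row, so treat that reading, the function names, and the town names as hypothetical.

```python
from typing import Dict, Optional

SUPPRESSION_THRESHOLD = 5  # 7-day counts below this are withheld


def suppress(count_7day: int) -> Optional[int]:
    """Return the 7-day count, or None if it is below the threshold."""
    return None if count_7day < SUPPRESSION_THRESHOLD else count_7day


def publish_town_metrics(cases_7day: Dict[str, int],
                         positive_naat_7day: Dict[str, int]) -> Dict[str, dict]:
    """Apply the rule separately to each town's cases and positive NAAT tests."""
    return {
        town: {
            "cases": suppress(cases_7day[town]),
            "positive_naat": suppress(positive_naat_7day[town]),
        }
        for town in cases_7day
    }


print(publish_town_metrics({"Town A": 3, "Town B": 12},
                           {"Town A": 7, "Town B": 4}))
```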

This dataset contains COVID-19 cases and associated deaths reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the COVID-19 update.

The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age adjustment is important in Connecticut because the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, their age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age, which results in higher age-adjusted rates.

The population data used to calculate rates are based on the CT DPH population statistics for 2019, which are available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used.

Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf

Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts, as data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH = Non-Hispanic.
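The crude and age-adjusted rates described above can be written out directly. With deaths d_i and population p_i in age group i, the crude rate is (sum of d_i) / (sum of p_i) × 100,000, and the directly standardized rate is the sum over i of w_i × (d_i / p_i) × 100,000, where w_i is age group i's share of the standard population. The sketch below uses three made-up age groups and a toy standard population instead of the 19 groups and the 2000 U.S. standard, and applies the more-than-20-deaths rule per race/ethnicity group. The function names and all numbers are hypothetical.

```python
from typing import Optional, Sequence

PER = 100_000    # rates are expressed per 100,000 people
MIN_DEATHS = 20  # age-adjusted rates only for groups with more deaths than this


def crude_rate(deaths: Sequence[float], population: Sequence[float]) -> float:
    """Total deaths per 100,000 total population."""
    return sum(deaths) / sum(population) * PER


def age_adjusted_rate(deaths: Sequence[float], population: Sequence[float],
                      standard_pop: Sequence[float]) -> Optional[float]:
    """Directly standardized rate, or None when the group has too few deaths."""
    assert len(deaths) == len(population) == len(standard_pop)
    if sum(deaths) <= MIN_DEATHS:
        return None
    total_std = sum(standard_pop)
    # Weight each age-specific rate by that age group's share of the standard population.
    return sum((s / total_std) * (d / p) * PER
               for d, p, s in zip(deaths, population, standard_pop))


std = [600, 300, 100]  # toy standard population: young, middle, old
older_group = ([2, 6, 40], [20_000, 30_000, 50_000])
younger_group = ([7, 10, 10], [70_000, 25_000, 5_000])
for deaths, pop in (older_group, younger_group):
    print(round(crude_rate(deaths, pop), 1), round(age_adjusted_rate(deaths, pop, std), 1))
```

The toy numbers reproduce the pattern described above: the older-skewed group's adjusted rate falls below its crude rate, while the younger-skewed group's adjusted rate rises above it.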

Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical examiner) using their best clinical judgment. Additionally, all COVID-19 deaths, including suspected or related deaths, are required to be reported to OCME. On April 4, 2020, CT DPH and OCME released a joint memo to providers and facilities within Connecticut providing guidelines for certifying deaths due to COVID-19 that were consistent with the CDC’s guidelines, along with a reminder of the required reporting to OCME.[25,26] As of July 1, 2021, OCME had reviewed every case reported and performed additional investigation on about one-third of reported deaths to better ascertain whether COVID-19 did or did not cause or contribute to the death. Some of these investigations resulted in the OCME performing postmortem swabs for PCR testing on individuals whose deaths were suspected to be due to COVID-19 but for whom no antemortem diagnosis could be made.[31] The OCME issued or re-issued about 10% of COVID-19 death certificates and, when appropriate, removed COVID-19 from the death certificate. For standardization and tabulation of mortality statistics, written cause-of-death statements made by the certifiers on death certificates are sent to the National Center for Health Statistics (NCHS) at the CDC, which assigns cause-of-death codes according to the International Classification of Diseases, 10th Revision (ICD-10).[25,26] COVID-19 deaths in this report are defined as those for which the death certificate has an ICD-10 code of U07.1 as either a primary (underlying) or a contributing cause of death. More information on COVID-19 mortality can be found at the following link: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Mortality/Mortality-Statistics
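The case definition above (U07.1 as either the underlying or a contributing cause) can be sketched as a filter over coded death records. The record layout here (one underlying code plus a list of contributing codes) and the helper names are hypothetical; real NCHS multiple-cause files are structured differently, and codes are often stored without the dot, which is why the sketch normalizes them. Under this definition, a record coded only with U07.2 (COVID-19, virus not identified) is not counted.

```python
from typing import Iterable, List, Tuple

COVID_CODE = "U071"  # ICD-10 U07.1, compared without the dot


def normalize(code: str) -> str:
    """Uppercase and drop the dot so 'U07.1' and 'u071' compare equal."""
    return code.replace(".", "").strip().upper()


def is_covid_death(underlying: str, contributing: Iterable[str]) -> bool:
    """True if U07.1 is the underlying cause or any contributing cause."""
    return any(normalize(c) == COVID_CODE for c in [underlying, *contributing])


def count_covid_deaths(records: List[Tuple[str, List[str]]]) -> int:
    """Count (underlying, contributing codes) records that meet the definition."""
    return sum(is_covid_death(u, c) for u, c in records)


records = [
    ("U07.1", []),                # COVID-19 as underlying cause
    ("I25.1", ["J18.9", "U07.1"]),  # COVID-19 as a contributing cause
    ("U07.2", ["J18.9"]),         # virus not identified: excluded
    ("I21.9", []),                # no COVID-19 code
]
print(count_covid_deaths(records))
```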

Data are subject to future revision as reporting changes.

Starting in July 2020, this dataset will be updated every weekday.

Additional notes: A delay in the data pull schedule occurred on 06/23/2020. Data from 06/22/2020 was processed on 06/23/2020 at 3:30 PM. The normal data cycle resumed with the data for 06/23/2020.

A network outage on 05/19/2020 resulted in a change in the data pull schedule. Data from 5/19/2020 was processed on 05/20/2020 at 12:00 PM. Data from 5/20/2020 was processed on 5/20/2020 at 8:30 PM. The normal data cycle resumed on 05/20/2020 with the 8:30 PM data pull. As a result of the network outage, the timestamps on the datasets on the Open Data Portal differ from the timestamps in DPH's daily PDF reports.

Starting 5/10/2021, the date field represents the date the data was updated on data.ct.gov. Previously, the field listed the date the data was pulled by DPH, which was typically the day before the data was published on data.ct.gov. This change was made to standardize the COVID-19 datasets on data.ct.gov.
American Community Survey (ACS)
console.cloud.google.com
Updated Apr 19, 2022
https://console.cloud.google.com/marketplace/browse?filter=partner:United%20States%20Census%20Bureau&hl=de (2022). American Community Survey (ACS) [Dataset]. https://console.cloud.google.com/marketplace/product/united-states-census-bureau/acs?hl=de
Dataset updated
Apr 19, 2022
Dataset provided by
Google (http://google.com/)
Description
The American Community Survey (ACS) is an ongoing survey that provides vital information on a yearly basis about our nation and its people by contacting over 3.5 million households across the country. The resulting data provide incredibly detailed demographic information across the US, aggregated at various geographic levels, which helps determine how more than $675 billion in federal and state funding is distributed each year.

Businesses use ACS data to inform strategic decision-making. ACS data can be used as a component of market research, to provide information about concentrations of potential employees with a specific education or occupation, and to identify communities that could be good places to build offices or facilities. For example, someone scouting a new location for an assisted-living center might look for an area with a large proportion of seniors and a large proportion of people employed in nursing occupations. Through the ACS, we know more about jobs and occupations, educational attainment, veterans, whether people own or rent their homes, and other topics. Public officials, planners, and entrepreneurs use this information to assess the past and plan the future. For more information, see the Census Bureau's ACS Information Guide.

This public dataset is hosted in Google BigQuery as part of the Google Cloud Public Datasets Program, with Carto providing cleaning and onboarding support. It is included in BigQuery's free tier of 1 TB of processing per month: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
City-Data, Largest and Smallest Difference Between High and Low...
geocommons.com
Updated May 27, 2008
data (2008). City-Data, Largest and Smallest Difference Between High and Low Temperatures, USA, [Dataset]. http://geocommons.com/search.html
Dataset updated
May 27, 2008
Dataset provided by
City-Data
data
Description
This dataset illustrates the largest and smallest differences between high and low temperatures in cities with 50,000 or more people. A value of -1 means that the data was not applicable. Also included are the rankings, the inverse ranking to be used for mapping purposes, the population, the name of the city and state, and the temperature difference in degrees. Source: City-Data. URLs: http://www.city-data.com/top2/c489.html and http://www.city-data.com/top2/c490.html. Date accessed: November 13, 2007.
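The ranking and inverse-ranking columns can be reproduced from the raw differences. This sketch drops rows marked -1 before ranking and assumes the inverse ranking is n + 1 − rank, so the largest difference gets the highest value for mapping; that formula, the function name, and the sample cities are assumptions rather than City-Data's documented method. Ties keep their input order because Python's sort is stable.

```python
from typing import List, Tuple

NOT_APPLICABLE = -1  # sentinel used by the dataset

Row = Tuple[str, str, float]  # (city, state, high-low difference in degrees)


def rank_by_difference(rows: List[Row]) -> List[Tuple[str, str, float, int, int]]:
    """Rank cities from largest to smallest difference and add an inverse rank."""
    valid = [r for r in rows if r[2] != NOT_APPLICABLE]
    valid = sorted(valid, key=lambda r: r[2], reverse=True)
    n = len(valid)
    # Rank 1 is the largest difference; inverse rank n + 1 - rank flips the scale.
    return [(city, state, diff, rank, n + 1 - rank)
            for rank, (city, state, diff) in enumerate(valid, start=1)]


rows = [("City A", "ST", 40.0), ("City B", "ST", -1),
        ("City C", "ST", 25.5), ("City D", "ST", 31.0)]
for entry in rank_by_difference(rows):
    print(entry)
```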
