78 datasets found
  1. Top Visited Websites

    • kaggle.com
    Updated Nov 19, 2022
    Cite
    The Devastator (2022). Top Visited Websites [Dataset]. https://www.kaggle.com/datasets/thedevastator/the-top-websites-in-the-world/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 19, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Top Websites in the World

    How They Change Over Time

    About this dataset

    This dataset consists of the top 50 most visited websites in the world, as well as the category and principal country/territory for each site. The data provides insights into which sites are most popular globally, and what type of content is most popular in different parts of the world.

    How to use the dataset

    This dataset can be used to track the most popular websites in the world over time. It can also be used to compare website popularity between different countries and categories.

    Research Ideas

    • To track the most popular websites in the world over time
    • To see how website popularity changes by region
    • To find out which website categories are most popular

    Acknowledgements

    Dataset by Alexa Internet, Inc. (2019), released on Kaggle under the Open Data Commons Public Domain Dedication and License (ODC-PDDL)

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: df_1.csv

    | Column name                  | Description                                                           |
    |:-----------------------------|:----------------------------------------------------------------------|
    | Site                         | The name of the website. (String)                                     |
    | Domain Name                  | The domain name of the website. (String)                              |
    | Category                     | The category of the website. (String)                                 |
    | Principal country/territory  | The principal country/territory where the website is based. (String)  |
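    A minimal sketch of loading this file with pandas, assuming the file name and column names documented above:

    ```python
    # Sketch only: load df_1.csv and summarize it, assuming the four
    # columns documented in the table above.
    import pandas as pd

    df = pd.read_csv("df_1.csv")

    # How many of the top sites fall into each content category?
    print(df["Category"].value_counts())

    # Which sites are principally based in a given country/territory?
    print(df.loc[df["Principal country/territory"] == "United States", "Site"])
    ```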

  2. ‘Popular Website Traffic Over Time’ analyzed by Analyst-2

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Popular Website Traffic Over Time ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-popular-website-traffic-over-time-62e4/latest
    Explore at:
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Popular Website Traffic Over Time ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/popular-website-traffice on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Background

    Have you ever been in a conversation where the question comes up: who uses Bing? The question comes up occasionally because people wonder whether these sites get any views. For this research study, we explore traffic for many popular websites.

    Methodology

    The data collected originates from SimilarWeb.com.

    Source

    For the analysis and study, go to The Concept Center

    This dataset was created by Chase Willden and contains date columns such as 12/1/2016, 1/1/2017, and 3/1/2017, along with Social Media and other technical information.

    How to use this dataset

    • Analyze 11/1/2016 in relation to 2/1/2017
    • Study the influence of 4/1/2017 on 1/1/2017
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit Chase Willden


    --- Original source retains full ownership of the source dataset ---

  3. Data from: UNESCO World Heritage Sites Dataset

    • kaggle.com
    Updated Dec 19, 2023
    Cite
    The Devastator (2023). UNESCO World Heritage Sites Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/unesco-world-heritage-sites-dataset/suggestions
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 19, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    Area covered
    World
    Description

    UNESCO World Heritage Sites Dataset

    By Throwback Thursday [source]

    About this dataset

    How to use the dataset

    Here are some tips on how to make the most out of this dataset:

    • Data Exploration:

      • Begin by understanding the structure and contents of the dataset. Evaluate the number of rows (sites) and columns (attributes) available.
      • Check for missing values or inconsistencies in data entry that may impact your analysis.
      • Assess column descriptions to understand what information is included in each attribute.
    • Geographical Analysis:

      • Leverage geographical features such as latitude and longitude coordinates provided in this dataset.
      • Plot these sites on a map using any mapping software or library like Google Maps or Folium for Python. Visualizing their distribution can provide insights into patterns based on location, climate, or cultural factors.
    • Analyzing Attributes:

      • Familiarize yourself with different attributes available for analysis. Possible attributes include Name, Description, Category, Region, Country, etc.
      • Understand each attribute's format and content type (categorical, numerical) for better utilization during data analysis.
    • Exploring Categories & Regions:

      • Look at unique categories mentioned in the Category column (e.g., Cultural Site, Natural Site) to explore specific interests. This could help identify clusters within particular heritage types across countries/regions worldwide.
      • Analyze regions with high concentrations of heritage sites using data visualizations like bar plots or word clouds based on frequency counts.
    • Identify Trends & Patterns:

      • Discover recurring themes across various sites by analyzing descriptive text attributes such as names and descriptions.
      • Identify patterns and correlations between attributes by performing statistical analysis or utilizing machine learning techniques.
    • Comparison:

      • Compare different attributes to gain a deeper understanding of the sites.
      • For example, analyze the number of heritage sites per country/region or compare the distribution between cultural and natural heritage sites.
    • Additional Data Sources:

      • Use this dataset as a foundation to combine it with other datasets for in-depth analysis. There are several sources available that provide additional data on UNESCO World Heritage Sites, such as travel blogs, official tourism websites, or academic research databases.

    Remember to cite this dataset appropriately if you use it in your own work.

    Research Ideas

    • Travel Planning: This dataset can be used to identify and plan visits to UNESCO World Heritage sites around the world. It provides information about the location, category, and date of inscription for each site, allowing users to prioritize their travel destinations based on personal interests or preferences.
    • Cultural Preservation: Researchers or organizations interested in cultural preservation can use this dataset to analyze trends in UNESCO World Heritage site listings over time. By studying factors such as geographical distribution, types of sites listed, and inscription dates, they can gain insights into patterns of cultural heritage recognition and protection.
    • Statistical Analysis: The dataset can be used for statistical analysis to explore various aspects related to UNESCO World Heritage sites. For example, it could be used to examine the correlation between a country's economic indicators (such as GDP per capita) and the number or type of World Heritage sites it possesses. This analysis could provide insights into the relationship between economic development and cultural preservation efforts at a global scale.

    Acknowledgements

    If you use this dataset in your research, please credit the original author, Throwback Thursday.

    License

    See the dataset description for more information.


  4. ‘Population by Country - 2020’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Population by Country - 2020’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-population-by-country-2020-c8b7/latest
    Explore at:
    Dataset updated
    Feb 13, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Population by Country - 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/tanuprabhu/population-by-country-2020 on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    I always wanted access to a data set on the world's population, country by country, but I could not find a properly documented one. So I just created one manually.

    Content

    I knew I wanted to create a dataset, but I did not know how, so I started searching the internet for the content (population of countries). Wikipedia was obviously my first search, but the results were not acceptable, and it listed only around 190 countries. I surfed the internet for quite some time until I stumbled upon a great website you have probably heard of: Worldometer. This was exactly the website I was looking for. It had more detail than Wikipedia, and more rows, that is, more countries with their populations.

    Once I found the data, the next hard task was extracting it. Of course, I could not get it in raw form, and I did not email them about it. Instead, I learned a new skill that is very important for a data scientist. Any guesses? Keep reading; you will find out in the next paragraph.


    You are right: it's web scraping. I learned it so that I could convert the data into CSV format. I wrote a scraper and also found a way to convert the pandas data frame directly into a CSV (comma-separated values) file and store it on my computer.

    Below is the code that I used to scrape the data from the website:

    [Screenshot of the original scraper code; the image is no longer available.]
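    A rough sketch of the approach described above (scraping a population table and writing it to CSV with pandas); the URL and table position below are assumptions:

    ```python
    # Sketch of the described approach, not the author's original code.
    # Assumes the Worldometer population-by-country page; the URL and
    # table layout may have changed since the dataset was created.
    import pandas as pd

    URL = "https://www.worldometers.info/world-population/population-by-country/"

    # read_html parses every HTML <table> on the page into a DataFrame
    # (requires lxml or html5lib to be installed).
    tables = pd.read_html(URL)
    population = tables[0]  # assumption: the first table is the country list

    # Write the DataFrame straight to a CSV file, as described above.
    population.to_csv("population_by_country_2020.csv", index=False)
    ```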

    Acknowledgements

    I could not have gotten the data without Worldometer, so special thanks to them.

    Inspiration

    I don't have any particular questions to ask. Find your own ways to use the data, and let me know via a kernel if you find something interesting.

    --- Original source retains full ownership of the source dataset ---

  5. Number of internet users worldwide 2014-2029

    • statista.com
    Updated Apr 11, 2025
    Cite
    Statista Research Department (2025). Number of internet users worldwide 2014-2029 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Area covered
    World
    Description

    The global number of internet users was forecast to continuously increase between 2024 and 2029 by a total of 1.3 billion users (+23.66 percent). After the fifteenth consecutive increasing year, the number of users is estimated to reach 7 billion and thereby hit a new peak in 2029. Notably, the number of internet users has continuously increased over the past years. Depicted is the estimated number of individuals in the country or region at hand that use the internet. As the data source clarifies, connection quality and usage frequency are distinct aspects not taken into account here. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of internet users in regions like the Americas and Asia.

  6. US Restaurant POI dataset with metadata

    • datarade.ai
    .csv
    Updated Jul 30, 2022
    Cite
    Geolytica (2022). US Restaurant POI dataset with metadata [Dataset]. https://datarade.ai/data-products/us-restaurant-poi-dataset-with-metadata-geolytica
    Explore at:
    Available download formats: .csv
    Dataset updated
    Jul 30, 2022
    Dataset authored and provided by
    Geolytica
    Area covered
    United States of America
    Description

    Point of Interest (POI) is defined as an entity (such as a business) at a ground location (point) which may be (of interest). We provide high-quality POI data that is fresh, consistent, customizable, easy to use, and has high-density coverage for all countries of the world.

    This is our process flow:

    • Our machine learning systems continuously crawl for new POI data
    • Our geoparsing and geocoding calculate their geolocations
    • Our categorization systems clean up and standardize the datasets
    • Our data pipeline API publishes the datasets on our data store

    A new POI comes into existence. It could be a bar, a stadium, a museum, a restaurant, a cinema, a store, etc. In today's interconnected world, its information will appear very quickly in social media, pictures, websites, and press releases. Soon after that, our systems will pick it up.

    POI data is in constant flux. Every minute worldwide, over 200 businesses move, over 600 new businesses open their doors, and over 400 businesses cease to exist. Over 94% of all businesses have a public online presence of some kind, and we track changes to it: when a business changes, its website and social media presence change too. We then extract and merge the new information, creating the most accurate and up-to-date business information dataset across the globe.

    We offer our customers perpetual data licenses for any dataset representing this ever changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service - DaaS Industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one time snapshot, or via our data update pipeline.

    Customers requiring regularly updated datasets may subscribe to our annual subscription plans. Our data is continuously being refreshed, so subscription plans are recommended for those who need the most up-to-date data. The main differentiators between us and the competition are our flexible licensing terms and our data freshness.

    Data samples may be downloaded at https://store.poidata.xyz/us

  7. WikiReddit: Tracing Information and Attention Flows Between Online Platforms...

    • zenodo.org
    bin
    Updated May 4, 2025
    Cite
    Patrick Gildersleve; Anna Beers; Viviane Ito; Agustin Orozco; Francesca Tripodi (2025). WikiReddit: Tracing Information and Attention Flows Between Online Platforms [Dataset]. http://doi.org/10.5281/zenodo.14653265
    Explore at:
    Available download formats: bin
    Dataset updated
    May 4, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Patrick Gildersleve; Anna Beers; Viviane Ito; Agustin Orozco; Francesca Tripodi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 15, 2025
    Description

    Preprint

    Gildersleve, P., Beers, A., Ito, V., Orozco, A., & Tripodi, F. (2025). WikiReddit: Tracing Information and Attention Flows Between Online Platforms. arXiv [Cs.CY]. https://doi.org/10.48550/arXiv.2502.04942
    Accepted at the International AAAI Conference on Web and Social Media (ICWSM) 2025

    Abstract

    The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia links shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking which subsequently influences Wikipedia content, by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.

    Datasheet

    Motivation

    The motivations for this dataset stem from the challenges researchers face in studying the flow of information across the web. While the World Wide Web enables global communication and collaboration, data silos, linguistic barriers, and platform-specific restrictions hinder our ability to understand how information circulates, evolves, and impacts public discourse. Wikipedia and Reddit, as major hubs of knowledge sharing and discussion, offer an invaluable lens into these processes. However, without comprehensive data capturing their interactions, researchers are unable to fully examine how platforms co-construct knowledge. This dataset bridges this gap, providing the tools needed to study the interconnectedness of social media and collaborative knowledge systems.

    Composition

    WikiReddit is a comprehensive dataset capturing all Wikipedia mentions (including links) shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW (not safe for work) subreddits. The SQL database comprises 336K total posts, 10.2M comments, 1.95M unique links, and 1.26M unique articles spanning 59 languages on Reddit and 276 Wikipedia language subdomains. Each linked Wikipedia article is enriched with its revision history and page view data within a ±10-day window of its posting, as well as article ID, redirects, and Wikidata identifiers. Supplementary anonymous metadata from Reddit posts and comments further contextualizes the links, offering a robust resource for analysing cross-platform information flows, collective attention dynamics, and the role of Wikipedia in online discourse.

    Collection Process

    Data was collected from the Reddit4Researchers and Wikipedia APIs. No personally identifiable information is published in the dataset. Data from Reddit to Wikipedia is linked via the hyperlink and article titles appearing in Reddit posts.

    Preprocessing/cleaning/labeling

    Extensive processing with tools such as regex was applied to the Reddit post/comment text to extract the Wikipedia URLs. Redirects for Wikipedia URLs and article titles were found through the API and mapped to the collected data. Reddit IDs are hashed with SHA-256 for post/comment/user/subreddit anonymity.
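    A small sketch of these two steps; the datasheet names only regex extraction and SHA-256 hashing, so the exact pattern below is an assumption:

    ```python
    # Sketch of the preprocessing steps described above; the regex is
    # illustrative, not the authors' exact pattern.
    import hashlib
    import re

    WIKI_URL = re.compile(r"https?://[a-z\-]+\.(?:m\.)?wikipedia\.org/wiki/[^\s\)\]>]+")

    def extract_wikipedia_links(text):
        """Return Wikipedia URLs found in a Reddit post or comment body."""
        return [m.group(0) for m in WIKI_URL.finditer(text)]

    def anonymize_id(reddit_id):
        """Hash a Reddit post/comment/user/subreddit ID with SHA-256."""
        return hashlib.sha256(reddit_id.encode("utf-8")).hexdigest()

    comment = "See https://en.wikipedia.org/wiki/World_Wide_Web for background."
    print(extract_wikipedia_links(comment))
    print(anonymize_id("t1_abc123"))
    ```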

    Uses

    We foresee several applications of this dataset and preview four here. First, Reddit linking data can be used to understand how attention is driven from one platform to another. Second, Reddit linking data can shed light on how Wikipedia's archive of knowledge is used in the larger social web. Third, our dataset could provide insights into how external attention is topically distributed across Wikipedia. Our dataset can help extend that analysis into the disparities in what types of external communities Wikipedia is used in, and how it is used. Fourth, relatedly, a topic analysis of our dataset could reveal how Wikipedia usage on Reddit contributes to societal benefits and harms. Our dataset could help examine if homogeneity within the Reddit and Wikipedia audiences shapes topic patterns and assess whether these relationships mitigate or amplify problematic engagement online.

    Distribution

    The dataset is publicly shared with a Creative Commons Attribution 4.0 International license. The article describing this dataset should be cited: https://doi.org/10.48550/arXiv.2502.04942

    Maintenance

    Patrick Gildersleve will maintain this dataset, and add further years of content as and when available.


    SQL Database Schema

    Table: posts

    Column Name | Type | Description
    subreddit_id | TEXT | The unique identifier for the subreddit.
    crosspost_parent_id | TEXT | The ID of the original Reddit post if this post is a crosspost.
    post_id | TEXT | Unique identifier for the Reddit post.
    created_at | TIMESTAMP | The timestamp when the post was created.
    updated_at | TIMESTAMP | The timestamp when the post was last updated.
    language_code | TEXT | The language code of the post.
    score | INTEGER | The score (upvotes minus downvotes) of the post.
    upvote_ratio | REAL | The ratio of upvotes to total votes.
    gildings | INTEGER | Number of awards (gildings) received by the post.
    num_comments | INTEGER | Number of comments on the post.

    Table: comments

    Column Name | Type | Description
    subreddit_id | TEXT | The unique identifier for the subreddit.
    post_id | TEXT | The ID of the Reddit post the comment belongs to.
    parent_id | TEXT | The ID of the parent comment (if a reply).
    comment_id | TEXT | Unique identifier for the comment.
    created_at | TIMESTAMP | The timestamp when the comment was created.
    last_modified_at | TIMESTAMP | The timestamp when the comment was last modified.
    score | INTEGER | The score (upvotes minus downvotes) of the comment.
    upvote_ratio | REAL | The ratio of upvotes to total votes for the comment.
    gilded | INTEGER | Number of awards (gildings) received by the comment.

    Table: postlinks

    Column Name | Type | Description
    post_id | TEXT | Unique identifier for the Reddit post.
    end_processed_valid | INTEGER | Whether the extracted URL from the post resolves to a valid URL.
    end_processed_url | TEXT | The extracted URL from the Reddit post.
    final_valid | INTEGER | Whether the final URL from the post resolves to a valid URL after redirections.
    final_status | INTEGER | HTTP status code of the final URL.
    final_url | TEXT | The final URL after redirections.
    redirected | INTEGER | Indicator of whether the posted URL was redirected (1) or not (0).
    in_title | INTEGER | Indicator of whether the link appears in the post title (1) or post body (0).

    Table: commentlinks

    Column Name | Type | Description
    comment_id | TEXT | Unique identifier for the Reddit comment.
    end_processed_valid | INTEGER | Whether the extracted URL from the comment resolves to a valid URL.
    end_processed_url | TEXT | The extracted URL from the comment.
    final_valid | INTEGER | Whether the final URL from the comment resolves to a valid URL after redirections.
    final_status | INTEGER | HTTP status code of the final URL.
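    A minimal sketch of querying the schema above, assuming the database ships as a SQLite file (the filename here is hypothetical):

    ```python
    # Sketch: count valid post links by whether they appeared in the
    # post title or body. "wikireddit.db" is a hypothetical filename.
    import sqlite3

    conn = sqlite3.connect("wikireddit.db")

    query = """
    SELECT pl.in_title, COUNT(*) AS n_links
    FROM postlinks AS pl
    JOIN posts AS p ON p.post_id = pl.post_id
    WHERE pl.final_valid = 1
    GROUP BY pl.in_title;
    """
    for in_title, n_links in conn.execute(query):
        where = "title" if in_title == 1 else "body"
        print(f"valid links in post {where}: {n_links}")
    ```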

  8. Most visited websites by hierachycal categories

    • kaggle.com
    Updated Sep 18, 2020
    Cite
    Natanael de Souza Figueiredo (2020). Most visited websites by hierachycal categories [Dataset]. https://www.kaggle.com/natanael127/most-visited-websites-by-hierachycal-categories/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 18, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Natanael de Souza Figueiredo
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Alexa Internet was founded in April 1996 by Brewster Kahle and Bruce Gilliat. The company's name was chosen in homage to the Library of Alexandria of Ptolemaic Egypt, drawing a parallel between the largest repository of knowledge in the ancient world and the potential of the Internet to become a similar store of knowledge. (from Wikipedia)

    The categories list was going offline by September 17, 2020, so I wanted to save it. https://support.alexa.com/hc/en-us/articles/360051913314

    This dataset was generated by this Python script (V2.0): https://github.com/natanael127/dump-alexa-ranking

    Content

    The sites are grouped into 17 macro categories, and the tree ends up having more than 360,000 nodes. Subjects are well organized, and each has its own rank of most-accessed domains, so even the keys of a sub-dictionary can make a good small dataset (see the sketch below).
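    A small sketch of exploring such a tree, assuming the dump is a JSON file of nested dictionaries (the filename is hypothetical):

    ```python
    # Sketch: walk the category tree and count its nodes. Assumes the
    # dump is a JSON file of nested dicts; the filename is hypothetical.
    import json

    def count_nodes(tree):
        """Recursively count the nodes of a nested-dict category tree."""
        if not isinstance(tree, dict):
            return 0
        return len(tree) + sum(count_nodes(child) for child in tree.values())

    with open("alexa_categories.json") as f:
        categories = json.load(f)

    print("macro categories:", list(categories))  # expected: 17 top-level keys
    print("total nodes:", count_nodes(categories))  # expected: > 360,000
    ```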

    Acknowledgements

    Thanks to my friend André (https://github.com/andrerclaudio) for helping me with Google Colaboratory tips and the computational power to get the data before our deadline.

    Inspiration

    The Alexa ranking was inspired by the Library of Alexandria. In the modern world, it may be a good starting point for AI to learn about many, many subjects of the world.

  9. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Nov 21, 2024
    Cite
    Statista (2024). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Nov 21, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 149 zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than 394 zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by increased demand during the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept, though: just two percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.
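    As a back-of-the-envelope check on the compound growth figure above (a sketch; Statista's own model will differ in rounding):

    ```python
    # Project installed storage capacity from the 2020 base using the
    # quoted 19.2 percent CAGR; a rough check, not Statista's model.
    base_2020_zb = 6.7  # zettabytes installed in 2020
    cagr = 0.192        # compound annual growth rate, 2020-2025

    for year in range(2020, 2026):
        capacity = base_2020_zb * (1 + cagr) ** (year - 2020)
        print(f"{year}: {capacity:.1f} ZB")
    # 2025 comes out to roughly 16 ZB
    ```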

  10. Wappalyzer Global Website Technology Stack - Lookup API - Technographic Data...

    • datarade.ai
    .json, .csv
    Updated Jun 16, 2020
    Cite
    Wappalyzer (2020). Wappalyzer Global Website Technology Stack - Lookup API - Technographic Data [Dataset]. https://datarade.ai/data-products/lookup-api
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Jun 16, 2020
    Dataset authored and provided by
    Wappalyzer
    Area covered
    Tunisia, Puerto Rico, Sierra Leone, New Caledonia, Afghanistan, American Samoa, Lithuania, Malaysia, Belarus, Hong Kong
    Description

    Product provided by Wappalyzer. Instant access to website technology stacks.

    Lookup API

    Perform near-instant technology lookups with the Lookup API. Results are fetched from our comprehensive database of millions of websites. If we haven't seen a domain before, we'll index it immediately and report back within minutes.
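    A rough sketch of what calling such a lookup endpoint might look like from Python; the endpoint URL, header name, and response fields below are hypothetical placeholders, not Wappalyzer's documented API:

    ```python
    # Hypothetical lookup call; endpoint, header, and response shape are
    # illustrative placeholders, not Wappalyzer's documented API.
    import requests

    API_KEY = "your-api-key"  # placeholder
    ENDPOINT = "https://api.example-lookup.com/v2/lookup"  # hypothetical

    resp = requests.get(
        ENDPOINT,
        params={"urls": "https://example.com"},
        headers={"x-api-key": API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    for site in resp.json():
        print(site.get("url"), site.get("technologies"))
    ```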

  11. Mobile internet users worldwide 2020-2029

    • statista.com
    Updated Feb 5, 2025
    Cite
    Statista Research Department (2025). Mobile internet users worldwide 2020-2029 [Dataset]. https://www.statista.com/topics/779/mobile-internet/
    Explore at:
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    The global number of smartphone users was forecast to continuously increase between 2024 and 2029 by a total of 1.8 billion users (+42.62 percent). After the ninth consecutive increasing year, the smartphone user base is estimated to reach 6.1 billion users and thereby hit a new peak in 2029. Notably, the number of smartphone users has continuously increased over the past years. Smartphone users here are limited to internet users of any age using a smartphone. The shown figures have been derived from survey data that has been processed to estimate missing demographics. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of smartphone users in regions like Australia & Oceania and Asia.

  12. Global Land One-kilometer Base Elevation (GLOBE) v.1

    • catalog.data.gov
    • datasets.ai
    • +3more
    Updated Oct 18, 2024
    Cite
    NOAA National Centers for Environmental Information (Point of Contact) (2024). Global Land One-kilometer Base Elevation (GLOBE) v.1 [Dataset]. https://catalog.data.gov/dataset/global-land-one-kilometer-base-elevation-globe-v-11
    Explore at:
    Dataset updated
    Oct 18, 2024
    Dataset provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    National Centers for Environmental Information (https://www.ncei.noaa.gov/)
    Description

    GLOBE is a project to develop the best available 30-arc-second (nominally 1 kilometer) global digital elevation data set. This version of GLOBE contains data from 11 sources, and 17 combinations of source and lineage. It continues much in the tradition of the National Geophysical Data Center's TerrainBase (FGDC 1090), as TerrainBase served as a generally lower-resolution prototype of GLOBE data management and compilation techniques. The GLOBE mosaic has been compiled onto CD-ROMs for the international user community. It is also available from the World Wide Web (linked from the online linkage noted above) and via anonymous FTP. Improvements to the global model are anticipated, as appropriate data and/or methods are made available. In addition, individual contributions to GLOBE (several areas have more than one candidate) should become available at the same website. GLOBE may be used for technology development, such as helping plan infrastructure for cellular communications networks, other public works, satellite data processing, and environmental monitoring and analysis. GLOBE prototypes (and probably GLOBE itself after its release) have been used to help develop terrain avoidance systems for aircraft. In all cases, GLOBE data should be treated as any potentially useful but guaranteed imperfect data set. Mission- or life-critical applications should consider the documented artifacts, as well as likely undocumented imperfections, in the data.

  13. Digital Divide Index - Average Download Speed (Ookla)

    • hub.arcgis.com
    • broadband-wacommerce.hub.arcgis.com
    Updated Sep 20, 2023
    + more versions
    Cite
    Timmons@WACOM (2023). Digital Divide Index - Average Download Speed (Ookla) [Dataset]. https://hub.arcgis.com/maps/2f2f84805e2c4a319bd9b990ac5ba167
    Explore at:
    Dataset updated
    Sep 20, 2023
    Dataset authored and provided by
    Timmons@WACOM
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This data is used for a broadband mapping initiative conducted by the Washington State Broadband Office. This dataset provides global fixed broadband and mobile (cellular) network performance metrics in zoom level 16 web mercator tiles (approximately 610.8 meters by 610.8 meters at the equator). Data is projected in EPSG:4326. Download speed, upload speed, and latency are collected via the Speedtest by Ookla applications for Android and iOS and averaged for each tile. Measurements are filtered to results containing GPS-quality location accuracy. The data was processed and published to ArcGIS Living Atlas by Esri.

    About

    Speedtest data is used today by commercial fixed and mobile network operators around the world to inform network buildout, improve global Internet quality, and increase Internet accessibility. Government regulators such as the United States Federal Communications Commission and the Malaysian Communications and Multimedia Commission use Speedtest data to hold telecommunications entities accountable and direct funds for rural and urban connectivity development. Ookla licenses data to NGOs and educational institutions to fulfill its mission: to help make the internet better, faster and more accessible for everyone. Ookla hopes to further this mission by distributing the data to make it easier for individuals and organizations to use it for the purposes of bridging the social and economic gaps between those with and without modern Internet access.

    Data

    Hundreds of millions of Speedtests are taken on the Ookla platform each month. In order to create a manageable dataset, we aggregate raw data into tiles. The size of a data tile is defined as a function of "zoom level" (or "z"). At z=0, the size of a tile is the size of the whole world. At z=1, the tile is split in half vertically and horizontally, creating 4 tiles that cover the globe. This tile-splitting continues as zoom level increases, causing tiles to become exponentially smaller as we zoom into a given region. By this definition, tile sizes are actually some fraction of the width/height of Earth according to Web Mercator projection (EPSG:3857). As such, tile size varies slightly depending on latitude, but tile sizes can be estimated in meters.

    For the purposes of these layers, a zoom level of 16 (z=16) is used for the tiling. This equates to a tile that is approximately 610.8 meters by 610.8 meters at the equator (18 arcsecond blocks). The geometry of each tile is represented in WGS 84 (EPSG:4326) in the tile field.

    The data can be found at: https://github.com/teamookla/ookla-open-data

    Update Cadence

    The tile aggregates start in Q1 2019 and go through the most recent quarter. They will be updated shortly after the conclusion of the quarter.

    Esri Processing

    This layer is a best available aggregation of the original Ookla dataset. This means that for each tile that data is available, the most recent data is used. So for instance, if data is available for a tile for Q2 2019 and for Q4 2020, the Q4 2020 data is awarded to the tile. The default visualization for the layer is the "broadband index". The broadband index is a bivariate index based on both the average download speed and the average upload speed. For Mobile, the score is indexed to a standard of 25 megabits per second (Mbps) download and 3 Mbps upload. A tile with average Speedtest results of 25/3 Mbps is awarded 100 points. Tiles with average speeds above 25/3 are shown in green, tiles with average speeds below this are shown in fuchsia. For Fixed, the score is indexed to a standard of 100 Mbps download and 20 Mbps upload. A tile with average Speedtest results of 100/20 Mbps is awarded 100 points. Tiles with average speeds above 100/20 are shown in green, tiles with average speeds below this are shown in fuchsia.

    Tile Attributes

    Each tile contains the following adjoining attributes:

    • The year and the quarter that the tests were performed.
    • The average download speed of all tests performed in the tile, represented in megabits per second.
    • The average upload speed of all tests performed in the tile, represented in megabits per second.
    • The average latency of all tests performed in the tile, represented in milliseconds.
    • The number of tests taken in the tile.
    • The number of unique devices contributing tests in the tile.
    • The quadkey representing the tile.

    Quadkeys

    Quadkeys can act as a unique identifier for the tile. This can be useful for joining data spatially from multiple periods (quarters), creating coarser spatial aggregations without using geospatial functions, spatial indexing, partitioning, and as an alternative for storing and deriving the tile geometry.

    Layers

    There are two layers:

    • Ookla_Mobile_Tiles - Tiles containing tests taken from mobile devices with GPS-quality location and a cellular connection type (e.g. 4G LTE, 5G NR).
    • Ookla_Fixed_Tiles - Tiles containing tests taken from mobile devices with GPS-quality location and a non-cellular connection type (e.g. WiFi, ethernet).

    The layers are set to draw at scales 1:3,000,000 and larger.

    Time Period and Update Frequency

    Layers are generated based on a quarter year of data (three months), and files will be updated and added on a quarterly basis. A /year=2020/quarter=1/ period, the first quarter of the year 2020, would include all data generated on or after 2020-01-01 and before 2020-04-01.
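    A small sketch of deriving the z=16 tile and its quadkey for a coordinate, following the standard Web Mercator tiling and the usual quadkey construction referenced above (assumed here to match Ookla's tiling):

    ```python
    # Derive zoom-16 tile indices and the quadkey for a lat/lon point,
    # using the standard Web Mercator tiling scheme described above.
    import math

    def latlon_to_tile(lat, lon, z=16):
        n = 2 ** z
        x = int((lon + 180.0) / 360.0 * n)
        y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
        return x, y

    def tile_to_quadkey(x, y, z=16):
        # Interleave the bits of x and y, most significant bit first.
        digits = []
        for i in range(z, 0, -1):
            digit = 0
            mask = 1 << (i - 1)
            if x & mask:
                digit += 1
            if y & mask:
                digit += 2
            digits.append(str(digit))
        return "".join(digits)

    x, y = latlon_to_tile(47.6062, -122.3321)  # Seattle
    print(x, y, tile_to_quadkey(x, y))
    ```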

  14. Traveling the Silk Road: Non-anonymized datasets

    • impactcybertrust.org
    Updated Mar 4, 2012
    + more versions
    Cite
    Carnegie Mellon University (2012). Traveling the Silk Road: Non-anonymized datasets [Dataset]. http://doi.org/10.23721/116/1406256
    Explore at:
    Dataset updated
    Mar 4, 2012
    Authors
    Carnegie Mellon University
    Time period covered
    Mar 4, 2012 - Jul 23, 2012
    Description

    Non-anonymized subset of the databases used in the paper "Traveling the Silk Road: A measurement analysis of a large anonymous online marketplace" (Christin, 2013). In this dataset, textual information (item name, description, or feedback text) and handles have not been anonymized and are thus available. We don't expect any private identifiers or other PII to be present in the data, which was collected from a publicly available website -- the Silk Road anonymous marketplace -- for a few months in 2012.

    For less restricted usage terms, please consider the anonymized version, which is also available without any restrictions. This non-anonymized dataset should only be requested if your project MUST rely on full textual descriptions of items and/or feedback.

    Christin (2013) Traveling the Silk Road: A measurement analysis of a large anonymous online marketplace. To appear in Proceedings of the 22nd International World Wide Web Conference (WWW'13). Rio de Janeiro, Brazil. May 2013.

  15. India: Soils Harmonized World Soil Database - General

    • hub.arcgis.com
    • arc-gis-hub-home-arcgishub.hub.arcgis.com
    • +1more
    Updated Feb 1, 2022
    + more versions
    Cite
    GIS Online (2022). India: Soils Harmonized World Soil Database - General [Dataset]. https://hub.arcgis.com/maps/9f9535990648488a92cdd4d3b76dd43e
    Explore at:
    Dataset updated
    Feb 1, 2022
    Dataset authored and provided by
    GIS Online
    Description

    Soil is a key natural resource that provides the foundation of basic ecosystem services. Soil determines the types of farms and forests that can grow on a landscape. Soil filters water. Soil helps regulate the Earth's climate by storing large amounts of carbon. Activities that degrade soils reduce the value of the ecosystem services that soil provides. For example, since 1850, 35% of human-caused greenhouse gas emissions are linked to land use change. The Soil Science Society of America is a good source of additional information.

    Dataset Summary

    This layer provides access to a 30 arc-second (roughly 1 km) cell-sized raster with attributes describing the basic properties of soil derived from the Harmonized World Soil Database v 1.2. The values in this layer are for the dominant soil in each mapping unit (sequence field = 1).

    Attributes in this layer include:

    • Soil Phase 1 and Soil Phase 2 - Phases identify characteristics of soils important for land use or management. Soils may have up to 2 phases, with phase 1 being more important than phase 2.
    • Other Properties - provides additional information important for agriculture.

    Additionally, 3 class description fields were added by Esri based on the document Harmonized World Soil Database Version 1.2 for use in web map pop-ups:

    • Soil Phase 1 Description
    • Soil Phase 2 Description
    • Other Properties Description

    The layer is symbolized with the Soil Unit Name field. The document Harmonized World Soil Database Version 1.2 provides more detail on the soil properties attributes contained in this layer.

    Other attributes contained in this layer include:

    • Soil Mapping Unit Name - the name of the spatially dominant major soil group
    • Soil Mapping Unit Symbol - a two-letter code for labeling the spatially dominant major soil group in thematic maps
    • Data Source - the HWSD is an aggregation of datasets. The data sources are the European Soil Database (ESDB), the 1:1 million soil map of China (CHINA), the Soil and Terrain Database Program (SOTWIS), and the Digital Soil Map of the World (DSMW).
    • Percentage of Mapping Unit covered by dominant component

    More information on the Harmonized World Soil Database is available here. Other layers created from the Harmonized World Soil Database are available on ArcGIS Online:

    • World Soils Harmonized World Soil Database - Bulk Density
    • World Soils Harmonized World Soil Database - Chemistry
    • World Soils Harmonized World Soil Database - Exchange Capacity
    • World Soils Harmonized World Soil Database - Hydric
    • World Soils Harmonized World Soil Database - Texture

    The authors of this data set request that projects using these data include the following citation: FAO/IIASA/ISRIC/ISSCAS/JRC, 2012. Harmonized World Soil Database (version 1.2). FAO, Rome, Italy and IIASA, Laxenburg, Austria.

    What can you do with this layer?

    This layer is suitable for both visualization and analysis. It can be used in ArcGIS Online in web maps and applications and can be used in ArcGIS Desktop. This layer has query, identify, and export image services available. This layer is restricted to a maximum area of 16,000 x 16,000 pixels - an area 4,000 kilometers on a side, approximately the size of Europe. The source data for this layer are available here.

    This layer is part of a larger collection of landscape layers that you can use to perform a wide variety of mapping and analysis tasks. The Living Atlas of the World provides an easy way to explore the landscape layers and many other beautiful and authoritative maps on hundreds of topics. Geonet is a good resource for learning more about landscape layers and the Living Atlas of the World. To get started, follow these links:

    • Living Atlas Discussion Group
    • Soil Data Discussion Group

    The Esri Insider Blog provides an introduction to the Ecophysiographic Mapping project.

  16. 🦠 Open Dengue Dataset

    • kaggle.com
    Updated Jul 16, 2024
    Cite
    mexwell (2024). 🦠 Open Dengue Dataset [Dataset]. https://www.kaggle.com/datasets/mexwell/open-dengue-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    mexwell
    Description

    The OpenDengue project aims to build and maintain a database of dengue case counts for every dengue-affected country worldwide since 1990. We collate data from a range of publicly available sources including ministry of health websites, peer-reviewed publications and other disease databases. Please visit our website to learn more about our project and methods.

    Original data can be found here and here

    Acknowledgements

    Photo by the National Institute of Allergy and Infectious Diseases on Unsplash

  17. Film Circulation dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, png
    Updated Jul 12, 2024
    Cite
    Skadi Loist; Evgenia (Zhenya) Samoilova (2024). Film Circulation dataset [Dataset]. http://doi.org/10.5281/zenodo.7887672
    Explore at:
    Available download formats: csv, png, bin
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Skadi Loist; Evgenia (Zhenya) Samoilova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”

    A peer-reviewed data paper for this dataset is under review at NECSUS_European Journal of Media Studies - an open access journal aiming at enhancing data transparency and reusability - and will be available from https://necsus-ejms.org/ and https://mediarep.org.

    Please cite this when using the dataset.


    Detailed description of the dataset:

    1 Film Dataset: Festival Programs

    The Film Dataset consists of a data scheme image file, a codebook, and two dataset tables in csv format.

    The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.

    The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.

    The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
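    A small sketch of how the long and wide tables relate, assuming pandas; the column names "film_id" and "fest" are illustrative stand-ins for the names documented in the codebook:

    ```python
    # Sketch: derive a wide-like table (one row per unique film) from
    # the long table. Column names "film_id" and "fest" are illustrative;
    # the codebook documents the real ones.
    import pandas as pd

    long_df = pd.read_csv("1_film-dataset_festival-program_long.csv")

    # Keep each film's first sampled festival appearance, mirroring how
    # the wide file's festival variable is described above (assumes the
    # rows are ordered by sampling).
    wide_like = long_df.drop_duplicates(subset="film_id", keep="first")
    print(len(wide_like))  # should be close to the documented n=9,348
    ```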


    2 Survey Dataset

    The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

    The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.

    The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.

    The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.


    3 IMDb & Scripts

    The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.

    The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.

    The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.

    The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.

    The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.

    The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to each crew member. The file also contains binary gender assigned to directors based on their first names using the GenderizeR application.

    The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.

    The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.

    The csv file “3_imdb-dataset_release-info_long” contains data about non-festival release (e.g., theatrical, digital, tv, dvd/blueray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.

    The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.

    The dataset includes 8 text files containing the script for webscraping. They were written using the R-3.6.3 version for Windows.

    The R script “r_1_unite_data” demonstrates the structure of the dataset, that we use in the following steps to identify, scrape, and match the film data.

    The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in “r_1_unite_data” and uses various R packages to create a search URL for each film from the core dataset on the IMDb website. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then, if no matches are found, using an alternative title and a basic search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records on the IMDb website. The script then defines a loop that matches (including matching scores) each film in the core dataset with suggested films on the IMDb search page. Matching was done using data on directors, production year (+/- one year), and title, with a fuzzy matching approach using two methods, “cosine” and “osa”: cosine similarity is used to match titles with a high degree of similarity, and the OSA algorithm is used to match titles that may have typos or minor variations.
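    The scripts themselves are in R; as a language-neutral sketch of the two string measures named above (cosine similarity over character bigrams, and OSA, i.e. restricted Damerau-Levenshtein distance), a Python equivalent might look like:

    ```python
    # Sketch of the two fuzzy-matching measures ("cosine" and "osa")
    # named above, implemented from their standard definitions.
    import math
    from collections import Counter

    def cosine_similarity(a, b, q=2):
        """Cosine similarity between character q-gram count vectors."""
        ga = Counter(a[i:i + q] for i in range(len(a) - q + 1))
        gb = Counter(b[i:i + q] for i in range(len(b) - q + 1))
        dot = sum(ga[g] * gb[g] for g in ga)
        na = math.sqrt(sum(v * v for v in ga.values()))
        nb = math.sqrt(sum(v * v for v in gb.values()))
        return dot / (na * nb) if na and nb else 0.0

    def osa_distance(a, b):
        """Optimal string alignment (restricted Damerau-Levenshtein)."""
        d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) + 1):
            d[i][0] = i
        for j in range(len(b) + 1):
            d[0][j] = j
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
                if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                    d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
        return d[len(a)][len(b)]

    print(cosine_similarity("The Hourglass", "The Hourglass!"))  # near 1.0
    print(osa_distance("Bernlinale", "Berlinale"))               # small distance
    ```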

    The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible doubles in the dataset and flags them for a manual check.

    The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.

    The script “r_5a_extracting_info_sample” uses the functions defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. It does so for the first 100 films only, to check that everything works. Scraping the entire dataset took a few hours, so a test run on a subsample of 100 films is advisable.

    The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.

    The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract the data one more time, to rule out errors caused by interruptions in the internet connection or other transient technical issues.
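
    The retry logic implied here can be as simple as a tryCatch wrapper. The sketch below is illustrative; scrape_film stands in for a function from “r_4_scraping_functions”:

    ```r
    # Illustrative retry wrapper for films whose first scraping pass failed;
    # scrape_film is a placeholder for a function defined elsewhere.
    scrape_with_retry <- function(urls, scrape_film, tries = 2, pause = 5) {
      lapply(urls, function(u) {
        for (i in seq_len(tries)) {
          result <- tryCatch(scrape_film(u), error = function(e) e)
          if (!inherits(result, "error")) return(result)
          Sys.sleep(pause)  # wait out transient network hiccups before retrying
        }
        result  # still an error after all tries; keep it for the logs
      })
    }
    ```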

    The script “r_check_logs” is used for troubleshooting and for tracking the progress of all of the R scripts above. It reports the number of missing values and errors.
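
    Such a log check can be done by scanning the log files for error and missing-value markers; a minimal sketch, assuming a hypothetical "logs" directory of plain-text logs:

    ```r
    # Illustrative log check: count error and missing-value lines per log file.
    log_files <- list.files("logs", pattern = "\\.txt$", full.names = TRUE)
    sapply(log_files, function(f) {
      lines <- readLines(f, warn = FALSE)
      c(errors  = sum(grepl("error", lines, ignore.case = TRUE)),
        missing = sum(grepl("missing|\\bNA\\b", lines)))
    })
    ```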


    4 Festival Library Dataset

    The Festival Library Dataset consists of one codebook and one dataset, both in csv format, plus a data-scheme image file.

    The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset, listing the definitions of variables such as location, festival name, and festival categories.

  18. Average daily time spent on social media worldwide 2012-2025

    • statista.com
    Updated Jun 19, 2025
    Cite
    Statista (2025). Average daily time spent on social media worldwide 2012-2025 [Dataset]. https://www.statista.com/statistics/433871/daily-social-media-usage-worldwide/
    Explore at:
    Dataset updated
    Jun 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    How much time do people spend on social media? As of 2025, the average daily social media usage of internet users worldwide amounted to 141 minutes per day, down from 143 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, where online users spend an average of 3 hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just 2 hours and 16 minutes.

    Global social media usage

    Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with friends and current events.

    Global impact of social media

    Social media has a wide-reaching and significant impact not only on online activities but also on offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics, and heightened everyday distractions.

  19. Deep Water Fisheries Catch - Sea Around Us

    • niue-data.sprep.org
    • nauru-data.sprep.org
    • +13 more
    zip
    Updated Feb 20, 2025
    Cite
    Secretariat of the Pacific Regional Environment Programme (2025). Deep Water Fisheries Catch - Sea Around Us [Dataset]. https://niue-data.sprep.org/dataset/deep-water-fisheries-catch-sea-around-us
    Explore at:
    zip (22 archives, ranging from approx. 1.9 MB to 7.6 MB)
    Available download formats
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Pacific Regional Environment Programme (https://www.sprep.org/)
    License

    Public Domain Mark 1.0 (https://creativecommons.org/publicdomain/mark/1.0/)
    License information was derived automatically

    Area covered
    Pacific Region (bounding box: longitude approx. 117.15 to 289.41, latitude approx. -53.85 to 50.63)
    Description

    The Sea Around Us is a research initiative at The University of British Columbia (located at the Institute for the Oceans and Fisheries, formerly Fisheries Centre) that assesses the impact of fisheries on the marine ecosystems of the world, and offers mitigating solutions to a range of stakeholders.

    The Sea Around Us was initiated in collaboration with The Pew Charitable Trusts in 1999, and in 2014, the Sea Around Us also began a collaboration with The Paul G. Allen Family Foundation to provide African and Asian countries with more accurate and comprehensive fisheries data.

    The Sea Around Us provides data and analyses through View Data, articles in peer-reviewed journals, and other media (News). The Sea Around Us regularly updates products at the scale of countries’ Exclusive Economic Zones, Large Marine Ecosystems, the High Seas and other spatial scales, and as global maps and summaries.

    The Sea Around Us emphasizes catch time series starting in 1950, and related series (e.g., landed value and catch by flag state, fishing sector and catch type), and fisheries-related information on every maritime country (e.g., government subsidies, marine biodiversity). Information is also offered on sub-projects, e.g., the historic expansion of fisheries, the performance of Regional Fisheries Management Organizations, or the likely impact of climate change on fisheries.

    The information and data presented on their website are freely available to any user, provided that the source is acknowledged. The Sea Around Us is aware that this information may be incomplete. Please let them know about this via the feedback options available on their website.

    If you cite or display any content from the Site, or reference the Sea Around Us, the Sea Around Us – Indian Ocean, the University of British Columbia or the University of Western Australia, in any format, written or otherwise, including print or web publications, presentations, grant applications, websites, other online applications such as blogs, or other works, you must provide appropriate acknowledgement using a citation consistent with the following standard:

    When referring to various datasets downloaded from the website, and/or its concept or design, or to several datasets extracted from its underlying databases, cite its architects. Example: Pauly D., Zeller D., Palomares M.L.D. (Editors), 2020. Sea Around Us Concepts, Design and Data (seaaroundus.org).

    When referring to a set of values extracted for a given country, EEZ or territory, cite the most recent catch reconstruction report or paper (available on the website) for that country, EEZ or territory. Example: For the Mexican Pacific EEZ, the citation should be “Cisneros-Montemayor AM, Cisneros-Mata MA, Harper S and Pauly D (2015) Unreported marine fisheries catch in Mexico, 1950-2010. Fisheries Centre Working Paper #2015-22, University of British Columbia, Vancouver. 9 p.”, which is accessible on the EEZ page for Mexico (Pacific) on seaaroundus.org.

    To help us track the use of Sea Around Us data, we would appreciate you also citing Pauly, Zeller, and Palomares (2020) as the source of the information in an appropriate part of your text.

    When using data from our website that are not part of a typical catch reconstruction (e.g., catches by LME or other spatial entity, subsidies given to fisheries, the estuaries in a given country, or the surface area of a given EEZ), cite both the website and the study that generated the underlying database. Many of these can be derived from the “methods” texts associated with data pages on seaaroundus.org. Example: Sumaila et al. (2010) for subsidies, Alder (2003) for estuaries and Claus et al. (2014) for EEZ delineations, respectively.

    The Sea Around Us data are (where not otherwise regulated) under a Creative Commons Attribution Non-Commercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/). Notices regarding copyrights (© The University of British Columbia), license and disclaimer can be found under http://www.seaaroundus.org/terms-and-conditions/.

    References:

    Alder J (2003) Putting the coast in the Sea Around Us Project. The Sea Around Us Newsletter (15): 1-2.

    Cisneros-Montemayor AM, Cisneros-Mata MA, Harper S and Pauly D (2015) Unreported marine fisheries catch in Mexico, 1950-2010. Fisheries Centre Working Paper #2015-22, University of British Columbia, Vancouver. 9 p.

    Pauly D, Zeller D, and Palomares M.L.D. (Editors) (2020) Sea Around Us Concepts, Design and Data (www.seaaroundus.org)

    Claus S, De Hauwere N, Vanhoorne B, Deckers P, Souza Dias F, Hernandez F and Mees J (2014) Marine Regions: Towards a global standard for georeferenced marine names and boundaries. Marine Geodesy 37(2): 99-125.

    Sumaila UR, Khan A, Dyck A, Watson R, Munro G, Tyedmers P and Pauly D (2010) A bottom-up re-estimation of global fisheries subsidies. Journal of Bioeconomics 12: 201-225.

  20. E-commerce Business Transaction

    • kaggle.com
    Updated May 14, 2022
    Cite
    Gabriel Ramos (2022). E-commerce Business Transaction [Dataset]. https://www.kaggle.com/datasets/gabrielramos87/an-online-shop-business
    Explore at:
    Croissant — a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 14, 2022
    Dataset provided by
    Kaggle
    Authors
    Gabriel Ramos
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    E-commerce has become a new channel to support business development. Through e-commerce, businesses can gain access to and establish a wider market presence by providing cheaper and more efficient distribution channels for their products or services. E-commerce has also changed the way people shop and consume products and services. Many people are turning to their computers or smart devices to order goods, which can easily be delivered to their homes.

    Content

    This is a one-year sales transaction dataset from a UK-based e-commerce (online retail) business. This London-based shop has been selling gifts and homewares for adults and children through its website since 2007. Its customers come from all over the world and usually make direct purchases for themselves. There are also small businesses that buy in bulk and sell on to other customers through retail outlet channels.

    The data set contains 500K rows and 8 columns, described below.

    1. TransactionNo (categorical): a six-digit unique number that identifies each transaction. The letter “C” in the code indicates a cancellation.
    2. Date (numeric): the date when each transaction was generated.
    3. ProductNo (categorical): a five- or six-digit unique character used to identify a specific product.
    4. Product (categorical): product/item name.
    5. Price (numeric): the price of each product per unit in pound sterling (£).
    6. Quantity (numeric): the quantity of each product per transaction. Negative values indicate cancelled transactions.
    7. CustomerNo (categorical): a five-digit unique number that identifies each customer.
    8. Country (categorical): name of the country where the customer resides.

    There is a small percentage of order cancellations in the data set. Most of these cancellations were due to out-of-stock conditions for some products. In this situation, customers tend to cancel an order because they want all products delivered at once.

    Inspiration

    Information is a main asset of businesses nowadays. The success of a business in a competitive environment depends on its ability to acquire, store, and utilize information. Data is one of the main sources of information, which makes data analysis an important activity for generating new and useful insights. Analyze this dataset and try to answer the following questions (a minimal starting sketch for the first question follows the list).

    1. How was the sales trend over the months?
    2. What are the most frequently purchased products?
    3. How many products does the customer purchase in each transaction?
    4. Which customer segments are the most profitable?
    5. Based on your findings, what strategy could you recommend to the business to gain more profit?
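
    As a starting point for question 1, here is a minimal sketch in base R; the file name “transactions.csv” and the date format are assumptions to adapt to the actual download:

    ```r
    # Minimal sketch: monthly revenue trend. The file name and the date
    # format below are assumptions; adjust them to the downloaded file.
    sales <- read.csv("transactions.csv", stringsAsFactors = FALSE)

    # Drop cancellations: TransactionNo starting with "C" / negative Quantity
    sales <- subset(sales, !grepl("^C", TransactionNo) & Quantity > 0)

    sales$Revenue <- sales$Price * sales$Quantity
    sales$Month   <- format(as.Date(sales$Date, format = "%m/%d/%Y"), "%Y-%m")

    # Total revenue per month, in pound sterling
    monthly <- aggregate(Revenue ~ Month, data = sales, FUN = sum)
    monthly[order(monthly$Month), ]
    ```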

    Photo by CardMapr on Unsplash
