65 datasets found

Top Visited Websites
kaggle.com
Updated Nov 19, 2022
Cite
The Devastator (2022). Top Visited Websites [Dataset]. https://www.kaggle.com/datasets/thedevastator/the-top-websites-in-the-world/discussion
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 19, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The Top Websites in the World

How They Change Over Time

About this dataset

This dataset consists of the top 50 most visited websites in the world, along with the category and principal country/territory of each site. The data provides insights into which sites are most popular globally, and which types of content are most popular in different parts of the world.

How to use the dataset

This dataset can be used to track the most popular websites in the world over time. It can also be used to compare website popularity across countries and categories.

Research Ideas

To track the most popular websites in the world over time

To see how website popularity changes by region

To find out which website categories are most popular

Acknowledgements

Dataset by Alexa Internet, Inc. (2019), released on Kaggle under the Open Data Commons Public Domain Dedication and License (ODC-PDDL).

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: df_1.csv

| Column name | Description |
|:--------------------------------|:---------------------------------------------------------------------|
| Site | The name of the website. (String) |
| Domain Name | The domain name of the website. (String) |
| Category | The category of the website. (String) |
| Principal country/territory | The principal country/territory where the website is based. (String) |
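To compare popularity across categories and countries, as suggested under "How to use the dataset", a minimal Python sketch could tally the sites per value of a column. It assumes df_1.csv has a header row with exactly the column names listed above; the helper names `tally` and `load` are hypothetical. Because this is a ranked list of 50 sites, the counts reflect how many top sites fall in each group, not their traffic.

```python
import csv
from collections import Counter


def tally(rows, column):
    """Count how many sites fall under each value of `column`."""
    return Counter(row[column].strip() for row in rows if row.get(column))


def load(path="df_1.csv"):
    """Read the CSV into a list of dicts keyed by the header names (assumed UTF-8)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Usage:
# rows = load()
# for column in ("Category", "Principal country/territory"):
#     print(column, tally(rows, column).most_common(5))
```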
Number of internet users worldwide 2014-2029
statista.com
Updated Apr 11, 2025
Cite
Statista Research Department (2025). Number of internet users worldwide 2014-2029 [Dataset]. https://www.statista.com/topics/1145/internet-usage-worldwide/
Dataset updated
Apr 11, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Area covered
World
Description
The global number of internet users was forecast to increase continuously between 2024 and 2029, by a total of 1.3 billion users (+23.66 percent). After the fifteenth consecutive year of growth, the number of users is estimated to reach 7 billion in 2029, a new peak. Notably, the number of internet users has increased continuously over the past years.

Depicted is the estimated number of individuals in the country or region at hand who use the internet. As the data source clarifies, connection quality and usage frequency are distinct aspects that are not taken into account here.

The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of internet users in regions like the Americas and Asia.
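The forecast's absolute and relative figures can be cross-checked. An increase of 1.3 billion that equals +23.66 percent implies a 2024 baseline of about 1.3 / 0.2366 ≈ 5.49 billion users, and therefore about 6.79 billion in 2029, which the description presumably rounds to 7 billion. A small Python check (the function name is illustrative):

```python
def implied_levels(increase, pct_change):
    """Return the (start, end) levels implied by an absolute increase and its percent change."""
    start = increase / (pct_change / 100)
    return start, start + increase


start, end = implied_levels(1.3, 23.66)  # billions of users, 2024 -> 2029
print(f"2024 ≈ {start:.2f}bn, 2029 ≈ {end:.2f}bn")  # prints: 2024 ≈ 5.49bn, 2029 ≈ 6.79bn
```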
Web domains - global data (no whois data)
datarade.ai
Updated Jan 12, 2024
Cite
Datandard (2024). Web domains - global data (no whois data) [Dataset]. https://datarade.ai/data-products/registered-domains-global-data-no-whois-data-datandard
Available download formats: .csv, .json, .xls, .sql, .txt
Dataset updated
Jan 12, 2024
Dataset authored and provided by
Datandard
Area covered
Mayotte, Tanzania, Saint Vincent and the Grenadines, Denmark, Turks and Caicos Islands, Saint Barthélemy, Algeria, Madagascar, Pakistan, Costa Rica
Description
A list of domains, updated weekly. Each domain is parsed into the following fields:

The full domain

The second-level domain

The top-level domain

The subdomain(s), if present

This is not merely a list of registered domains, but rather of domains that have, at some point, returned a valid web response. The dataset can be used as a building block for other web-based datasets.
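A naive Python sketch of the four-field split described above. It treats the last label as the top-level domain, so a multi-label public suffix such as `co.uk` would be split wrongly; robust parsing would consult the Public Suffix List. The field names are illustrative, not Datandard's actual column names.

```python
def split_domain(host):
    """Naively split a hostname into full, second-level, top-level and subdomain parts.

    Assumes the top-level domain is a single label (no Public Suffix List lookup).
    """
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"not a registrable hostname: {host!r}")
    return {
        "domain": ".".join(labels),
        "second_level": labels[-2],
        "top_level": labels[-1],
        "subdomain": ".".join(labels[:-2]) or None,  # None when no subdomain is present
    }


print(split_domain("blog.example.com"))
# {'domain': 'blog.example.com', 'second_level': 'example', 'top_level': 'com', 'subdomain': 'blog'}
```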
Mobile internet users worldwide 2020-2029
statista.com
Updated Feb 5, 2025
Cite
Statista Research Department (2025). Mobile internet users worldwide 2020-2029 [Dataset]. https://www.statista.com/topics/779/mobile-internet/
Dataset updated
Feb 5, 2025
Dataset provided by
Statista (http://statista.com/)
Authors
Statista Research Department
Description
The global number of smartphone users was forecast to increase continuously between 2024 and 2029, by a total of 1.8 billion users (+42.62 percent). After the ninth consecutive year of growth, the smartphone user base is estimated to reach 6.1 billion users in 2029, a new peak. Notably, the number of smartphone users has increased continuously over the past years.

Smartphone users here are limited to internet users of any age using a smartphone. The figures shown have been derived from survey data that has been processed to estimate missing demographics.

The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of smartphone users in regions like Australia & Oceania and Asia.
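Spread over the five years from 2024 to 2029, a total increase of 42.62 percent corresponds to a compound annual growth rate of roughly 7.36 percent, since 1.4262^(1/5) ≈ 1.0736. A Python sketch of that conversion (the function name is illustrative):

```python
def cagr(total_pct_change, years):
    """Compound annual growth rate, in percent, for a total percent change over `years` years."""
    return ((1 + total_pct_change / 100) ** (1 / years) - 1) * 100


print(f"{cagr(42.62, 5):.2f}% per year")  # prints: 7.36% per year
```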
Ecommerce Data | Store Location Data | Global Coverage | 61M+ Contacts |...
datarade.ai
Updated Sep 7, 2024
Cite
Exellius Systems (2024). Ecommerce Data | Store Location Data | Global Coverage | 61M+ Contacts | (Verified E-mail, Direct Dails)| Decision Makers Contacts| 20+ Attributes [Dataset]. https://datarade.ai/data-products/ecommerce-data-ecommerce-store-data-global-coverage-200-exellius-systems
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Sep 7, 2024
Dataset authored and provided by
Exellius Systems
Area covered
Heard Island and McDonald Islands, Jersey, Lithuania, Iran (Islamic Republic of), Namibia, Gabon, Spain, Saint Vincent and the Grenadines, Congo (Democratic Republic of the), Seychelles
Description
Revolutionize Customer Engagement with Our Comprehensive Ecommerce Data

Our Ecommerce Data is designed to elevate your customer engagement strategies, providing you with unparalleled insights and precision targeting capabilities. With over 61 million global contacts, this dataset goes beyond conventional data, offering a unique blend of shopping cart links, business emails, phone numbers, and LinkedIn profiles. This comprehensive approach ensures that your marketing strategies are not just effective but also highly personalized, enabling you to connect with your audience on a deeper level.

What Makes Our Ecommerce Data Stand Out?

Unique Features for Enhanced Targeting
Our Ecommerce Data is distinguished by its depth and precision. Unlike many other datasets, it includes shopping cart links—a rare and valuable feature that provides you with direct insights into consumer behavior and purchasing intent. This information allows you to tailor your marketing efforts with unprecedented accuracy. Additionally, the integration of business emails, phone numbers, and LinkedIn profiles adds multiple layers to traditional contact data, enriching your understanding of clients and enabling more personalized engagement.

Robust and Reliable Data Sourcing
We pride ourselves on our dual-sourcing strategy that ensures the highest levels of data accuracy and relevance:

Real-Time Information from 10 Active Publication Sites: Our databases are continuously updated with the latest information, sourced from ten active publication sites that provide real-time data.

Dedicated Contact Discovery Team: Complementing our automated sources, our dedicated Contact Discovery Team conducts thorough research and investigations, ensuring that every piece of data is accurate and reliable. This two-pronged approach guarantees that our Ecommerce Data is both up-to-date and relevant, providing you with a solid foundation for your business strategies.

Primary Use Cases Across Industries

Our Ecommerce Data is versatile and can be leveraged across various industries for multiple applications:
- Precision Targeting in Marketing: Create personalized marketing campaigns based on detailed shopping cart activities, ensuring that your outreach resonates with individual customer preferences.
- Sales Enrichment: Sales teams can benefit from enriched client profiles that include comprehensive contact information, enabling them to connect with key decision-makers more effectively.
- Market Research and Analytics: Research and analytics departments can use this data for in-depth market studies and trend analyses, gaining valuable insights into consumer behavior and market dynamics.

Global Coverage for Comprehensive Engagement

Our Ecommerce Data spans the globe, providing you with extensive reach and the ability to engage with customers in diverse regions:
- North America: United States, Canada, Mexico
- Europe: United Kingdom, Germany, France, Italy, Spain, Netherlands, Sweden, and more
- Asia: China, Japan, India, South Korea, Singapore, Malaysia, and more
- South America: Brazil, Argentina, Chile, Colombia, and more
- Africa: South Africa, Nigeria, Kenya, Egypt, and more
- Australia and Oceania: Australia, New Zealand
- Middle East: United Arab Emirates, Saudi Arabia, Israel, Qatar, and more

Comprehensive Employee and Revenue Size Information

Our dataset also includes detailed information on:
- Employee Size: Whether you’re targeting small businesses or large corporations, our data covers all employee sizes, from startups to global enterprises.
- Revenue Size: Gain insights into companies across various revenue brackets, enabling you to segment the market more effectively and target your efforts where they will have the most impact.

Seamless Integration into Broader Data Offerings

Our Ecommerce Data is not just a standalone product; it is a critical piece of our broader data ecosystem. It seamlessly integrates with our comprehensive suite of business and consumer datasets, offering you a holistic approach to data-driven decision-making:
- Tailored Packages: Choose customized data packages that meet your specific business needs, combining Ecommerce Data with other relevant datasets for a complete view of your market.
- Holistic Insights: Whether you are looking for industry-specific details or a broader market overview, our integrated data solutions provide you with the insights necessary to stay ahead of the competition and make informed business decisions.

Elevate Your Business Decisions with Our Ecommerce Data

In essence, our Ecommerce Data is more than just a collection of contacts—it’s a strategic tool designed to give you a competitive edge in understanding and engaging your target audience. By leveraging the power of this comprehensive dataset, you can elevate your business decisions, enhance customer interactions, and navigate the digital landscape with confi...
Data from: UNESCO World Heritage Sites Dataset
kaggle.com
Updated Dec 19, 2023
Cite
The Devastator (2023). UNESCO World Heritage Sites Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/unesco-world-heritage-sites-dataset/suggestions
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 19, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
Area covered
World
Description
UNESCO World Heritage Sites Dataset

By Throwback Thursday [source]

About this dataset

How to use the dataset

Here are some tips on how to make the most out of this dataset:

Data Exploration:

Begin by understanding the structure and contents of the dataset. Evaluate the number of rows (sites) and columns (attributes) available.

Check for missing values or inconsistencies in data entry that may impact your analysis.

Assess column descriptions to understand what information is included in each attribute.
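The missing-value check above can be done with the standard library alone. A minimal sketch that counts empty, whitespace-only, or absent cells per column; the helper name and the file name in the usage comment are placeholders, since the actual file name is not given here.

```python
import csv
from collections import Counter


def missing_by_column(lines):
    """Count empty or whitespace-only cells per column.

    `lines` is any iterable of CSV text lines, such as an open file object.
    Returns (number_of_rows, Counter mapping column name -> missing count).
    """
    missing = Counter()
    n_rows = 0
    for row in csv.DictReader(lines):
        n_rows += 1
        for column, value in row.items():
            if column is None:  # extra cells beyond the header row; ignored here
                continue
            if value is None or not value.strip():  # short row or blank cell
                missing[column] += 1
    return n_rows, missing


# Usage (placeholder file name):
# with open("unesco_sites.csv", newline="", encoding="utf-8") as f:
#     n_rows, missing = missing_by_column(f)
#     print(n_rows, missing.most_common())
```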

Geographical Analysis:

Leverage geographical features such as latitude and longitude coordinates provided in this dataset.

Plot these sites on a map using mapping software or a library, such as Google Maps or Folium for Python. Visualizing their distribution can reveal patterns related to location, climate, or cultural factors.

Analyzing Attributes:

Familiarize yourself with different attributes available for analysis. Possible attributes include Name, Description, Category, Region, Country, etc.

Understand each attribute's format and content type (categorical, numerical) for better utilization during data analysis.

Exploring Categories & Regions:

Look at unique categories mentioned in the Category column (e.g., Cultural Site, Natural Site) to explore specific interests. This could help identify clusters within particular heritage types across countries/regions worldwide.

Analyze regions with high concentrations of heritage sites using data visualizations like bar plots or word clouds based on frequency counts.

Identify Trends & Patterns:

Discover recurring themes across various sites by analyzing descriptive text attributes such as names and descriptions.

Identify patterns and correlations between attributes by performing statistical analysis or utilizing machine learning techniques.

Comparison:

Compare different attributes to gain a deeper understanding of the sites.

For example, analyze the number of heritage sites per country/region or compare the distribution between cultural and natural heritage sites.
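For the per-country and cultural-versus-natural comparison, a Python sketch that cross-tabulates sites by country and category. The column names `Country` and `Category` come from the "possible attributes" listed above and are assumptions about the actual file. A transboundary site that lists several countries in one cell is counted as one combined value unless that cell is split first.

```python
from collections import Counter, defaultdict


def country_category_table(rows, country_col="Country", category_col="Category"):
    """Cross-tabulate sites: country -> Counter of categories (column names are assumptions)."""
    table = defaultdict(Counter)
    for row in rows:
        country, category = row.get(country_col), row.get(category_col)
        if country and category:
            table[country][category] += 1
    return table


def top_countries(table, n=10):
    """Countries ranked by total number of listed sites, largest first."""
    totals = {country: sum(counts.values()) for country, counts in table.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Rows can come straight from `csv.DictReader`; `table[country]` then gives the cultural/natural split for one country.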

Additional Data Sources:

Use this dataset as a foundation to combine it with other datasets for in-depth analysis. There are several sources available that provide additional data on UNESCO World Heritage Sites, such as travel blogs, official tourism websites, or academic research databases.

Remember to cite this dataset appropriately if you use it in

Research Ideas

Travel Planning: This dataset can be used to identify and plan visits to UNESCO World Heritage sites around the world. It provides information about the location, category, and date of inscription for each site, allowing users to prioritize their travel destinations based on personal interests or preferences.

Cultural Preservation: Researchers or organizations interested in cultural preservation can use this dataset to analyze trends in UNESCO World Heritage site listings over time. By studying factors such as geographical distribution, types of sites listed, and inscription dates, they can gain insights into patterns of cultural heritage recognition and protection.

Statistical Analysis: The dataset can be used for statistical analysis to explore various aspects of UNESCO World Heritage sites. For example, it could be used to examine the correlation between a country's economic indicators (such as GDP per capita) and the number or type of World Heritage sites it possesses. This analysis could provide insights into the relationship between economic development and cultural preservation efforts at a global scale.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

See the dataset description for more information.

Columns

Acknowledgements

If you use this dataset in your research, please credit Throwback Thursday.

WikiReddit: Tracing Information and Attention Flows Between Online Platforms...

zenodo.org

bin

Updated May 4, 2025

Cite

Patrick Gildersleve; Anna Beers; Viviane Ito; Agustin Orozco; Francesca Tripodi (2025). WikiReddit: Tracing Information and Attention Flows Between Online Platforms [Dataset]. http://doi.org/10.5281/zenodo.14653265

Available download formats: bin

Unique identifier

https://doi.org/10.5281/zenodo.14653265

Dataset updated

May 4, 2025

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Patrick Gildersleve; Anna Beers; Viviane Ito; Agustin Orozco; Francesca Tripodi

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Time period covered

Jan 15, 2025

Description

Preprint

Gildersleve, P., Beers, A., Ito, V., Orozco, A., & Tripodi, F. (2025). WikiReddit: Tracing Information and Attention Flows Between Online Platforms. arXiv [Cs.CY]. https://doi.org/10.48550/arXiv.2502.04942

Accepted at the International AAAI Conference on Web and Social Media (ICWSM) 2025

Abstract

The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia links shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking which subsequently influences Wikipedia content, by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.

Datasheet

Motivation

The motivations for this dataset stem from the challenges researchers face in studying the flow of information across the web. While the World Wide Web enables global communication and collaboration, data silos, linguistic barriers, and platform-specific restrictions hinder our ability to understand how information circulates, evolves, and impacts public discourse. Wikipedia and Reddit, as major hubs of knowledge sharing and discussion, offer an invaluable lens into these processes. However, without comprehensive data capturing their interactions, researchers are unable to fully examine how platforms co-construct knowledge. This dataset bridges this gap, providing the tools needed to study the interconnectedness of social media and collaborative knowledge systems.

Composition

WikiReddit is a comprehensive dataset capturing all Wikipedia mentions (including links) shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW (not safe for work) subreddits. The SQL database comprises 336K total posts, 10.2M comments, 1.95M unique links, and 1.26M unique articles spanning 59 languages on Reddit and 276 Wikipedia language subdomains. Each linked Wikipedia article is enriched with its revision history and page view data within a ±10-day window of its posting, as well as article ID, redirects, and Wikidata identifiers. Supplementary anonymous metadata from Reddit posts and comments further contextualizes the links, offering a robust resource for analysing cross-platform information flows, collective attention dynamics, and the role of Wikipedia in online discourse.
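The ±10-day window can be expressed as an absolute time difference around the posting time. A sketch, assuming page views arrive as hypothetical `(datetime, views)` pairs and that both window edges are inclusive (the description does not state the boundary convention):

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=10)


def in_window(posted_at: datetime, observed_at: datetime, window: timedelta = WINDOW) -> bool:
    """True if observed_at lies within ±window of posted_at (edges inclusive)."""
    return abs(observed_at - posted_at) <= window


def views_around_post(posted_at, daily_views, window=WINDOW):
    """Keep only the (date, views) pairs that fall inside the window around the post."""
    return [(day, views) for day, views in daily_views if in_window(posted_at, day, window)]
```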

Collection Process

Data was collected from the Reddit4Researchers and Wikipedia APIs. No personally identifiable information is published in the dataset. Data from Reddit to Wikipedia is linked via the hyperlink and article titles appearing in Reddit posts.

Preprocessing/cleaning/labeling

Extensive processing with tools such as regex was applied to the Reddit post/comment text to extract the Wikipedia URLs. Redirects for Wikipedia URLs and article titles were found through the API and mapped to the collected data. Reddit IDs are hashed with SHA-256 for post/comment/user/subreddit anonymity.
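The two preprocessing steps named above, regex extraction of Wikipedia URLs and SHA-256 hashing of IDs, can be illustrated with Python's `re` and `hashlib`. This is a simplified sketch, not the authors' pipeline: the pattern only handles `https://<lang>.wikipedia.org/wiki/...` and `<lang>.m.wikipedia.org` links, and the hashing shown uses no salt or ID prefix because the dataset's exact scheme is not described.

```python
import hashlib
import re

# Simplified pattern: language subdomain, optional mobile "m.", then the article path.
WIKI_URL = re.compile(
    r"https?://([a-z0-9-]+)\.(?:m\.)?wikipedia\.org/wiki/([^\s\]>]+)",
    re.IGNORECASE,
)


def _trim_markdown_paren(title):
    """Drop trailing ')' left over from Markdown link syntax unless it balances a '(' in the title."""
    while title.endswith(")") and title.count(")") > title.count("("):
        title = title[:-1]
    return title


def extract_wikipedia_links(text):
    """Return (language_subdomain, article_title) pairs for Wikipedia URLs found in text."""
    return [(lang.lower(), _trim_markdown_paren(title)) for lang, title in WIKI_URL.findall(text)]


def hash_id(reddit_id):
    """SHA-256 hex digest of an ID string (unsalted; illustrative only)."""
    return hashlib.sha256(reddit_id.encode("utf-8")).hexdigest()
```

Titles are returned exactly as they appear in the URL; percent-encoded titles would still need decoding (e.g. with `urllib.parse.unquote`) and redirect resolution before matching them to articles.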

Uses

We foresee several applications of this dataset and preview four here. First, Reddit linking data can be used to understand how attention is driven from one platform to another. Second, Reddit linking data can shed light on how Wikipedia's archive of knowledge is used in the larger social web. Third, our dataset could provide insights into how external attention is topically distributed across Wikipedia. Our dataset can help extend that analysis into the disparities in what types of external communities Wikipedia is used in, and how it is used. Fourth, relatedly, a topic analysis of our dataset could reveal how Wikipedia usage on Reddit contributes to societal benefits and harms. Our dataset could help examine if homogeneity within the Reddit and Wikipedia audiences shapes topic patterns and assess whether these relationships mitigate or amplify problematic engagement online.

Distribution

The dataset is publicly shared with a Creative Commons Attribution 4.0 International license. The article describing this dataset should be cited: https://doi.org/10.48550/arXiv.2502.04942

Maintenance

Patrick Gildersleve will maintain this dataset, and add further years of content as and when available.

SQL Database Schema

Table: `posts`

| Column Name | Type | Description |
|:--|:--|:--|
| `subreddit_id` | TEXT | The unique identifier for the subreddit. |
| `crosspost_parent_id` | TEXT | The ID of the original Reddit post if this post is a crosspost. |
| `post_id` | TEXT | Unique identifier for the Reddit post. |
| `created_at` | TIMESTAMP | The timestamp when the post was created. |
| `updated_at` | TIMESTAMP | The timestamp when the post was last updated. |
| `language_code` | TEXT | The language code of the post. |
| `score` | INTEGER | The score (upvotes minus downvotes) of the post. |
| `upvote_ratio` | REAL | The ratio of upvotes to total votes. |
| `gildings` | INTEGER | Number of awards (gildings) received by the post. |
| `num_comments` | INTEGER | Number of comments on the post. |

Table: `comments`

| Column Name | Type | Description |
|:--|:--|:--|
| `subreddit_id` | TEXT | The unique identifier for the subreddit. |
| `post_id` | TEXT | The ID of the Reddit post the comment belongs to. |
| `parent_id` | TEXT | The ID of the parent comment (if a reply). |
| `comment_id` | TEXT | Unique identifier for the comment. |
| `created_at` | TIMESTAMP | The timestamp when the comment was created. |
| `last_modified_at` | TIMESTAMP | The timestamp when the comment was last modified. |
| `score` | INTEGER | The score (upvotes minus downvotes) of the comment. |
| `upvote_ratio` | REAL | The ratio of upvotes to total votes for the comment. |
| `gilded` | INTEGER | Number of awards (gildings) received by the comment. |

Table: `postlinks`

| Column Name | Type | Description |
|:--|:--|:--|
| `post_id` | TEXT | Unique identifier for the Reddit post. |
| `end_processed_valid` | INTEGER | Whether the extracted URL from the post resolves to a valid URL. |
| `end_processed_url` | TEXT | The extracted URL from the Reddit post. |
| `final_valid` | INTEGER | Whether the final URL from the post resolves to a valid URL after redirections. |
| `final_status` | INTEGER | HTTP status code of the final URL. |
| `final_url` | TEXT | The final URL after redirections. |
| `redirected` | INTEGER | Indicator of whether the posted URL was redirected (1) or not (0). |
| `in_title` | INTEGER | Indicator of whether the link appears in the post title (1) or post body (0). |
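Because `postlinks` and `posts` share `post_id`, link-level fields can be aggregated per subreddit with a join. A minimal sketch, assuming the database can be opened with Python's built-in `sqlite3` module; the description only says "SQL database", so the engine and the file name `wikireddit.db` are assumptions. Subreddit IDs come back in their hashed form.

```python
import sqlite3

# Valid links per subreddit, and how many of them went through a redirect.
QUERY = """
SELECT p.subreddit_id,
       COUNT(*)          AS valid_links,
       SUM(l.redirected) AS redirected_links
FROM postlinks AS l
JOIN posts AS p ON p.post_id = l.post_id
WHERE l.final_valid = 1
GROUP BY p.subreddit_id
ORDER BY valid_links DESC
"""


def links_per_subreddit(conn: sqlite3.Connection):
    """Return (subreddit_id, valid_links, redirected_links) rows, busiest subreddit first."""
    return conn.execute(QUERY).fetchall()


# Usage (hypothetical file name):
# conn = sqlite3.connect("wikireddit.db")
# for row in links_per_subreddit(conn)[:10]:
#     print(row)
# conn.close()
```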

Table: `commentlinks`

| Column Name | Type | Description |
|:--|:--|:--|
| `comment_id` | TEXT | Unique identifier for the Reddit comment. |
| `end_processed_valid` | INTEGER | Whether the extracted URL from the comment resolves to a valid URL. |
| `end_processed_url` | TEXT | The extracted URL from the comment. |
| `final_valid` | INTEGER | Whether the final URL from the comment resolves to a valid URL after redirections. |
| `final_status` | INTEGER | HTTP status code of the final

Traveling the Silk Road: Non-anonymized datasets
impactcybertrust.org
Updated Mar 4, 2012
+ more versions
Cite
Carnegie Mellon University (2012). Traveling the Silk Road: Non-anonymized datasets [Dataset]. http://doi.org/10.23721/116/1406256
Unique identifier
https://doi.org/10.23721/116/1406256
Dataset updated
Mar 4, 2012
Authors
Carnegie Mellon University
Time period covered
Mar 4, 2012 - Jul 23, 2012
Description
Non-anonymized subset of the databases used in the paper "Traveling the Silk Road: A measurement analysis of a large anonymous online marketplace" (Christin, 2013). In this dataset, textual information (item name, description, or feedback text) and handles have not been anonymized and are thus available. We don't expect any private identifiers or other PII to be present in the data, which was collected from a publicly available website -- the Silk Road anonymous marketplace -- for a few months in 2012.

For less restricted usage terms, please consider the anonymized version, which is also available without any restrictions. This non-anonymized dataset should only be requested if your project MUST rely on full textual descriptions of items and/or feedback.

Christin (2013) Traveling the Silk Road: A measurement analysis of a large anonymous online marketplace. To appear in Proceedings of the 22nd International World Wide Web Conference (WWW'13). Rio de Janeiro, Brazil. May 2013.
2023 Fortune 1000 Companies
kaggle.com
Updated Sep 8, 2023
Cite
k04dRunn3r (2023). 2023 Fortune 1000 Companies [Dataset]. https://www.kaggle.com/datasets/jeannicolasduval/2023-fortune-1000-companies-info
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 8, 2023
Dataset provided by
Kaggle
Authors
k04dRunn3r
Description
Data from Fortune's 2023 Fortune 500 ranking.
It includes data on the top 1000 companies, with additional information (stock symbol/*ticker*, CEO name).

Update (New dataset): 2024 Fortune 1000 Companies

What Is the Fortune 1000?

From Investopedia:

The Fortune 1000 is an annual list of the 1000 largest American companies, maintained by the popular magazine Fortune. Fortune ranks the eligible companies by revenue generated from core operations, discontinued operations, and consolidated subsidiaries. Since revenue is the basis for inclusion, every company is authorized to operate in the United States and files a 10-K or comparable financial statement with a government agency.

Project Background

Fortune magazine publishes this list every year, and some versions of it can be found from other sources. In this year's available datasets, however, some features were missing or could not be found. This dataset was built by scraping the standard features, as well as the Company Info fields (such as CEO, ticker and website), from the Fortune magazine website. Details on how the data was generated can be found in this notebook, where a few of the features are also visualized.

The source code of the 2023 Fortune 500 ranking page includes 1000 companies. It also includes a reference page (slug) with additional info for each company; these pages were also scraped to complete the dataset.

The Dataset

Available formats: csv, parquet

Features are as follows:

[Note: References to datatypes are relevant when using the parquet file; Labels refer to the original website names]

Rank
dtype: int64; Label: Rank

Company
dtype: object; Label: Company

Ticker
dtype: object; Label: Ticker

Sector
dtype: category; Label: Sector

Industry
dtype: category; Label: Industry

Profitable
dtype: category; Label: Profitable

Founder_is_CEO
dtype: category; Label: Founder is CEO

FemaleCEO
dtype: category; Label: Female CEO

Growth_in_Jobs
dtype: category; Label: Growth in Jobs

Change_in_Rank
dtype: float64; Label: Change in Rank (Full 1000)

Gained_in_Rank
dtype: category; Label: Gained in Rank

Dropped_in_Rank
dtype: category; Label: Dropped in Rank

Newcomer_to_the_Fortune500
dtype: category; Label: Newcomer to the Fortune 500

Global500
dtype: category; Label: Global 500

Best_Companies
dtype: category; Label: Best Companies

Number_of_employees
dtype: int64; Label: Employees

MarketCap_March31_M
dtype: float64; Label: Market Value — as of March 31, 2023 ($M)

Revenues_M
dtype: int64; Label: Revenues ($M)

RevenuePercentChange
dtype: float64; Label: Revenue Percent Change

Profits_M
dtype: int64; Label: Profits ($M)

ProfitsPercentChange
dtype: float64; Label: Profits Percent Change

Assets_M
dtype: int64; Label: Assets ($M)

CEO
dtype: object; Label: CEO

Country
dtype: category; Label: Country

HeadquartersCity
dtype: object; Label: Headquarters City

HeadquartersState
dtype: category; Label: Headquarters State

Website
dtype: object; Label: Website

CompanyType
dtype: category; Label: Company type

Footnote
dtype: object; Label: Footnote

MarketCap_Updated_M
dtype: float64; Label: Market value ($M)

Updated
dtype: datetime64[ns]; Label: Updated
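The dtypes above are only preserved in the parquet file (reading it with `pandas.read_parquet`, which needs pyarrow or fastparquet, keeps the stored schema); a CSV stores plain text. A minimal pandas sketch that re-applies a subset of the documented dtypes after reading the CSV; the helper name and the choice of columns are illustrative, and the actual file names are not given here.

```python
import pandas as pd

# Subset of the dtypes documented above; extend with the remaining columns as needed.
DTYPES = {
    "Rank": "int64",
    "Sector": "category",
    "Industry": "category",
    "Change_in_Rank": "float64",
    "Revenues_M": "int64",
}


def load_fortune_csv(source):
    """Read the CSV export (path or file-like object) and re-apply the documented dtypes.

    An integer column containing blanks would need pandas' nullable "Int64" instead of "int64".
    """
    df = pd.read_csv(source)
    for column, dtype in DTYPES.items():
        if column in df.columns:  # skip columns absent from this export
            df[column] = df[column].astype(dtype)
    if "Updated" in df.columns:
        df["Updated"] = pd.to_datetime(df["Updated"])
    return df
```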
Education Attainment and Enrollment around the World - Dataset - Data...
data.opendata.am
Updated Jul 7, 2023
Cite
(2023). Education Attainment and Enrollment around the World - Dataset - Data Catalog Armenia [Dataset]. https://data.opendata.am/dataset/dcwb0038973
Dataset updated
Jul 7, 2023
Area covered
World
Description
Patterns of educational attainment vary greatly across countries, and across population groups within countries. In some countries, virtually all children complete basic education, whereas in others large groups fall short. The primary purpose of this database, and the associated research program, is to document and analyze these differences using a compilation of household-based data sets: Demographic and Health Surveys (DHS); Multiple Indicator Cluster Surveys (MICS); Living Standards Measurement Study Surveys (LSMS); as well as country-specific Integrated Household Surveys (IHS) such as Socio-Economic Surveys.

As shown at the website associated with this database, there are dramatic differences in attainment by wealth. When households are ranked according to their wealth status (or, more precisely, a proxy based on the assets owned by members of the household), there are striking differences in the attainment patterns of children from the richest 20 percent compared to the poorest 20 percent.

In Mali in 2012, only 34 percent of 15- to 19-year-olds in the poorest quintile had completed grade 1, whereas 80 percent of the richest quintile had done so. In many countries, for example Pakistan, Peru and Indonesia, almost all the children from the wealthiest households have completed at least one year of schooling. In some countries, like Mali and Pakistan, wealth gaps are evident from grade 1 on; in others, like Peru and Indonesia, wealth gaps emerge later in the school system.

The EdAttain website allows a visual exploration of gaps in attainment and enrollment within and across countries, based on the international database, which spans multiple years from over 120 countries and includes indicators disaggregated by wealth, gender and urban/rural location. The database underlying that site can be downloaded from here.
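The Mali example can be summarized two ways: as an absolute gap of 80 - 34 = 46 percentage points, or as a ratio, with the richest quintile about 2.35 times as likely as the poorest to have completed grade 1. A small Python helper (name illustrative):

```python
def wealth_gap(richest_pct, poorest_pct):
    """Richest-vs-poorest quintile gap as (percentage points, ratio)."""
    return richest_pct - poorest_pct, richest_pct / poorest_pct


points, ratio = wealth_gap(80, 34)  # Mali, 2012: completed grade 1, ages 15-19
print(points, round(ratio, 2))  # prints: 46 2.35
```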
Worldwide Soundscapes project meta-data
zenodo.org
Updated Dec 9, 2022
Kevin F.A. Darras; Rodney Rountree; Steven Van Wilgenburg; Amandine Gasc; 松海李; 黎君董; Yuhang Song; Youfang Chen; Thomas Cherico Wanger (2022). Worldwide Soundscapes project meta-data [Dataset]. http://doi.org/10.5281/zenodo.7415473
Unique identifier
https://doi.org/10.5281/zenodo.7415473
Dataset updated
Dec 9, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Kevin F.A. Darras; Rodney Rountree; Steven Van Wilgenburg; Amandine Gasc; 松海李; 黎君董; Yuhang Song; Youfang Chen; Thomas Cherico Wanger
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Worldwide Soundscapes project is a global, open inventory of spatio-temporally replicated soundscape datasets. This Zenodo entry comprises the data tables that constitute its (meta-)database, as well as their description.

The overview of all sampling sites can be found on the corresponding project on ecoSound-web, as well as a demonstration collection containing selected recordings. More information on the project can be found here and on ResearchGate.

The audio recording criteria justifying inclusion into the meta-database are:

Stationary (no transects, towed sensors or microphones mounted on cars)

Passive (unattended, no human disturbance by the recordist)

Ambient (no spatial or temporal focus on a particular species or direction)

Spatially and/or temporally replicated (multiple sites sampled at least at one common time of day, or multiple days sampled at least at one common site)

The individual columns of the provided data tables are described below. The data tables are linked through primary keys; joining them yields the full database.

datasets

dataset_id: incremental integer, primary key

name: name of the dataset. If it is repeated, incremental integers should be used in the "subset" column to differentiate the datasets.

subset: incremental integer that can be used to distinguish datasets with identical names

collaborators: full names of people deemed responsible for the dataset, separated by commas

contributors: full names of people who are not the main collaborators but who have significantly contributed to the dataset, and who could be contacted for in-depth analyses, separated by commas.

date_added: when the dataset was added (DD/MM/YYYY)

URL_open_recordings: if recordings (even only some) from this dataset are openly available, indicate the internet link where they can be found.

URL_project: internet link for further information about the corresponding project

DOI_publication: DOIs of corresponding publications, separated by commas

core_realm_IUCN: The core realm of the dataset. Datasets may have multiple realms, but the main one should be listed. Datasets may contain sampling sites from different realms in the "sites" sheet. IUCN Global Ecosystem Typology (v2.0): https://global-ecosystems.org/

medium: the physical medium the microphone is situated in

protected_area: whether all, some, or none of the sampling sites were situated in protected areas

GADM0: For datasets on land or in territorial waters, the level-0 (country) unit in GADM, the Database of Global Administrative Areas: https://gadm.org/

GADM1: For datasets on land or in territorial waters, the level-1 (first-level subdivision) unit in GADM: https://gadm.org/

GADM2: For datasets on land or in territorial waters, the level-2 (second-level subdivision) unit in GADM: https://gadm.org/

IHO: For marine locations, the sea area that encompasses all the sampling locations according to the International Hydrographic Organisation; a map is available at https://www.arcgis.com/home/item.html?id=44e04407fbaf4d93afcb63018fbca9e2

locality: optional free text about the locality

latitude_numeric_region: study region approximate centroid latitude in WGS84 decimal degrees

longitude_numeric_region: study region approximate centroid longitude in WGS84 decimal degrees

sites_number: number of sites sampled

year_start: starting year of the sampling

year_end: ending year of the sampling

deployment_schedule: description of the sampling schedule, provisional

temporal_recording_selection: list of the environmental exclusion criteria that were used to decide which recording days or times to discard

high_pass_filter_Hz: frequency of the high-pass filter of the recorder, in Hz

variable_sampling_frequency: Does the sampling frequency vary? If it does, write "NA" in the sampling_frequency_kHz column and give the frequencies in the sampling_frequency_kHz column of the deployments sheet

sampling_frequency_kHz: frequency at which the microphone signal was sampled (sounds up to half that frequency, the Nyquist frequency, can be recorded)

variable_recorder:

recorder: recorder model used

microphone: microphone used

freshwater_recordist_position: position of the recordist relative to the microphone during sampling (only for freshwater)

collaborator_comments: free-text field for comments by the collaborators

validated: This cell is checked if the contents of all sheets are complete and have been found to be coherent and consistent with our requirements.

validator_name: name of person doing the validation

validation_comments: validators: please insert the date when someone was contacted

cross-check: this cell is checked if the collaborators confirm the spatial and temporal data after checking the corresponding site maps, deployment and operation time graphs found at https://drive.google.com/drive/folders/1qfwXH_7dpFCqyls-c6b8RZ_fbcn9kXbp?usp=share_link
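For a given dataset, the high-pass filter and the sampling frequency together bound the frequency range its recordings can contain: by the Nyquist limit, a recorder sampled at f kHz captures sound only up to f/2 kHz. The minimal Python sketch below computes that band, assuming each datasets-table row is a dict keyed by the column names above (as `csv.DictReader` would produce) and that "NA" marks a sampling frequency that varies per deployment. The helper `recorded_band_hz` and the treatment of a missing filter value as 0 Hz are illustrative assumptions, not part of the project.

```python
from typing import Optional, Tuple


def recorded_band_hz(row: dict) -> Optional[Tuple[float, float]]:
    """Return the (low, high) frequency band in Hz captured by a dataset's recorders.

    Assumes `row` maps datasets-table column names to string values.
    Returns None when the sampling frequency is "NA" (i.e. it varies and
    is recorded per deployment instead).
    """
    fs_khz = row.get("sampling_frequency_kHz")
    if fs_khz in (None, "", "NA"):
        return None
    nyquist_hz = float(fs_khz) * 1000 / 2  # highest frequency the sampling rate can represent
    hp = row.get("high_pass_filter_Hz")
    # Assumption: no filter value listed means no low-frequency cutoff.
    low_hz = float(hp) if hp not in (None, "", "NA") else 0.0
    return (low_hz, nyquist_hz)


print(recorded_band_hz({"sampling_frequency_kHz": "48", "high_pass_filter_Hz": "100"}))
# (100.0, 24000.0)
```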

datasets-sites

dataset_ID: primary key of datasets table

dataset_name: lookup field

site_ID: primary key of sites table

site_name: lookup field

sites

site_ID: unique site IDs, larger than 1000 for compatibility with ecoSound-web

site_name: name or code of sampling site as used in respective projects

latitude_numeric: exact latitude of the site, in numeric degrees

longitude_numeric: exact longitude of the site, in numeric degrees

topography_m: elevation for sites on land, or depth (as a negative value) for marine sites, in meters

freshwater_depth_m

realm: Ecosystem type according to IUCN GET https://global-ecosystems.org/

biome: Ecosystem type according to IUCN GET https://global-ecosystems.org/

functional_group: Ecosystem type according to IUCN GET https://global-ecosystems.org/

comments
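The datasets, datasets-sites and sites tables can be joined on their keys to list every site with its dataset. This is a minimal standard-library Python sketch; the inline CSV snippets are hypothetical placeholder rows, not project data. Note that the key is spelled `dataset_id` in the datasets table but `dataset_ID` in datasets-sites, as listed above. Link rows whose keys have no match are dropped, i.e. inner-join semantics.

```python
import csv
import io

# Hypothetical placeholder exports of three tables (not real project data).
DATASETS_CSV = """dataset_id,name
1,Example dataset
"""
DATASETS_SITES_CSV = """dataset_ID,dataset_name,site_ID,site_name
1,Example dataset,1001,S1
1,Example dataset,1002,S2
1,Example dataset,9999,S9
"""
SITES_CSV = """site_ID,site_name,latitude_numeric,longitude_numeric
1001,S1,10.0,20.0
1002,S2,10.5,20.5
"""


def read_table(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join_datasets_sites(datasets, links, sites):
    """Inner-join datasets -> datasets-sites -> sites on their keys."""
    datasets_by_id = {row["dataset_id"]: row for row in datasets}
    sites_by_id = {row["site_ID"]: row for row in sites}
    joined = []
    for link in links:
        dataset = datasets_by_id.get(link["dataset_ID"])
        site = sites_by_id.get(link["site_ID"])
        if dataset is None or site is None:
            continue  # key has no match in one of the tables: drop the row
        joined.append({**dataset, **site})
    return joined


rows = join_datasets_sites(
    read_table(DATASETS_CSV), read_table(DATASETS_SITES_CSV), read_table(SITES_CSV)
)
for row in rows:
    print(row["name"], row["site_ID"], row["latitude_numeric"], row["longitude_numeric"])
```

Site 9999 appears only in the link table, so it is left out of the result.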

deployments

dataset_ID: primary key of datasets table

dataset_name: lookup field

deployment: use identical subscript letters to denote rows that belong to the same deployment. For instance, you may use different operation times and schedules for different target taxa within one deployment.

start_date_min: earliest date of deployment start, double-click cell to get date-picker

start_date_max: latest date of deployment start, if applicable (only used when recorders were deployed over several days), double-click cell to get date-picker

start_time_mixed: deployment start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording start time for continuous recording deployments. If multiple start times were used, you should mention the latest start time (corresponds to the earliest daytime from which all recorders are active). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

permanent: is the deployment permanent (in which case it would be ongoing and the end date or duration would be unknown)?

variable_duration_days: whether the deployment duration (in days) varies

duration_days: deployment duration per recorder (use the minimum if variable)

end_date_min: earliest date of deployment end, only needed if duration is variable, double-click cell to get date-picker

end_date_max: latest date of deployment end, only needed if duration is variable, double-click cell to get date-picker

end_time_mixed: deployment end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). Corresponds to the recording end time for continuous recording deployments.

recording_time: does the recording last from the deployment start time to the end time (continuous) or at scheduled daily intervals (scheduled)? Note: we consider recordings with duty cycles to be continuous.

operation_start_time_mixed: scheduled recording start local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

operation_duration_minutes: duration of operation in minutes, if constant

operation_end_time_mixed: scheduled recording end local time, either in HH:MM format or a choice of solar daytimes (sunrise, sunset, noon, midnight). If applicable, positive or negative offsets from solar times can be mentioned (For example: if data are collected one hour before sunrise, this will be "sunrise-60")

duty_cycle_minutes: duty cycle of the recording (i.e. the fraction of minutes when it is recording), written as "recording(minutes)/period(minutes)". For example: "1/6" if the recorder is active for 1 minute and standing by for 5 minutes.

sampling_frequency_kHz: only indicate the sampling frequency if it is variable within a particular dataset so that we need to code different frequencies for different deployments

recorder

subset_sites: If the deployment was not done in all the sites of the
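Two of the deployments columns use compact string encodings: `duty_cycle_minutes` ("recording/period", e.g. "1/6") and the `*_time_mixed` columns (either HH:MM or a solar event with an optional offset in minutes, e.g. "sunrise-60"). Below is a minimal Python parser for both, assuming values follow exactly the formats described in the deployments table; the function names are hypothetical, and converting solar events into clock times (which would need site coordinates and dates) is left out.

```python
import re
from fractions import Fraction
from typing import Tuple, Union

SOLAR_EVENTS = {"sunrise", "sunset", "noon", "midnight"}


def parse_duty_cycle(value: str) -> Fraction:
    """Parse "recording/period" (both in minutes) into the fraction of time spent recording."""
    recording, period = (int(part) for part in value.split("/"))
    if period <= 0 or not 0 < recording <= period:
        raise ValueError(f"invalid duty cycle: {value!r}")
    return Fraction(recording, period)


def parse_mixed_time(value: str) -> Union[Tuple[int, int], Tuple[str, int]]:
    """Parse a *_time_mixed value.

    Clock times ("05:30") become (hour, minute); solar times ("noon",
    "sunrise-60", "sunset+30") become (event, offset_in_minutes).
    """
    text = value.strip().lower()
    clock = re.fullmatch(r"(\d{1,2}):(\d{2})", text)
    if clock:
        hour, minute = int(clock.group(1)), int(clock.group(2))
        if hour > 23 or minute > 59:
            raise ValueError(f"invalid clock time: {value!r}")
        return hour, minute
    solar = re.fullmatch(r"([a-z]+)([+-]\d+)?", text)
    if solar and solar.group(1) in SOLAR_EVENTS:
        return solar.group(1), int(solar.group(2) or 0)
    raise ValueError(f"unrecognised time value: {value!r}")


print(parse_duty_cycle("1/6") * 24 * 60)  # 240 recording minutes if active all day
print(parse_mixed_time("sunrise-60"))     # ('sunrise', -60)
print(parse_mixed_time("05:30"))          # (5, 30)
```

Keeping the duty cycle as a `Fraction` avoids rounding when it is scaled to minutes per day.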
Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant
datarade.ai
.csv, .xls
Updated Jun 27, 2023
Swash (2023). Swash Web Browsing Clickstream Data - 1.5M Worldwide Users - GDPR Compliant [Dataset]. https://datarade.ai/data-products/swash-blockchain-bitcoin-and-web3-enthusiasts-swash
.csv, .xls (available download formats)
Dataset updated
Jun 27, 2023
Dataset authored and provided by
Swash
Area covered
India, Jordan, Belarus, Jamaica, Monaco, Liechtenstein, Russian Federation, Latvia, Saint Vincent and the Grenadines, Uzbekistan
Description
Unlock the Power of Behavioural Data with GDPR-Compliant Clickstream Insights.

Swash clickstream data offers a comprehensive and GDPR-compliant dataset sourced from users worldwide, encompassing both desktop and mobile browsing behaviour. Here's an in-depth look at what sets us apart and how our data can benefit your organisation.

User-Centric Approach: Unlike traditional data collection methods, we take a user-centric approach by rewarding users for the data they willingly provide. This unique methodology ensures transparent data collection practices, encourages user participation, and establishes trust between data providers and consumers.

Wide Coverage and Varied Categories: Our clickstream data covers diverse categories, including search, shopping, and URL visits. Whether you are interested in understanding user preferences in e-commerce, analysing search behaviour across different industries, or tracking website visits, our data provides a rich and multi-dimensional view of user activities.

GDPR Compliance and Privacy: We prioritise data privacy and strictly adhere to GDPR guidelines. Our data collection methods are fully compliant, ensuring the protection of user identities and personal information. You can confidently leverage our clickstream data without compromising privacy or facing regulatory challenges.

Market Intelligence and Consumer Behaviour: Gain deep insights into market intelligence and consumer behaviour using our clickstream data. Understand trends, preferences, and user behaviour patterns by analysing the comprehensive user-level, time-stamped raw or processed data feed. Uncover valuable information about user journeys, search funnels, and paths to purchase to enhance your marketing strategies and drive business growth.

High-Frequency Updates and Consistency: We provide high-frequency updates and consistent user participation, offering both historical data and ongoing daily delivery. This ensures you have access to up-to-date insights and a continuous data feed for comprehensive analysis. Our reliable and consistent data empowers you to make accurate and timely decisions.

Custom Reporting and Analysis: We understand that every organisation has unique requirements. That's why we offer customisable reporting options, allowing you to tailor the analysis and reporting of clickstream data to your specific needs. Whether you need detailed metrics, visualisations, or in-depth analytics, we provide the flexibility to meet your reporting requirements.

Data Quality and Credibility: We take data quality seriously. Our data sourcing practices are designed to ensure responsible and reliable data collection. We implement rigorous data cleaning, validation, and verification processes, guaranteeing the accuracy and reliability of our clickstream data. You can confidently rely on our data to drive your decision-making processes.
Average daily time spent on social media worldwide 2012-2025
statista.com
Updated Jun 19, 2025
Statista (2025). Average daily time spent on social media worldwide 2012-2025 [Dataset]. https://www.statista.com/statistics/433871/daily-social-media-usage-worldwide/
Dataset updated
Jun 19, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide
Description
How much time do people spend on social media? As of 2025, the average daily social media usage of internet users worldwide amounted to 141 minutes per day, down from 143 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, where online users spend an average of 3 hours and 49 minutes on social media each day. In comparison, daily time spent on social media in the U.S. was just 2 hours and 16 minutes.

Global social media usage

Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.

Global impact of social media

Social media has a wide-reaching and significant impact not only on online activities but also on offline behavior and life in general. In a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics, and heightened everyday distractions.
Film Circulation dataset
zenodo.org
data.niaid.nih.gov
bin, csv, png
Updated Jul 12, 2024
Skadi Loist; Evgenia (Zhenya) Samoilova (2024). Film Circulation dataset [Dataset]. http://doi.org/10.5281/zenodo.7887672
csv, png, bin (available download formats)
Unique identifier
https://doi.org/10.5281/zenodo.7887672
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Skadi Loist; Evgenia (Zhenya) Samoilova
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Complete dataset of “Film Circulation on the International Film Festival Network and the Impact on Global Film Culture”

A peer-reviewed data paper for this dataset is under review for publication in NECSUS_European Journal of Media Studies, an open access journal aiming to enhance data transparency and reusability, and will be available from https://necsus-ejms.org/ and https://mediarep.org

Please cite this when using the dataset.

Detailed description of the dataset:

1 Film Dataset: Festival Programs

The Film Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

The codebook (csv file “1_codebook_film-dataset_festival-program”) offers a detailed description of all variables within the Film Dataset. Along with the definition of variables it lists explanations for the units of measurement, data sources, coding and information on missing data.

The csv file “1_film-dataset_festival-program_long” comprises a dataset of all films and the festivals, festival sections, and the year of the festival edition that they were sampled from. The dataset is structured in the long format, i.e. the same film can appear in several rows when it appeared in more than one sample festival. However, films are identifiable via their unique ID.

The csv file “1_film-dataset_festival-program_wide” consists of the dataset listing only unique films (n=9,348). The dataset is in the wide format, i.e. each row corresponds to a unique film, identifiable via its unique ID. For easy analysis, and since the overlap is only six percent, in this dataset the variable sample festival (fest) corresponds to the first sample festival where the film appeared. For instance, if a film was first shown at Berlinale (in February) and then at Frameline (in June of the same year), the sample festival will list “Berlinale”. This file includes information on unique and IMDb IDs, the film title, production year, length, categorization in length, production countries, regional attribution, director names, genre attribution, the festival, festival section and festival edition the film was sampled from, and information whether there is festival run information available through the IMDb data.
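The long-to-wide reduction described above, which keeps only the first sample festival at which each film appeared, can be sketched in Python as follows. Apart from `fest`, the column names (`id`, `date`) and the example rows are hypothetical; the actual variable names are documented in the codebook.

```python
from datetime import date

# Hypothetical long-format rows: one row per (film, sample festival) appearance.
long_rows = [
    {"id": "F001", "fest": "Frameline", "date": date(2013, 6, 20)},
    {"id": "F001", "fest": "Berlinale", "date": date(2013, 2, 7)},
    {"id": "F002", "fest": "Frameline", "date": date(2013, 6, 20)},
]


def first_festival_per_film(rows):
    """Collapse long-format rows to one row per film, keeping the earliest festival appearance."""
    wide = {}
    for row in rows:
        current = wide.get(row["id"])
        if current is None or row["date"] < current["date"]:
            wide[row["id"]] = row  # earlier appearance replaces the stored one
    return wide


wide = first_festival_per_film(long_rows)
print({film_id: row["fest"] for film_id, row in wide.items()})
# {'F001': 'Berlinale', 'F002': 'Frameline'}
```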

2 Survey Dataset

The Survey Dataset consists of a data scheme image file, a codebook and two dataset tables in csv format.

The codebook “2_codebook_survey-dataset” includes coding information for both survey datasets. It lists the definition of the variables or survey questions (corresponding to Samoilova/Loist 2019), units of measurement, data source, variable type, range and coding, and information on missing data.

The csv file “2_survey-dataset_long-festivals_shared-consent” consists of a subset (n=161) of the original survey dataset (n=454), where respondents provided festival run data for films (n=206) and gave consent to share their data for research purposes. This dataset consists of the festival data in a long format, so that each row corresponds to the festival appearance of a film.

The csv file “2_survey-dataset_wide-no-festivals_shared-consent” consists of a subset (n=372) of the original dataset (n=454) of survey responses corresponding to sample films. It includes data only for those films for which respondents provided consent to share their data for research purposes. This dataset is shown in wide format of the survey data, i.e. information for each response corresponding to a film is listed in one row. This includes data on film IDs, film title, survey questions regarding completeness and availability of provided information, information on number of festival screenings, screening fees, budgets, marketing costs, market screenings, and distribution. As the file name suggests, no data on festival screenings is included in the wide format dataset.

3 IMDb & Scripts

The IMDb dataset consists of a data scheme image file, one codebook and eight datasets, all in csv format. It also includes the R scripts that we used for scraping and matching.

The codebook “3_codebook_imdb-dataset” includes information for all IMDb datasets. This includes ID information and their data source, coding and value ranges, and information on missing data.

The csv file “3_imdb-dataset_aka-titles_long” contains film title data in different languages scraped from IMDb in a long format, i.e. each row corresponds to a title in a given language.

The csv file “3_imdb-dataset_awards_long” contains film award data in a long format, i.e. each row corresponds to an award of a given film.

The csv file “3_imdb-dataset_companies_long” contains data on production and distribution companies of films. The dataset is in a long format, so that each row corresponds to a particular company of a particular film.

The csv file “3_imdb-dataset_crew_long” contains data on names and roles of crew members in a long format, i.e. each row corresponds to each crew member. The file also contains binary gender assigned to directors based on their first names using the GenderizeR application.

The csv file “3_imdb-dataset_festival-runs_long” contains festival run data scraped from IMDb in a long format, i.e. each row corresponds to the festival appearance of a given film. The dataset does not include each film screening, but the first screening of a film at a festival within a given year. The data includes festival runs up to 2019.

The csv file “3_imdb-dataset_general-info_wide” contains general information about films such as genre as defined by IMDb, languages in which a film was shown, ratings, and budget. The dataset is in wide format, so that each row corresponds to a unique film.

The csv file “3_imdb-dataset_release-info_long” contains data about non-festival releases (e.g., theatrical, digital, TV, DVD/Blu-ray). The dataset is in a long format, so that each row corresponds to a particular release of a particular film.

The csv file “3_imdb-dataset_websites_long” contains data on available websites (official websites, miscellaneous, photos, video clips). The dataset is in a long format, so that each row corresponds to a website of a particular film.

The dataset includes eight text files containing the web-scraping scripts. They were written in R version 3.6.3 for Windows.

The R script “r_1_unite_data” demonstrates the structure of the dataset, which we use in the following steps to identify, scrape, and match the film data.

The R script “r_2_scrape_matches” reads in the dataset with the film characteristics described in “r_1_unite_data” and uses various R packages to create a search URL on the IMDb website for each film in the core dataset. The script attempts to match each film from the core dataset to IMDb records by first conducting an advanced search based on the movie title and year, and then, if the advanced search finds no matches, potentially using an alternative title and a basic search. The script scrapes the title, release year, directors, running time, genre, and IMDb film URL from the first page of the suggested records on the IMDb website. It then defines a loop that matches each film in the core dataset with the suggested films on the IMDb search page and records matching scores. Matching was based on directors, production year (+/- one year), and title, using a fuzzy matching approach with two methods, “cosine” and “osa”: cosine similarity is used to match titles with a high degree of similarity, and the OSA (optimal string alignment) algorithm is used to match titles that may have typos or minor variations.
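To make the two string metrics concrete, here is a minimal, self-contained Python sketch of OSA distance (Levenshtein distance extended with transpositions of adjacent characters, where no substring is edited more than once) and of cosine similarity between character q-gram count vectors. This is not the authors' R code: the description does not name the R packages or parameters used, so the q-gram size (q = 2) is an assumption, and titles are assumed to be normalised (e.g. lowercased) beforehand.

```python
import math
from collections import Counter


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions, substitutions
    and adjacent transpositions each cost 1; no substring is edited twice."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def qgram_profile(s: str, q: int = 2) -> Counter:
    """Count the overlapping character q-grams of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def cosine_similarity(a: str, b: str, q: int = 2) -> float:
    """Cosine of the angle between the q-gram count vectors of a and b (1.0 = same profile)."""
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    dot = sum(pa[g] * pb[g] for g in pa.keys() & pb.keys())
    norm = math.sqrt(sum(v * v for v in pa.values())) * math.sqrt(sum(v * v for v in pb.values()))
    return dot / norm if norm else 0.0


print(osa_distance("the pinao", "the piano"))  # 1: one adjacent transposition
print(round(cosine_similarity("the piano", "the piano lesson"), 3))
```

OSA counts a swapped pair of letters as a single edit, which suits typo-level variation, whereas a q-gram profile ignores where shared fragments occur, which tolerates titles that only partially overlap.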

The script “r_3_matching” creates a dataset with the matches for a manual check. Each pair of films (the original film from the core dataset and the suggested match from the IMDb website) was categorized into one of five categories: a) 100% match: perfect match on title, year, and director; b) likely good match; c) maybe match; d) unlikely match; and e) no match. The script also checks for possible duplicates in the dataset and flags them for a manual check.

The script “r_4_scraping_functions” creates a function for scraping the data from the identified matches (based on the scripts described above and manually checked). These functions are used for scraping the data in the next script.

The script “r_5a_extracting_info_sample” uses the function defined in “r_4_scraping_functions” to scrape the IMDb data for the identified matches. It does so for the first 100 films only, to check that everything works. Scraping the entire dataset took a few hours, so a test with a subsample of 100 films is advisable.

The script “r_5b_extracting_info_all” extracts the data for the entire dataset of the identified matches.

The script “r_5c_extracting_info_skipped” checks the films with missing data (where data was not scraped) and tries to extract their data one more time, to make sure that the gaps were not caused by disruptions in the internet connection or other technical issues.
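The retry step can be sketched as a single second pass over films whose earlier result is missing. This is a minimal Python sketch, not the authors' R implementation: `scrape` stands in for a hypothetical per-film scraping function, and treating `OSError` (which covers connection errors and timeouts) and `ValueError` as recoverable is an assumption.

```python
from typing import Callable, Dict, Iterable, Optional


def retry_skipped(
    film_ids: Iterable[str],
    results: Dict[str, Optional[dict]],
    scrape: Callable[[str], dict],
) -> Dict[str, Optional[dict]]:
    """Give every film with a missing (None or absent) result one more scraping attempt."""
    updated = dict(results)  # leave the first-pass results untouched
    for film_id in film_ids:
        if updated.get(film_id) is not None:
            continue  # already scraped successfully
        try:
            updated[film_id] = scrape(film_id)
        except (OSError, ValueError):
            updated[film_id] = None  # still missing: keep for manual review
    return updated


# Hypothetical stand-ins for a real scraper.
def working_scrape(film_id: str) -> dict:
    return {"id": film_id}


def always_down(film_id: str) -> dict:
    raise ConnectionError("simulated network outage")


first_pass = {"F001": {"id": "F001"}, "F002": None}
second_pass = retry_skipped(["F001", "F002"], first_pass, working_scrape)
print(second_pass)  # {'F001': {'id': 'F001'}, 'F002': {'id': 'F002'}}
```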

The script “r_check_logs” is used for troubleshooting and tracking the progress of all the R scripts used. It reports the number of missing values and errors.

4 Festival Library Dataset

The Festival Library Dataset consists of a data scheme image file, one codebook and one dataset, all in csv format.

The codebook (csv file “4_codebook_festival-library_dataset”) offers a detailed description of all variables within the Library Dataset. It lists the definition of variables, such as location and festival name, and festival categories,

Small Business Cybersecurity 2020-2021 Checklist

data.mendeley.com

Updated Sep 12, 2020

lissa coffey (2020). Small Business Cybersecurity 2020-2021 Checklist [Dataset]. http://doi.org/10.17632/gk9t7zs5hz.1

Unique identifier

https://doi.org/10.17632/gk9t7zs5hz.1

Dataset updated

Sep 12, 2020

Authors

lissa coffey

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Cyber attacks are a growing concern for small businesses during COVID-19. Be protected while you work: upgrade your small business's virus protection today! Cybersecurity solutions for small to mid-sized businesses deliver enterprise-level protection.

Download this data set (Checklist for a Small Firm's Cybersecurity Program 2020-2021) to secure various aspects of your small business, including employee data, your website and more. This checklist is provided to assist small member firms with limited resources in establishing a cybersecurity program to identify and assess cybersecurity threats, protect assets from cyber intrusions, detect when their systems and assets have been compromised, plan for the response when a compromise occurs, and implement a plan to recover lost, stolen or unavailable assets.

Train employees in security principles.
Protect information, computers, and networks from malware attacks.
Provide firewall security for your Internet connection.
Create a mobile device action plan.
Make backup copies of important business data and information.
Learn about the threats and how to protect your website.
Protect your small business site.
Learn the basics of protecting your business websites from cyber attacks at WP Hacked Help Blog.

Created With Inputs From Security Experts at WP Hacked Help - Pioneer In WordPress Malware Removal & Security

E-commerce - Users of a French C2C fashion store
kaggle.com
zip
Updated Mar 17, 2020
Jeffrey Mvutu Mabilama (2020). E-commerce - Users of a French C2C fashion store [Dataset]. https://www.kaggle.com/jmmvutu/ecommerce-users-of-a-french-c2c-fashion-store
zip, 1906187 bytes (available download formats)
Dataset updated
Mar 17, 2020
Authors
Jeffrey Mvutu Mabilama
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Context

There are a lot of unknowns when running an E-commerce store, even when you have analytics to guide your decisions.

Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other users' articles).

This dataset aims to serve as a benchmark for an e-commerce fashion store. Using this dataset, you can try to understand what to expect from your users and estimate in advance what your growth may look like.

For instance, if you see that most of your users are not very active, you may look into this dataset to compare your store's performance.

If you think this kind of dataset may be useful, or if you liked it, don't forget to show your support or appreciation with an upvote or comment. You may even describe how this dataset might be of use to you. This way, I will be more aware of specific needs and be able to adapt my datasets to better suit your needs.

This dataset is part of a preview of a much larger dataset. Please contact me for more.

Content

The data was scraped from a successful online C2C fashion store with over 9M registered users. The store was first launched in Europe around 2009 and then expanded worldwide.

Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.

Inspiration

Questions you might want to answer using this dataset:

Are e-commerce users interested in social network features?

Are my users active enough (compared with those in this dataset)?

How likely are people from other countries to sign up on a C2C website?

How many users are likely to drop off after years of using my service?

License

CC-BY-NC-SA 4.0

For other licensing options, contact me.
Global Land One-kilometer Base Elevation (GLOBE) v.1
catalog.data.gov
datasets.ai
Updated Oct 18, 2024
NOAA National Centers for Environmental Information (Point of Contact) (2024). Global Land One-kilometer Base Elevation (GLOBE) v.1 [Dataset]. https://catalog.data.gov/dataset/global-land-one-kilometer-base-elevation-globe-v-11
Dataset updated
Oct 18, 2024
Dataset provided by
National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
National Centers for Environmental Information (https://www.ncei.noaa.gov/)
Description
GLOBE is a project to develop the best available 30-arc-second (nominally 1 kilometer) global digital elevation data set. This version of GLOBE contains data from 11 sources, and 17 combinations of source and lineage. It continues much in the tradition of the National Geophysical Data Center's TerrainBase (FGDC 1090), as TerrainBase served as a generally lower-resolution prototype of GLOBE data management and compilation techniques. The GLOBE mosaic has been compiled onto CD-ROMs for the international user community. It is also available on the World Wide Web (linked from the online linkage noted above) and via anonymous FTP. Improvements to the global model are anticipated as appropriate data and/or methods become available. In addition, individual contributions to GLOBE (several areas have more than one candidate) should become available at the same website. GLOBE may be used for technology development, such as helping plan infrastructure for cellular communications networks and other public works, satellite data processing, and environmental monitoring and analysis. GLOBE prototypes (and probably GLOBE itself after its release) have been used to help develop terrain avoidance systems for aircraft. In all cases, GLOBE data should be treated like any potentially useful but guaranteed-imperfect data set. Mission- or life-critical applications should consider the documented artifacts, as well as likely undocumented imperfections, in the data.
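The "30-arc-second (nominally 1 kilometer)" resolution follows from Earth's size: 30 arc-seconds along a great circle of a sphere with mean radius of about 6371 km span roughly 0.93 km, and the east-west width of a cell shrinks with the cosine of latitude. A minimal Python sketch of this arithmetic, using a spherical-Earth approximation (an assumption; GLOBE's own georeferencing details are not covered here), together with the number of cells needed to cover the globe at this spacing:

```python
import math
from typing import Tuple

MEAN_EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation
CELL_ARCSEC = 30                   # GLOBE grid spacing in arc-seconds


def cell_size_km(latitude_deg: float) -> Tuple[float, float]:
    """Approximate (north-south, east-west) extent of one 30-arc-second cell, in km."""
    cell_rad = math.radians(CELL_ARCSEC / 3600)
    north_south_km = MEAN_EARTH_RADIUS_KM * cell_rad
    east_west_km = north_south_km * math.cos(math.radians(latitude_deg))
    return north_south_km, east_west_km


# Number of cells needed to cover the globe at this spacing.
COLUMNS = 360 * 3600 // CELL_ARCSEC  # 43200 cells of longitude
ROWS = 180 * 3600 // CELL_ARCSEC     # 21600 cells of latitude

for lat in (0, 45, 60):
    ns, ew = cell_size_km(lat)
    print(f"lat {lat:>2}: {ns:.3f} km N-S x {ew:.3f} km E-W")
print(COLUMNS, ROWS)
```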
Click Global Data | Web Traffic Data + Transaction Data | Consumer and B2B...
datarade.ai
.csv
Updated Mar 13, 2025
Cite
Consumer Edge (2025). Click Global Data | Web Traffic Data + Transaction Data | Consumer and B2B Shopper Insights | 59 Countries, 3-Day Lag, Daily Delivery [Dataset]. https://datarade.ai/data-products/click-global-data-web-traffic-data-transaction-data-con-consumer-edge
Available download formats
.csv
Dataset updated
Mar 13, 2025
Dataset authored and provided by
Consumer Edge
Area covered
Marshall Islands, Congo, Bermuda, Finland, Bosnia and Herzegovina, El Salvador, Nauru, South Africa, Montserrat, Sri Lanka
Description
Click Web Traffic Combined with Transaction Data: A New Dimension of Shopper Insights

Consumer Edge is a leader in alternative consumer data for public and private investors and corporate clients. Click enhances the unparalleled accuracy of CE Transact by allowing investors to delve deeper and browse further into global online web traffic for CE Transact companies and more. Leverage the unique fusion of web traffic and transaction datasets to understand the addressable market and spending behavior on consumer and B2B websites. See the impact of changes in marketing spend, search engine algorithms, and social media awareness on visits to a merchant’s website, and discover the extent to which product mix and pricing drive or hinder visits and dwell time. Plus, Click uncovers a more global view of traffic trends in geographies not covered by Transact. Doubleclick into better forecasting, with Click.

Consumer Edge’s Click is available in machine-readable file delivery and enables:
• Comprehensive Global Coverage: Insights across 620+ brands and 59 countries, including key markets in the US, Europe, Asia, and Latin America.
• Integrated Data Ecosystem: Click seamlessly maps web traffic data to CE entities and stock tickers, enabling a unified view across various business intelligence tools.
• Near Real-Time Insights: Daily data delivery with a 5-day lag ensures timely, actionable insights for agile decision-making.
• Enhanced Forecasting Capabilities: Combining web traffic indicators with transaction data helps identify patterns and predict revenue performance.

Use Case: Analyze Year Over Year Growth Rate by Region

Problem: A public investor wants to understand how a company’s year-over-year growth differs by region.

Solution: The firm leveraged Consumer Edge Click data to:
• Gain visibility into key metrics like views, bounce rate, visits, and addressable spend
• Analyze year-over-year growth rates for a time period
• Break out data by geographic region to see growth trends

Metrics include:
• Spend
• Items
• Volume
• Transactions
• Price Per Volume
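
The year-over-year comparison in this use case reduces to growth = (current - prior) / prior for each region. A minimal sketch, assuming records arrive as dicts with hypothetical keys `region` and `year` plus a numeric metric column such as `visits`; the actual Click delivery schema is not described in this listing.

```python
from collections import defaultdict


def yoy_growth_by_region(rows, metric, year, prior_year):
    """Return {region: (current - prior) / prior} for one metric.

    rows: iterable of dicts with hypothetical keys "region", "year", and `metric`.
    Regions missing either year, or with a prior-year total of zero, are skipped
    because their growth rate is undefined.
    """
    # Sum the metric per (region, year) so multiple records per year are combined.
    totals = defaultdict(float)
    for r in rows:
        totals[(r["region"], r["year"])] += r[metric]

    growth = {}
    for region in sorted({region for region, _ in totals}):
        prior = totals.get((region, prior_year), 0.0)
        current = totals.get((region, year))
        if current is None or prior == 0:
            continue
        growth[region] = (current - prior) / prior
    return growth
```

Skipping regions without a usable baseline, rather than reporting infinite growth, keeps the output safe to rank or chart directly.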

Inquire about a Click subscription to perform more complex, near real-time analyses on public tickers and private brands, as well as for industries beyond CPG, for example to:
• Monitor web traffic as a leading indicator of stock performance and consumer demand
• Analyze customer interest and sentiment at the brand and sub-brand levels

Consumer Edge offers a variety of datasets covering the US, Europe (UK, Austria, France, Germany, Italy, Spain), and across the globe, with subscription options serving a wide range of business needs.

Consumer Edge is the Leader in Data-Driven Insights Focused on the Global Consumer
Kickstarter Data, Global, 2009-2023
icpsr.umich.edu
ascii, delimited, r +3
Updated Apr 9, 2024
Cite
Leland, Jonathan (2024). Kickstarter Data, Global, 2009-2023 [Dataset]. http://doi.org/10.3886/ICPSR38050.v3
Available download formats
stata, r, spss, sas, delimited, ascii
Unique identifier
https://doi.org/10.3886/ICPSR38050.v3
Dataset updated
Apr 9, 2024
Dataset provided by
Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
Authors
Leland, Jonathan
License
https://www.icpsr.umich.edu/web/ICPSR/studies/38050/terms
Time period covered
2009 - 2023
Area covered
Global
Description
Launched on April 28, 2009, Kickstarter is a Public Benefit Corporation based in Brooklyn, New York. It is a global crowdfunding platform that helps fund new creative projects and ideas through direct support from individuals (backers) around the world who pledge money to bring these projects and ideas to life. Kickstarter supports many kinds of projects, from films, games, and music to art, design, and technology. Funding on Kickstarter follows an all-or-nothing model: backers who pledge support to a project are charged only if the project reaches its funding goal. Successfully funded projects reward their backers with one-of-a-kind experiences, such as limited editions or copies of the creative work being produced.

This study includes three datasets: (1) Kickstarter Project (public-use file), (2) Backer Location file, and (3) Kickstarter Project (restricted-use file). The public-use Kickstarter Project dataset contains detailed information about all successful and unsuccessful Kickstarter projects (N=610,015) from 2009-2023, including the project category and subcategory; project location (city, state for U.S.-based projects, and country); funding goal in original and U.S. currencies; amount pledged in dollars; and the number of backers for each project. The restricted-use file adds the project title, the 150-character project description, and the project's URL on the Kickstarter site. The Backer Location dataset includes backers' country and state and the total amount pledged for each geographic location.
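
The all-or-nothing rule described above can be expressed in a few lines. This is a simplified sketch with a hypothetical `Pledge` type: it ignores deadlines, fees, currency conversion, and failed payments, and only encodes "charge every backer if total pledges reach the goal, otherwise charge no one."

```python
from dataclasses import dataclass


@dataclass
class Pledge:
    """Hypothetical pledge record; not a Kickstarter or ICPSR schema."""
    backer_id: str
    amount: float


def settle_all_or_nothing(goal, pledges):
    """Return the pledges that would be charged under an all-or-nothing model.

    If the total pledged reaches the goal, every pledge is charged;
    otherwise no backer is charged at all.
    """
    total = sum(p.amount for p in pledges)
    return list(pledges) if total >= goal else []
```

For example, pledges of 60 and 50 against a goal of 100 are both charged, while the same pledges against a goal of 150 produce no charges.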
Worldwide | Import & Export Traders | 150k
datarade.ai
.csv, .xls, .txt
Updated Jun 24, 2024
Cite
Growth Marketing (2024). Worldwide | Import & Export Traders | 150k [Dataset]. https://datarade.ai/data-products/worldwide-import-export-traders-300k-growth-marketing
Available download formats
.csv, .xls, .txt
Dataset updated
Jun 24, 2024
Dataset authored and provided by
Growth Marketing
Area covered
Mauritius, Gambia, Canada, Montserrat, Honduras, Lesotho, Kenya, Togo, Senegal, Finland
Description
Our tabular dataset, designed to fuel lead generation efforts, offers comprehensive B2B contact information extracted from import and export trades. Thanks to meticulous field-checking processes, our data is a reliable resource for businesses seeking to expand their networks and explore new trade opportunities.

Each entry in our dataset undergoes rigorous validation protocols to ensure accuracy and completeness. Our quality control measures include cross-referencing multiple sources, verifying contact details, and validating trade information against authoritative databases. Maintaining high data integrity standards guarantees that our clients receive actionable insights to drive their business strategies forward.

The dataset encompasses many industries, capturing import and export trades across diverse sectors and regions. Our dataset provides a panoramic view of global trade dynamics from manufacturing to technology, agriculture to healthcare. With detailed information on products, quantities, and trading partners, businesses can identify promising leads, forge strategic partnerships, and capitalize on emerging market trends.

Our dataset offers substantial coverage in terms of scale, encompassing millions of trade transactions and B2B contacts worldwide. Whether clients seek to explore new markets, source reliable suppliers, or connect with potential buyers, our dataset is a valuable asset for informed decision-making.

On the data marketplace, we offer flexible licensing options tailored to meet the diverse needs of our clients. Whether they require a subset of data for targeted campaigns or the entire dataset for comprehensive market analysis, we provide customizable solutions to accommodate varying requirements.

Our commitment to transparency and data privacy ensures that clients can confidently leverage our dataset, knowing that their information is handled with the utmost care and security. We adhere to stringent data protection regulations and industry best practices, safeguarding sensitive information and fostering trust among our clientele.

Our tabular dataset of import and export trades B2B contacts represents a goldmine of opportunities for businesses seeking to expand their global footprint. With unparalleled accuracy, breadth, and flexibility, it is a cornerstone for successful lead generation and strategic decision-making in today's dynamic marketplace.

Fields:
- First Name
- Last Name
- Title
- Company
- Company Name for Emails
- Email
- Seniority
- Departments
- First Phone
- Corporate Phone
- Employees
- Industry
- Person Linkedin Url
- Website
- Company Linkedin Url
- Facebook Url
- City
- State
- Country
- Company Address
- Company City
- Company State
- Company Country
- Company Phone
- Technologies
- Annual Revenue
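
Since the listing emphasizes completeness, a buyer may want to measure it on a delivered file. A minimal sketch, assuming a CSV with a header row whose column names match the field list above exactly (an assumption; the delivered headers may differ). It reports, per field, the share of rows with a non-blank value, and treats a field absent from the header as always empty.

```python
import csv


def field_completeness(csv_file, fields):
    """Return {field: share of rows with a non-blank value} for the given fields.

    csv_file: an open text file or any iterable of CSV lines with a header row.
    """
    reader = csv.DictReader(csv_file)
    filled = {f: 0 for f in fields}
    total = 0
    for row in reader:
        total += 1
        for f in fields:
            # row.get returns None for columns missing from the header or short rows.
            if (row.get(f) or "").strip():
                filled[f] += 1
    # An empty file (header only) yields 0.0 rather than dividing by zero.
    return {f: (filled[f] / total if total else 0.0) for f in fields}
```

Usage might look like `field_completeness(open("traders.csv", newline="", encoding="utf-8"), ["Email", "First Phone", "Company Country"])`, where `traders.csv` is a hypothetical file name.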

Cite

The Devastator (2022). Top Visited Websites [Dataset]. https://www.kaggle.com/datasets/thedevastator/the-top-websites-in-the-world/discussion

Top Visited Websites

A dataset of the top visited websites on the internet

71 scholarly articles cite this dataset (per Google Scholar).

Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Nov 19, 2022

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

The Devastator

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

The Top Websites in the World

How They Change Over Time

About this dataset

This dataset consists of the top 50 most visited websites in the world, along with the category and principal country/territory of each site. The data provides insights into which sites are most popular globally and what type of content is most popular in different parts of the world.

How to use the dataset

This dataset can be used to track the most popular websites in the world over time. It can also be used to compare website popularity between different countries and categories.

Research Ideas

To track the most popular websites in the world over time

To see how website popularity changes by region

To find out which website categories are most popular

Acknowledgements

Dataset by Alexa Internet, Inc. (2019), released on Kaggle under the Open Data Commons Public Domain Dedication and License (ODC-PDDL).

License

License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: df_1.csv

| Column name | Description |
|:--------------------------------|:---------------------------------------------------------------------|
| Site | The name of the website. (String) |
| Domain Name | The domain name of the website. (String) |
| Category | The category of the website. (String) |
| Principal country/territory | The principal country/territory where the website is based. (String) |
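
The research ideas above (most popular categories, comparisons across countries) can be started with a simple tally over df_1.csv. A minimal sketch using only the standard library, assuming the file has a header row with the column names listed in the table. The file has no traffic or rank column, so this counts how many of the top sites fall into each category or country rather than measuring visit volume.

```python
import csv
from collections import Counter


def count_sites(csv_file, column):
    """Count sites per value of `column`, e.g. "Category" or "Principal country/territory".

    csv_file: an open text file or any iterable of CSV lines with a header row.
    """
    reader = csv.DictReader(csv_file)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not in header {reader.fieldnames}")
    counts = Counter()
    for row in reader:
        # Skip blank cells so they do not appear as an empty-string category.
        value = (row[column] or "").strip()
        if value:
            counts[value] += 1
    return counts
```

For example, `count_sites(open("df_1.csv", newline="", encoding="utf-8"), "Category").most_common(5)` would list the five categories with the most sites among the top 50.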

