Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Popular Website Traffic Over Time’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/popular-website-traffice on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Background
Have you ever been in a conversation where the question comes up: who uses Bing? The question arises occasionally because people wonder whether such sites get any views at all. For this research study, we explore traffic over time for many popular websites.
Methodology
The data collected originates from SimilarWeb.com.
Source
For the analysis and study, go to The Concept Center
This dataset was created by Chase Willden and contains around 0 samples, along with columns such as 1/1/2017, Social Media, 12/1/2016, 3/1/2017, other technical information, and more.
- Analyze 11/1/2016 in relation to 2/1/2017
- Study the influence of 4/1/2017 on 1/1/2017
- More datasets
If you use this dataset in your research, please credit Chase Willden
--- Original source retains full ownership of the source dataset ---
The global number of internet users was forecast to increase continuously between 2024 and 2029 by a total of 1.3 billion users (+23.66 percent). After a fifteenth consecutive year of growth, the number of users is estimated to reach 7 billion and thus a new peak in 2029. Notably, the number of internet users has been rising continuously over recent years.

Depicted is the estimated number of individuals in the country or region at hand that use the internet. As the data source clarifies, connection quality and usage frequency are distinct aspects not taken into account here.

The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of internet users in regions like the Americas and Asia.
Patterns of educational attainment vary greatly across countries, and across population groups within countries. In some countries, virtually all children complete basic education, whereas in others large groups fall short. The primary purpose of this database, and the associated research program, is to document and analyze these differences using a compilation of a variety of household-based data sets: Demographic and Health Surveys (DHS); Multiple Indicator Cluster Surveys (MICS); Living Standards Measurement Study Surveys (LSMS); as well as country-specific Integrated Household Surveys (IHS) such as Socio-Economic Surveys.

As shown at the website associated with this database, there are dramatic differences in attainment by wealth. When households are ranked according to their wealth status (or, more precisely, a proxy based on the assets owned by members of the household), there are striking differences in the attainment patterns of children from the richest 20 percent compared to the poorest 20 percent. In Mali in 2012, only 34 percent of 15 to 19 year olds in the poorest quintile had completed grade 1, whereas 80 percent of the richest quintile had done so. In many countries, for example Pakistan, Peru and Indonesia, almost all the children from the wealthiest households have completed at least one year of schooling. In some countries, like Mali and Pakistan, wealth gaps are evident from grade 1 on; in other countries, like Peru and Indonesia, wealth gaps emerge later in the school system.

The EdAttain website allows a visual exploration of gaps in attainment and enrollment within and across countries, based on the international database, which spans multiple years from over 120 countries and includes indicators disaggregated by wealth, gender and urban/rural location. The database underlying that site can be downloaded from here.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This users dataset is a preview of a much bigger dataset with lots of related data (product listings of sellers, comments on listed products, etc.).
My Telegram bot will answer your queries and allow you to contact me.
There are a lot of unknowns when running an E-commerce store, even when you have analytics to guide your decisions.
Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other users' articles).
This dataset aims to serve as a benchmark for an e-commerce fashion store. Using it, you may want to try to understand what you can expect of your users and anticipate in advance what your growth may look like.
If you think this kind of dataset may be useful, or if you liked it, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. That way, I will be more aware of specific needs and be able to adapt my datasets to better suit your needs.
This dataset is part of a preview of a much larger dataset. Please contact me for more.
The data was scraped from a successful online C2C fashion store with over 10M registered users. The store was first launched in Europe around 2009 then expanded worldwide.
Visitors vs Users: Visitors do not appear in this dataset. Only registered users are included. "Visitors" cannot purchase an article but can view the catalog.
Questions you might want to answer using this dataset:
Example works:
For other licensing options, contact me.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Countries of the World’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/fernandol/countries-of-the-world on 12 November 2021.
--- Dataset description provided by original source is as follows ---
World fact sheet, fun to link with other datasets.
Information on population, region, area size, infant mortality and more.
Source: All these data sets are made up of data from the US government. Generally they are free to use within the US. If you are outside of the US, you may need to contact the US government to ask about licensing.
Data from the World Factbook is public domain. The website says "The World Factbook is in the public domain and may be used freely by anyone at anytime without seeking permission."
https://www.cia.gov/library/publications/the-world-factbook/docs/faqs.html
When making visualisations related to countries, sometimes it is interesting to group them by attributes such as region, or weigh their importance by population, GDP or other variables.
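For instance, here is a minimal pandas sketch of that grouping idea, assuming the CSV from the Kaggle dataset above and its "Region" and "Population" columns (exact file and column names may differ):

```python
# Sketch only: group the Countries of the World CSV by region and sum
# population, e.g. to weight region-level visualisations.
import pandas as pd

countries = pd.read_csv("countries of the world.csv")  # assumed filename

# Total population per region, sorted for easy charting.
by_region = (
    countries.groupby("Region")["Population"]
    .sum()
    .sort_values(ascending=False)
)
print(by_region)
```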
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Population by Country - 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/tanuprabhu/population-by-country-2020 on 28 January 2022.
--- Dataset description provided by original source is as follows ---
I always wanted to access a data set related to the world's population (country-wise), but I could not find a properly documented data set. So, I just created one myself.
Now I knew I wanted to create a dataset, but I did not know how to do so, so I started to search for the content (population of countries) on the internet. Obviously, Wikipedia was my first stop, but for some reason the results were not acceptable, and only around 190 countries were listed. I surfed the internet for quite some time until I stumbled upon a great website, one you have probably heard of: Worldometer. This was exactly the website I was looking for. It had more details than Wikipedia, and it also had more rows, I mean more countries, with their populations.
Once I found the data, my next hard task was to obtain it. Of course, I could not get the data in raw form, and I did not mail them regarding the data. Instead, I learned a new skill that is very important for a data scientist; I read somewhere that to obtain data from websites you need to use this technique. Any guesses? Keep reading, you will find out in the next paragraph.
You are right, it's web scraping. I learned this so that I could convert the data into CSV format. I will give you the scraper code that I wrote, and I also found a way to directly convert the pandas data frame to a CSV (comma-separated values) file and store it on my computer. Just go through my code and you will know what I'm talking about.
Below is the code that I used to scrape the data from the website.
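The original snippet was shared as a screenshot, so here is a minimal sketch of the approach described, assuming pandas' read_html and the Worldometer population-by-country page:

```python
# Hypothetical reconstruction of the scraping step described above: parse
# the population table from Worldometer and store it as a CSV with pandas.
import pandas as pd

URL = "https://www.worldometers.info/world-population/population-by-country/"

# read_html returns a list of DataFrames, one per <table> on the page;
# the population-by-country table is assumed to be the first one.
tables = pd.read_html(URL)
population = tables[0]

# Convert the pandas data frame directly to a CSV file on disk.
population.to_csv("population_by_country_2020.csv", index=False)
```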
Now I couldn't have got the data without Worldometer. So special thanks to the website. It is because of them I was able to get the data.
As far as I know, I don't have any questions to ask. Find your own ways to use the data, and let me know via a kernel if you find something interesting.
--- Original source retains full ownership of the source dataset ---
A Point of Interest (POI) is defined as an entity (such as a business) at a ground location (a point) which may be of interest. We provide high-quality POI data that is fresh, consistent, customizable, easy to use, and has high-density coverage for all countries of the world.
This is our process flow:
Our machine learning systems continuously crawl for new POI data
Our geoparsing and geocoding calculate their geolocations
Our categorization systems clean up and standardize the datasets
Our data pipeline API publishes the datasets on our data store
A new POI comes into existence. It could be a bar, a stadium, a museum, a restaurant, a cinema, a store, etc. In today's interconnected world, its information will appear very quickly in social media, pictures, websites, and press releases. Soon after that, our systems will pick it up.
POI data is in constant flux. Every minute, worldwide, over 200 businesses move, over 600 new businesses open their doors, and over 400 businesses cease to exist. Over 94 percent of all businesses have a public online presence of some kind, which makes such changes trackable: when a business changes, its website and social media presence change too. We then extract and merge the new information, thus creating the most accurate and up-to-date business information dataset across the globe.
We offer our customers perpetual data licenses for any dataset representing this ever-changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service (DaaS) industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one-time snapshot or via our data update pipeline.
Customers requiring regularly updated datasets may subscribe to our annual subscription plans. Our data is continuously refreshed, so subscription plans are recommended for those who need the most up-to-date data. The main differentiators between us and the competition are our flexible licensing terms and our data freshness.
Data samples may be downloaded at https://store.poidata.xyz/us
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia links shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking which subsequently influences Wikipedia content, by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.
The motivations for this dataset stem from the challenges researchers face in studying the flow of information across the web. While the World Wide Web enables global communication and collaboration, data silos, linguistic barriers, and platform-specific restrictions hinder our ability to understand how information circulates, evolves, and impacts public discourse. Wikipedia and Reddit, as major hubs of knowledge sharing and discussion, offer an invaluable lens into these processes. However, without comprehensive data capturing their interactions, researchers are unable to fully examine how platforms co-construct knowledge. This dataset bridges this gap, providing the tools needed to study the interconnectedness of social media and collaborative knowledge systems.
We present WikiReddit, a comprehensive dataset capturing all Wikipedia mentions (including links) shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW (not safe for work) subreddits. The SQL database comprises 336K total posts, 10.2M comments, 1.95M unique links, and 1.26M unique articles spanning 59 languages on Reddit and 276 Wikipedia language subdomains. Each linked Wikipedia article is enriched with its revision history and page view data within a ±10-day window of its posting, as well as article ID, redirects, and Wikidata identifiers. Supplementary anonymous metadata from Reddit posts and comments further contextualizes the links, offering a robust resource for analysing cross-platform information flows, collective attention dynamics, and the role of Wikipedia in online discourse.
Data was collected from the Reddit4Researchers and Wikipedia APIs. No personally identifiable information is published in the dataset. Data from Reddit to Wikipedia is linked via the hyperlink and article titles appearing in Reddit posts.
Extensive processing with tools such as regex was applied to the Reddit post/comment text to extract the Wikipedia URLs. Redirects for Wikipedia URLs and article titles were found through the API and mapped to the collected data. Reddit IDs are hashed with SHA-256 for post/comment/user/subreddit anonymity.
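For illustration, here is a minimal sketch (not the authors' exact pipeline) of the two processing steps just described: extracting Wikipedia URLs with a regex and hashing a Reddit ID with SHA-256.

```python
# Sketch of the processing described above; the regex is an assumption,
# not the exact pattern used by the dataset authors.
import hashlib
import re

# Match URLs on any Wikipedia language subdomain, e.g. en.wikipedia.org.
WIKI_URL = re.compile(r"https?://[a-z\-]+\.wikipedia\.org/wiki/[^\s)\]]+")

text = "See https://en.wikipedia.org/wiki/Web_traffic for background."
print(WIKI_URL.findall(text))  # ['https://en.wikipedia.org/wiki/Web_traffic']

# Hash a (hypothetical) Reddit identifier for anonymity, as in the dataset.
post_id = "t3_abc123"
print(hashlib.sha256(post_id.encode("utf-8")).hexdigest())
```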
We foresee several applications of this dataset and preview four here. First, Reddit linking data can be used to understand how attention is driven from one platform to another. Second, Reddit linking data can shed light on how Wikipedia's archive of knowledge is used in the larger social web. Third, our dataset could provide insights into how external attention is topically distributed across Wikipedia. Our dataset can help extend that analysis into the disparities in what types of external communities Wikipedia is used in, and how it is used. Fourth, relatedly, a topic analysis of our dataset could reveal how Wikipedia usage on Reddit contributes to societal benefits and harms. Our dataset could help examine if homogeneity within the Reddit and Wikipedia audiences shapes topic patterns and assess whether these relationships mitigate or amplify problematic engagement online.
The dataset is publicly shared with a Creative Commons Attribution 4.0 International license. The article describing this dataset should be cited: https://doi.org/10.48550/arXiv.2502.04942
Patrick Gildersleve will maintain this dataset, and add further years of content as and when available.
posts
Column Name | Type | Description |
---|---|---|
subreddit_id | TEXT | The unique identifier for the subreddit. |
crosspost_parent_id | TEXT | The ID of the original Reddit post if this post is a crosspost. |
post_id | TEXT | Unique identifier for the Reddit post. |
created_at | TIMESTAMP | The timestamp when the post was created. |
updated_at | TIMESTAMP | The timestamp when the post was last updated. |
language_code | TEXT | The language code of the post. |
score | INTEGER | The score (upvotes minus downvotes) of the post. |
upvote_ratio | REAL | The ratio of upvotes to total votes. |
gildings | INTEGER | Number of awards (gildings) received by the post. |
num_comments | INTEGER | Number of comments on the post. |
comments
Column Name | Type | Description |
---|---|---|
subreddit_id | TEXT | The unique identifier for the subreddit. |
post_id | TEXT | The ID of the Reddit post the comment belongs to. |
parent_id | TEXT | The ID of the parent comment (if a reply). |
comment_id | TEXT | Unique identifier for the comment. |
created_at | TIMESTAMP | The timestamp when the comment was created. |
last_modified_at | TIMESTAMP | The timestamp when the comment was last modified. |
score | INTEGER | The score (upvotes minus downvotes) of the comment. |
upvote_ratio | REAL | The ratio of upvotes to total votes for the comment. |
gilded | INTEGER | Number of awards (gildings) received by the comment. |
postlinks
Column Name | Type | Description |
---|---|---|
post_id | TEXT | Unique identifier for the Reddit post. |
end_processed_valid | INTEGER | Whether the extracted URL from the post resolves to a valid URL. |
end_processed_url | TEXT | The extracted URL from the Reddit post. |
final_valid | INTEGER | Whether the final URL from the post resolves to a valid URL after redirections. |
final_status | INTEGER | HTTP status code of the final URL. |
final_url | TEXT | The final URL after redirections. |
redirected | INTEGER | Indicator of whether the posted URL was redirected (1) or not (0). |
in_title | INTEGER | Indicator of whether the link appears in the post title (1) or post body (0). |
commentlinks
Column Name | Type | Description |
---|---|---|
comment_id | TEXT | Unique identifier for the Reddit comment. |
end_processed_valid | INTEGER | Whether the extracted URL from the comment resolves to a valid URL. |
end_processed_url | TEXT | The extracted URL from the comment. |
final_valid | INTEGER | Whether the final URL from the comment resolves to a valid URL after redirections. |
final_status | INTEGER | HTTP status code of the final URL. |
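As a quick illustration of working with these tables, the sketch below joins posts to their extracted Wikipedia links using Python's built-in sqlite3 module; the database filename is an assumption.

```python
# Sketch: list valid final Wikipedia URLs with their posts' scores.
# "wikireddit.db" is a placeholder filename for the released database.
import sqlite3

conn = sqlite3.connect("wikireddit.db")

query = """
SELECT p.post_id, p.created_at, p.score, l.final_url
FROM posts AS p
JOIN postlinks AS l ON l.post_id = p.post_id
WHERE l.final_valid = 1
LIMIT 10;
"""

for row in conn.execute(query):
    print(row)

conn.close()
```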
The global number of smartphone users was forecast to increase continuously between 2024 and 2029 by a total of 1.8 billion users (+42.62 percent). After a ninth consecutive year of growth, the smartphone user base is estimated to reach 6.1 billion users and thus a new peak in 2029. Notably, the number of smartphone users has been rising continuously over recent years.

Smartphone users here are limited to internet users of any age using a smartphone. The shown figures have been derived from survey data that has been processed to estimate missing demographics.

The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

Find more key insights for the number of smartphone users in regions like Australia & Oceania and Asia.
The United Nations Educational, Scientific and Cultural Organization (UNESCO) seeks to encourage the identification, protection and preservation of cultural and natural heritage around the world considered to be of outstanding value to humanity. This is embodied in an international treaty called the Convention concerning the Protection of the World Cultural and Natural Heritage, adopted by UNESCO in 1972. As of January 2017, 1,052 sites are listed: 814 cultural, 203 natural, and 35 mixed properties, in 165 states parties. For more information, visit: whc.unesco.org/en/list/
Public Domain Mark 1.0 https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
The World Database on Protected Areas (WDPA) is the most comprehensive global database of marine and terrestrial protected areas, updated on a monthly basis, and is one of the key global biodiversity data sets being widely used by scientists, businesses, governments, international secretariats and others to inform planning, policy decisions and management. The WDPA is a joint project between UN Environment and the International Union for Conservation of Nature (IUCN). The compilation and management of the WDPA is carried out by the UN Environment World Conservation Monitoring Centre (UNEP-WCMC), in collaboration with governments, non-governmental organisations, academia and industry. There are monthly updates of the data which are made available online through the Protected Planet website, where the data is both viewable and downloadable.

Data and information on the world's protected areas compiled in the WDPA are used for reporting to the Convention on Biological Diversity on progress towards reaching the Aichi Biodiversity Targets (particularly Target 11), to the UN to track progress towards the 2030 Sustainable Development Goals, to some of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) core indicators, and other international assessments and reports including the Global Biodiversity Outlook, as well as for the publication of the United Nations List of Protected Areas. Every two years, UNEP-WCMC releases the Protected Planet Report on the status of the world's protected areas and recommendations on how to meet international goals and targets.

Many platforms are incorporating the WDPA to provide integrated information to diverse users, including businesses and governments, in a range of sectors including mining, oil and gas, and finance. For example, the WDPA is included in the Integrated Biodiversity Assessment Tool, an innovative decision support tool that gives users easy access to up-to-date information that allows them to identify biodiversity risks and opportunities within a project boundary. The reach of the WDPA is further enhanced in services developed by other parties, such as the Global Forest Watch and the Digital Observatory for Protected Areas, which provide decision makers with access to monitoring and alert systems that allow whole landscapes to be managed better. Together, these applications of the WDPA demonstrate the growing value and significance of the Protected Planet initiative.
Abstract copyright UK Data Service and data collection copyright owner.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Climate Change Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/climate-change-datae on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Data from World Development Indicators and Climate Change Knowledge Portal on climate systems, exposure to climate impacts, resilience, greenhouse gas emissions, and energy use.
In addition to the data available here and through the Climate Data API, the Climate Change Knowledge Portal has a web interface to a collection of water indicators that may be used to assess the impact of climate change across over 8,000 water basins worldwide. You may use the web interface to download the data for any of these basins.
Here is how to navigate to the water data:
- Go to the Climate Change Knowledge Portal home page (http://climateknowledgeportal.worldbank.org/)
- Click any region on the map
- Click a country
- In the navigation menu, click "Impacts" and then "Water"
- Click the map to select a specific water basin
- Click "Click here to get access to data and indicators"

Please be sure to observe the disclaimers on the website regarding uncertainties and use of the water data.
Attribution: Climate Change Data, World Bank Group.
World Bank Data Catalog Terms of Use
Source: http://data.worldbank.org/data-catalog/climate-change
This dataset was created by World Bank and contains around 10,000 samples, along with columns such as 2009, 1993, 1994, Series Code, other technical information, and more.
- Analyze 1995 in relation to Scale
- Study the influence of 1998 on Country Code
- More datasets
If you use this dataset in your research, please credit World Bank
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT
End-to-end (E2E) testing is a software validation approach that simulates realistic user scenarios throughout the entire workflow of an application. In the context of web applications, E2E testing involves two activities: Graphical User Interface (GUI) testing, which simulates user interactions with the web app's GUI through web browsers, and performance testing, which evaluates system workload handling. Despite its recognized importance in delivering high-quality web applications, the availability of large-scale datasets featuring real-world E2E web tests remains limited, hindering research in the field.
To address this gap, we present E2EGit, a comprehensive dataset of non-trivial open-source web projects collected on GitHub that adopt E2E testing. By analyzing over 5,000 web repositories across popular programming languages (JAVA, JAVASCRIPT, TYPESCRIPT, and PYTHON), we identified 472 repositories implementing 43,670 automated Web GUI tests with popular browser automation frameworks (SELENIUM, PLAYWRIGHT, CYPRESS, PUPPETEER), and 84 repositories that featured 271 automated performance tests implemented leveraging the most popular open-source tools (JMETER, LOCUST). Among these, 13 repositories implemented both types of testing for a total of 786 Web GUI tests and 61 performance tests.
DATASET DESCRIPTION
The dataset is provided as an SQLite database whose structure, illustrated in Figure 3 of the paper, consists of five tables, each serving a specific purpose.
The repository table contains information on 1.5 million repositories collected using the SEART tool on May 4. It includes 34 fields detailing repository characteristics. The non_trivial_repository table is a subset of the previous one, listing repositories that passed the two filtering stages described in the pipeline. For each repository, it specifies whether it is a web repository using JAVA, JAVASCRIPT, TYPESCRIPT, or PYTHON frameworks. A repository may use multiple frameworks, with the corresponding fields (e.g., is_web_java) set to true and the field web_dependencies listing the detected web frameworks.

For Web GUI testing, the dataset includes two additional tables: gui_testing_test_details, where each row represents a test file, providing the file path, the browser automation framework used, the test engine employed, and the number of tests implemented in the file; and gui_testing_repo_details, which aggregates data from the previous table at the repository level. Each of the 472 repositories has a row summarizing the number of test files using frameworks like SELENIUM or PLAYWRIGHT, test engines like JUNIT, and the total number of tests identified.

For performance testing, the performance_testing_test_details table contains 410 rows, one for each test identified. Each row includes the file path, whether the test uses JMETER or LOCUST, and extracted details such as the number of thread groups, concurrent users, and requests. Notably, some fields may be absent; for instance, if external files (e.g., CSVs defining workloads) were unavailable, or in the case of LOCUST tests, where parameters like duration and concurrent users are specified via the command line.
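As a quick illustration, the database can be explored with standard SQLite tooling. The snippet below is a sketch only: the table name comes from the description above, while the column names (framework, num_tests) and the database filename are assumptions rather than the published schema.

```python
# Hypothetical sketch: count Web GUI test files and tests per browser
# automation framework. Column and file names are assumed, not confirmed.
import sqlite3

conn = sqlite3.connect("e2egit.sqlite")

query = """
SELECT framework, COUNT(*) AS test_files, SUM(num_tests) AS total_tests
FROM gui_testing_test_details
GROUP BY framework
ORDER BY total_tests DESC;
"""

for framework, files, tests in conn.execute(query):
    print(framework, files, tests)

conn.close()
```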
To cite this article, use the following citation:
@inproceedings{di2025e2egit,
title={E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects},
author={Di Meglio, Sergio and Starace, Luigi Libero Lucio and Pontillo, Valeria and Opdebeeck, Ruben and De Roover, Coen and Di Martino, Sergio},
booktitle={2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR)},
pages={10--15},
year={2025},
organization={IEEE/ACM}
}
This work has been partially supported by the Italian PNRR MUR project PE0000013-FAIR.
Public Domain Dedication (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank
This dataset combines key health statistics from a variety of sources to provide a look at global health and population trends. It includes information on nutrition, reproductive health, education, immunization, and diseases from over 200 countries.
Update Frequency: Biannual
For more information, see the World Bank website.
Fork this kernel to get started with this dataset.
https://datacatalog.worldbank.org/dataset/health-nutrition-and-population-statistics
https://cloud.google.com/bigquery/public-data/world-bank-hnp
Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Citation: The World Bank: Health Nutrition and Population Statistics
Banner photo by @till_indeman from Unsplash.
What’s the average age of first marriages for females around the world?
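As a starting point for that question, a query along these lines could be run against the BigQuery copy of the dataset linked above; the table path follows the public listing, but the exact layout and the indicator code for "Age at first marriage, female" (SP.DYN.SMAM.FE) are assumptions to verify against the dataset:

```python
# Sketch: pull the female age-at-first-marriage indicator per country.
# Requires google-cloud-bigquery and authenticated credentials.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT country_name, year, value AS age_at_first_marriage
FROM `bigquery-public-data.world_bank_health_population.health_nutrition_population`
WHERE indicator_code = 'SP.DYN.SMAM.FE'
ORDER BY country_name, year
LIMIT 20
"""

for row in client.query(query).result():
    print(row.country_name, row.year, row.age_at_first_marriage)
```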
Our Web Scraping dataset includes data points such as company name, location, headcount, industry, and size, among others. It offers extensive fresh and historical data, including even companies that operate in stealth mode.
For lead generation
With millions of companies from around the globe, this scraped data enables you to filter potential clients based on specific criteria and hasten the conversion process.
For market and business analysis
Our Web Scraping Data on companies gives information about millions of businesses, allowing you to evaluate your competitors.
For Investors
We recommend Web Scraping Data for investors to discover and evaluate businesses with the highest potential.
Gain strategic business insights, enhance decision-making, and maintain algorithms that signal investment opportunities with Coresignal’s global Web Scraping Data.
For sales prospecting
Web Scraping Data saves the time your employees would otherwise spend finding potential clients and choosing the best prospects manually.
The NASA Earth Exchange (NEX) Global Daily Downscaled Projections (NEX-GDDP) dataset comprises downscaled climate scenarios derived from the General Circulation Model (GCM) runs conducted under the Coupled Model Intercomparison Project Phase 5 (CMIP5) [Taylor et al. 2012] and across two of the four greenhouse gas emissions scenarios known as Representative Concentration Pathways (RCPs) [Meinshausen et al. 2011] developed for the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR5). The dataset is an ensemble of projections from 21 different models and two RCPs (RCP 4.5 and RCP 8.5), and provides daily estimates of maximum and minimum temperatures and precipitation using a daily Bias-Correction - Spatial Disaggregation (BCSD) method (Thrasher et al., 2012). The data span the entire globe at a 0.25 degree (~25-kilometer) spatial resolution for the periods from 1950 through 2005 (Historical) and from 2006 to 2100 (Climate Projections).
Soil is a key natural resource that provides the foundation of basic ecosystem services. Soil determines the types of farms and forests that can grow on a landscape. Soil filters water. Soil helps regulate the Earth's climate by storing large amounts of carbon. Activities that degrade soils reduce the value of the ecosystem services that soil provides. For example, since 1850, 35% of human-caused greenhouse gas emissions are linked to land use change. The Soil Science Society of America is a good source of additional information.

Dataset Summary

This layer provides access to a 30 arc-second (roughly 1 km) cell-sized raster with attributes describing the basic properties of soil derived from the Harmonized World Soil Database v 1.2. The values in this layer are for the dominant soil in each mapping unit (sequence field = 1).

Attributes in this layer include:
- Soil Phase 1 and Soil Phase 2: phases identify characteristics of soils important for land use or management. Soils may have up to 2 phases, with phase 1 being more important than phase 2.
- Other Properties: provides additional information important for agriculture.

Additionally, 3 class description fields were added by Esri based on the document Harmonized World Soil Database Version 1.2 for use in web map pop-ups:
- Soil Phase 1 Description
- Soil Phase 2 Description
- Other Properties Description

The layer is symbolized with the Soil Unit Name field. The document Harmonized World Soil Database Version 1.2 provides more detail on the soil properties attributes contained in this layer.

Other attributes contained in this layer include:
- Soil Mapping Unit Name: the name of the spatially dominant major soil group
- Soil Mapping Unit Symbol: a two-letter code for labeling the spatially dominant major soil group in thematic maps
- Data Source: the HWSD is an aggregation of datasets. The data sources are the European Soil Database (ESDB), the 1:1 million soil map of China (CHINA), the Soil and Terrain Database Program (SOTWIS), and the Digital Soil Map of the World (DSMW).
- Percentage of Mapping Unit covered by dominant component

More information on the Harmonized World Soil Database is available here.

Other layers created from the Harmonized World Soil Database are available on ArcGIS Online:
- World Soils Harmonized World Soil Database - Bulk Density
- World Soils Harmonized World Soil Database - Chemistry
- World Soils Harmonized World Soil Database - Exchange Capacity
- World Soils Harmonized World Soil Database - Hydric
- World Soils Harmonized World Soil Database - Texture

The authors of this data set request that projects using these data include the following citation: FAO/IIASA/ISRIC/ISSCAS/JRC, 2012. Harmonized World Soil Database (version 1.2). FAO, Rome, Italy and IIASA, Laxenburg, Austria.

What can you do with this layer?

This layer is suitable for both visualization and analysis. It can be used in ArcGIS Online in web maps and applications and can be used in ArcGIS Desktop. This layer has query, identify, and export image services available. This layer is restricted to a maximum area of 16,000 x 16,000 pixels, an area 4,000 kilometers on a side or approximately the size of Europe.

The source data for this layer are available here. This layer is part of a larger collection of landscape layers that you can use to perform a wide variety of mapping and analysis tasks. The Living Atlas of the World provides an easy way to explore the landscape layers and many other beautiful and authoritative maps on hundreds of topics. Geonet is a good resource for learning more about landscape layers and the Living Atlas of the World. To get started, follow these links:
- Living Atlas Discussion Group
- Soil Data Discussion Group

The Esri Insider Blog provides an introduction to the Ecophysiographic Mapping project.
Public Domain Mark 1.0 https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
The Sea Around Us is a research initiative at The University of British Columbia (located at the Institute for the Oceans and Fisheries, formerly Fisheries Centre) that assesses the impact of fisheries on the marine ecosystems of the world, and offers mitigating solutions to a range of stakeholders.
The Sea Around Us was initiated in collaboration with The Pew Charitable Trusts in 1999, and in 2014, the Sea Around Us also began a collaboration with The Paul G. Allen Family Foundation to provide African and Asian countries with more accurate and comprehensive fisheries data.
The Sea Around Us provides data and analyses through View Data, articles in peer-reviewed journals, and other media (News). The Sea Around Us regularly updates products at the scale of countries’ Exclusive Economic Zones, Large Marine Ecosystems, the High Seas and other spatial scales, and as global maps and summaries.
The Sea Around Us emphasizes catch time series starting in 1950, and related series (e.g., landed value and catch by flag state, fishing sector and catch type), and fisheries-related information on every maritime country (e.g., government subsidies, marine biodiversity). Information is also offered on sub-projects, e.g., the historic expansion of fisheries, the performance of Regional Fisheries Management Organizations, or the likely impact of climate change on fisheries.
The information and data presented on their website are freely available to any user, provided that their source is acknowledged. The Sea Around Us is aware that this information may be incomplete. Please let them know about this via the feedback options available on their website.
If you cite or display any content from the Site, or reference the Sea Around Us, the Sea Around Us – Indian Ocean, the University of British Columbia or the University of Western Australia, in any format, written or otherwise, including print or web publications, presentations, grant applications, websites, other online applications such as blogs, or other works, you must provide appropriate acknowledgement using a citation consistent with the following standard:
When referring to various datasets downloaded from the website, and/or its concept or design, or to several datasets extracted from its underlying databases, cite its architects. Example: Pauly D., Zeller D., Palomares M.L.D. (Editors), 2020. Sea Around Us Concepts, Design and Data (seaaroundus.org).
When referring to a set of values extracted for a given country, EEZ or territory, cite the most recent catch reconstruction report or paper (available on the website) for that country, EEZ or territory. Example: For the Mexican Pacific EEZ, the citation should be “Cisneros-Montemayor AM, Cisneros-Mata MA, Harper S and Pauly D (2015) Unreported marine fisheries catch in Mexico, 1950-2010. Fisheries Centre Working Paper #2015-22, University of British Columbia, Vancouver. 9 p.”, which is accessible on the EEZ page for Mexico (Pacific) on seaaroundus.org.
To help us track the use of Sea Around Us data, we would appreciate you also citing Pauly, Zeller, and Palomares (2020) as the source of the information in an appropriate part of your text;
When using data from our website that are not part of a typical catch reconstruction (e.g., catches by LME or other spatial entity, subsidies given to fisheries, the estuaries in a given country, or the surface area of a given EEZ), cite both the website and the study that generated the underlying database. Many of these can be derived from the 'methods' texts associated with data pages on seaaroundus.org. Example: Sumaila et al. (2010) for subsidies, Alder (2003) for estuaries and Claus et al. (2014) for EEZ delineations, respectively.
The Sea Around Us data are (where not otherwise regulated) under a Creative Commons Attribution Non-Commercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/). Notices regarding copyrights (© The University of British Columbia), license and disclaimer can be found under http://www.seaaroundus.org/terms-and-conditions/.

References:
Alder J (2003) Putting the coast in the Sea Around Us Project. The Sea Around Us Newsletter (15): 1-2.
Cisneros-Montemayor AM, Cisneros-Mata MA, Harper S and Pauly D (2015) Unreported marine fisheries catch in Mexico, 1950-2010. Fisheries Centre Working Paper #2015-22, University of British Columbia, Vancouver. 9 p.
Pauly D, Zeller D and Palomares MLD (Editors) (2020) Sea Around Us Concepts, Design and Data (seaaroundus.org).
Claus S, De Hauwere N, Vanhoorne B, Deckers P, Souza Dias F, Hernandez F and Mees J (2014) Marine Regions: Towards a global standard for georeferenced marine names and boundaries. Marine Geodesy 37(2): 99-125.
Sumaila UR, Khan A, Dyck A, Watson R, Munro R, Tydemers P and Pauly D (2010) A bottom-up re-estimation of global fisheries subsidies. Journal of Bioeconomics 12: 201-225.