DataForSEO Labs API offers three powerful keyword research algorithms and historical keyword data:
• Related Keywords from the “searches related to” element of Google SERP.
• Keyword Suggestions that match the specified seed keyword with additional words before, after, or within the seed key phrase.
• Keyword Ideas that fall into the same category as the specified seed keywords.
• Historical Search Volume with current cost-per-click and competition values.
Based on in-market categories of Google Ads, you can get keyword ideas from the relevant Categories For Domain and discover relevant Keywords For Categories. You can also obtain Top Google Searches with AdWords and Bing Ads metrics, product categories, and Google SERP data.
You will find well-rounded ways to scout the competitors:
• Domain Whois Overview with ranking and traffic info from organic and paid search.
• Ranked Keywords that any domain or URL has positions for in SERP.
• SERP Competitors and the rankings they hold for the keywords you specify.
• Competitors Domain with a full overview of its rankings and traffic from organic and paid search.
• Domain Intersection keywords for which both specified domains rank within the same SERPs.
• Subdomains for the target domain you specify along with the ranking distribution across organic and paid search.
• Relevant Pages of the specified domain with rankings and traffic data.
• Domain Rank Overview with ranking and traffic data from organic and paid search.
• Historical Rank Overview with historical data on rankings and traffic of the specified domain from organic and paid search.
• Page Intersection keywords for which the specified pages rank within the same SERP.
All DataForSEO Labs API endpoints function in the Live mode. This means you will be provided with the results in response right after sending the necessary parameters with a POST request.
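For illustration, here is a minimal Python sketch of a Live-mode call, assuming basic HTTP authentication and a Related Keywords endpoint path; the exact URL, credentials, and payload fields should be taken from the current DataForSEO documentation.

```python
import requests

# Minimal sketch of a Live-mode call; the endpoint path and payload fields are
# assumptions to verify against the DataForSEO docs for your plan.
API_URL = "https://api.dataforseo.com/v3/dataforseo_labs/google/related_keywords/live"

payload = [{
    "keyword": "running shoes",   # illustrative seed keyword
    "location_code": 2840,        # assumed code for the United States
    "language_code": "en",
    "limit": 100,
}]

response = requests.post(API_URL, auth=("login", "password"), json=payload, timeout=30)
response.raise_for_status()

# Live mode returns results in the same response, so no task polling is needed.
data = response.json()
print(data.get("status_message"))
for task in data.get("tasks", []):
    print(task.get("result"))
```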
The limit is 2000 API calls per minute; however, you can contact our support team if your project requires higher rates.
We offer well-rounded API documentation, GUI for API usage control, comprehensive client libraries for different programming languages, free sandbox API testing, ad hoc integration, and deployment support.
We have a pay-as-you-go pricing model. You simply add funds to your account and use them to get data. The account balance doesn't expire.
https://creativecommons.org/publicdomain/zero/1.0/
The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:
Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.
Fork this kernel to get started.
Banner Photo by Edho Pratama from Unsplash.
What is the total number of transactions generated per device browser in July 2017?
The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?
What was the average number of product pageviews for users who made a purchase in July 2017?
What was the average number of product pageviews for users who did not make a purchase in July 2017?
What was the average total transactions per user that made a purchase in July 2017?
What is the average amount of money spent per session in July 2017?
What is the sequence of pages viewed?
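As an illustration of how such questions can be answered, the sketch below runs the first one (transactions per device browser in July 2017) against the public bigquery-public-data.google_analytics_sample tables using the google-cloud-bigquery client; field names follow the standard GA360 BigQuery export schema, so adjust them if your copy of the data differs.

```python
from google.cloud import bigquery

# Total transactions per device browser for July 2017, following the GA360
# export schema (device.browser, totals.transactions, sharded ga_sessions_* tables).
client = bigquery.Client()

query = """
SELECT
  device.browser AS browser,
  SUM(totals.transactions) AS total_transactions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
GROUP BY browser
ORDER BY total_transactions DESC
"""

for row in client.query(query).result():
    print(row.browser, row.total_transactions)
```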
In November 2024, Google.com was the most popular website worldwide with 136 billion average monthly visits. The online platform has held the top spot as the most popular website since June 2010, when it pulled ahead of Yahoo into first place. Second-ranked YouTube generated more than 72.8 billion monthly visits in the measured period.

The internet leaders: search, social, and e-commerce
Social networks, search engines, and e-commerce websites shape the online experience as we know it. While Google leads the global online search market by far, YouTube and Facebook have become the world's most popular websites for user-generated content, solidifying Alphabet's and Meta's leadership over the online landscape. Meanwhile, websites such as Amazon and eBay generate millions in profits from the sale and distribution of goods, making the e-market sector an integral part of the global retail scene.

What is next for online content?
Powering social media and websites like Reddit and Wikipedia, user-generated content keeps moving the internet's engines. However, the rise of generative artificial intelligence will bring significant changes to how online content is produced and handled. ChatGPT is already transforming how online search is performed, and news of Google's 2024 deal for licensing Reddit content to train large language models (LLMs) signals that the internet is likely to go through a new revolution. While AI's impact on the online market might bring both opportunities and challenges, effective content management will remain crucial for profitability on the web.
The Easiest Way to Collect Data from the Internet
Download anything you see on the internet into spreadsheets within a few clicks using our ready-made web crawlers or a few lines of code using our APIs.
We have made it as simple as possible to collect data from websites
Easy to Use Crawlers
Amazon Product Details and Pricing Scraper: Get product information, pricing, FBA, best seller rank, and much more from Amazon.
Google Maps Search Results: Get details like place name, phone number, address, website, ratings, and open hours from Google Maps or Google Places search results.
Twitter Scraper: Get tweets, Twitter handle, content, number of replies, number of retweets, and more. All you need to provide is a URL to a profile, hashtag, or an advanced search URL from Twitter.
Amazon Product Reviews and Ratings: Get customer reviews for any product on Amazon and get details like product name, brand, reviews and ratings, and more from Amazon.
Google Reviews Scraper: Scrape Google reviews and get details like business or location name, address, review, ratings, and more for businesses and places.
Walmart Product Details & Pricing: Get the product name, pricing, number of ratings, reviews, product images, URL, and other product-related data from Walmart.
Amazon Search Results Scraper: Get product search rank, pricing, availability, best seller rank, and much more from Amazon.
Amazon Best Sellers: Get the bestseller rank, product name, pricing, number of ratings, rating, product images, and more from any Amazon Bestseller List.
Google Search Scraper: Scrape Google search results and get details like search rank, paid and organic results, knowledge graph, related search results, and more.
Walmart Product Reviews & Ratings: Get customer reviews for any product on Walmart.com and get details like product name, brand, reviews, and ratings.
Scrape Emails and Contact Details: Get emails, addresses, contact numbers, and social media links from any website.
Walmart Search Results Scraper: Get product details such as pricing, availability, reviews, ratings, and more from Walmart search results and categories.
Glassdoor Job Listings: Scrape job details such as job title, salary, job description, location, company name, number of reviews, and ratings from Glassdoor.
Indeed Job Listings: Scrape job details such as job title, salary, job description, location, company name, number of reviews, and ratings from Indeed.
LinkedIn Jobs Scraper (Premium): Scrape job listings on LinkedIn and extract job details such as job title, job description, location, company name, number of reviews, and more.
Redfin Scraper (Premium): Scrape real estate listings from Redfin. Extract property details such as address, price, mortgage, Redfin estimate, broker name, and more.
Yelp Business Details Scraper: Scrape business details from Yelp such as phone number, address, website, and more from Yelp search and business details pages.
Zillow Scraper (Premium): Scrape real estate listings from Zillow. Extract property details such as address, price, broker name, and more.
Amazon Product Offers and Third-Party Sellers: Get product pricing, delivery details, FBA, seller details, and much more from the Amazon offer listing page.
Realtor Scraper (Premium): Scrape real estate listings from Realtor.com. Extract property details such as address, price, area, broker, and more.
Target Product Details & Pricing: Get product details from search results and category pages such as pricing, availability, rating, reviews, and 20+ data points from Target.
Trulia Scraper (Premium): Scrape real estate listings from Trulia. Extract property details such as address, price, area, mortgage, and more.
Amazon Customer FAQs: Get FAQs for any product on Amazon and get details like the question, answer, answering user name, and more.
Yellow Pages Scraper: Get details like business name, phone number, address, website, ratings, and more from Yellow Pages search results.
Information about pages on the City's website, including their age and their Google Analytics data (everything from "PageViews" to the right). If the Google Analytics fields are empty, the page has not been visited recently at all.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of feature vectors belonging to 12,330 sessions. The dataset was formed so that each session would belong to a different user in a 1-year period to avoid any tendency to a specific campaign, special day, user profile, or period. Of the 12,330 sessions, 84.5% (10,422) were negative class samples that did not end with shopping, and the rest (1,908) were positive class samples ending with shopping. The dataset consists of 10 numerical and 8 categorical attributes, and the 'Revenue' attribute can be used as the class label.

The dataset contains 18 columns, each representing a specific attribute of online shopping behavior:
- Administrative and Administrative_Duration: number of administrative pages visited and time spent on them.
- Informational and Informational_Duration: number of informational pages visited and time spent on them.
- ProductRelated and ProductRelated_Duration: number of product-related pages visited and time spent on them.
- BounceRates and ExitRates: metrics describing user behavior during the session.
- PageValues: value of the page based on e-commerce metrics.
- SpecialDay: likelihood of shopping based on closeness to a special day.
- Month: month of the session.
- OperatingSystems, Browser, Region, TrafficType: technical and geographical attributes.
- VisitorType: categorizes users as returning, new, or other.
- Weekend: indicates whether the session occurred on a weekend.
- Revenue: target variable indicating whether a transaction was completed (True or False).

The original dataset comes from the UCI Machine Learning Repository: https://archive.ics.uci.edu/dataset/468/online+shoppers+purchasing+intention+dataset

Additional variable information: "Administrative", "Administrative Duration", "Informational", "Informational Duration", "Product Related" and "Product Related Duration" represent the number of different types of pages visited by the visitor in that session and the total time spent in each of these page categories. The values of these features are derived from the URL information of the pages visited by the user and are updated in real time when a user takes an action, e.g. moving from one page to another. The "Bounce Rate", "Exit Rate" and "Page Value" features represent the metrics measured by Google Analytics for each page on the e-commerce site. The "Bounce Rate" of a web page is the percentage of visitors who enter the site from that page and then leave ("bounce") without triggering any other requests to the analytics server during that session. The "Exit Rate" of a specific web page is the percentage of all pageviews of that page that were the last in the session. The "Page Value" feature represents the average value of a web page that a user visited before completing an e-commerce transaction. The "Special Day" feature indicates the closeness of the site visiting time to a specific special day (e.g. Mother's Day, Valentine's Day) on which sessions are more likely to be finalized with a transaction. The value of this attribute is determined by considering the dynamics of e-commerce, such as the duration between the order date and the delivery date. For example, for Valentine's Day, this value takes a nonzero value between February 2 and February 12, is zero before and after these dates unless it is close to another special day, and reaches its maximum value of 1 on February 8.
The dataset also includes operating system, browser, region, traffic type, visitor type as returning or new visitor, a Boolean value indicating whether the date of the visit is weekend, and month of the year.
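A minimal Python sketch for loading the data and separating the 'Revenue' class label, assuming the UCI CSV has been downloaded locally as online_shoppers_intention.csv (the file name used by the UCI repository):

```python
import pandas as pd

# Load the Online Shoppers Purchasing Intention dataset; the local file name
# is an assumption, so adjust the path to wherever you saved the CSV.
df = pd.read_csv("online_shoppers_intention.csv")

print(df.shape)                      # expected: (12330, 18)
print(df["Revenue"].value_counts())  # class balance: ~10,422 False vs ~1,908 True

# Separate the class label from the 17 predictor columns.
X = df.drop(columns=["Revenue"])
y = df["Revenue"].astype(int)
```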
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please refer to the original data article for further data description: Jeřábek & Hynek et al., Collection of datasets with DNS over HTTPS traffic, Data in Brief, DOI: 10.1016/j.dib.2022.108310
Dataset of DNS over HTTPS traffic from Firefox (FFMuc, Google, Hostux, OpenDNS, Quad9, Switch)
The dataset contains DoH and HTTPS traffic that was captured in a virtualized environment (Docker) and generated automatically by the Firefox browser with DoH enabled towards 6 different DoH servers (FFMuc, Google, Hostux, OpenDNS, Quad9, Switch), along with web-page loads towards a sample of web pages taken from the Majestic Million dataset. The data are provided in the form of PCAP files. However, we also provide TLS-enriched flow data generated with the open-source ipfixprobe flow exporter. Information other than the TLS-related fields is not relevant, since the dataset comprises only encrypted TLS traffic. The TLS-enriched flow data are provided in the form of CSV files with the following columns:
| Column Name | Column Description |
|---|---|
| DST_IP | Destination IP address |
| SRC_IP | Source IP address |
| BYTES | The number of transmitted bytes from Source to Destination |
| BYTES_REV | The number of transmitted bytes from Destination to Source |
| TIME_FIRST | Timestamp of the first packet in the flow, in the format YYYY-MM-DDTHH-MM-SS |
| TIME_LAST | Timestamp of the last packet in the flow, in the format YYYY-MM-DDTHH-MM-SS |
| PACKETS | The number of packets transmitted from Source to Destination |
| PACKETS_REV | The number of packets transmitted from Destination to Source |
| DST_PORT | Destination port |
| SRC_PORT | Source port |
| PROTOCOL | The transport protocol number |
| TCP_FLAGS | Logical OR across all TCP flags in the packets transmitted from Source to Destination |
| TCP_FLAGS_REV | Logical OR across all TCP flags in the packets transmitted from Destination to Source |
| TLS_ALPN | The value of the Application-Layer Protocol Negotiation (ALPN) extension sent by the Server |
| TLS_JA3 | The JA3 fingerprint |
| TLS_SNI | The value of the Server Name Indication (SNI) extension sent by the Client |
The DoH resolvers in the dataset can be identified by IP addresses written in doh_resolver_ip.csv file.
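As a starting point, the sketch below labels flows as DoH or non-DoH with pandas, assuming one of the extracted CSV files and that doh_resolver_ip.csv lists resolver IP addresses in its first column; adjust the paths and column handling to the actual files.

```python
import pandas as pd

# Paths are illustrative; point them at one of the generated flow CSVs and at
# the resolver list shipped with the dataset.
flows = pd.read_csv("flows.csv")
resolvers = pd.read_csv("doh_resolver_ip.csv")
resolver_ips = set(resolvers.iloc[:, 0].astype(str))   # assumed: IPs in the first column

# A flow is treated as DoH if its destination is one of the listed resolvers.
flows["is_doh"] = flows["DST_IP"].astype(str).isin(resolver_ips)
print(flows["is_doh"].value_counts())

# Flow duration from the documented timestamp format YYYY-MM-DDTHH-MM-SS.
fmt = "%Y-%m-%dT%H-%M-%S"
duration = (pd.to_datetime(flows["TIME_LAST"], format=fmt)
            - pd.to_datetime(flows["TIME_FIRST"], format=fmt))
print(duration.describe())
```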
The main part of the dataset is located in DoH-Gen-F-FGHOQS.tar.gz and has the following structure:
.
└── data                 | - Main directory with data
    └── generated        | - Directory with generated captures
        ├── pcap         | - Generated PCAPs
        │   └── firefox
        └── tls-flow-csv | - Generated CSV flow data
            └── firefox
Total stats of generated data:

| Name | Value |
|---|---|
| Total Data Size | 46.7 GB |
| Total files | 12 |
| DoH extracted TLS flows | ~98 K |
| Non-DoH extracted TLS flows | ~353 K |
DoH Server information
Please cite the original article:
@article{Jerabek2022,
title = {Collection of datasets with DNS over HTTPS traffic},
journal = {Data in Brief},
volume = {42},
pages = {108310},
year = {2022},
issn = {2352-3409},
doi = {https://doi.org/10.1016/j.dib.2022.108310},
url = {https://www.sciencedirect.com/science/article/pii/S2352340922005121},
author = {Kamil Jeřábek and Karel Hynek and Tomáš Čejka and Ondřej Ryšavý}
}
The Digital Geomorphic-GIS Map of Cape Lookout National Seashore, North Carolina (1:24,000 scale 2008 mapping) is composed of GIS data layers and GIS tables, and is available in the following GRI-supported GIS data formats: 1.) an ESRI file geodatabase (calo_geomorphology.gdb), 2.) an Open Geospatial Consortium (OGC) geopackage, and 3.) a 2.2 KMZ/KML file for use in Google Earth; however, this format version of the map is limited in the data layers presented and in access to GRI ancillary table information. The file geodatabase format is supported with a 1.) ArcGIS Pro 3.X map file (.mapx) file (calo_geomorphology.mapx) and individual Pro 3.X layer (.lyrx) files (for each GIS data layer). The OGC geopackage is supported with a QGIS project (.qgz) file. Upon request, the GIS data is also available in ESRI shapefile format. Contact Stephanie O'Meara (see contact information below) to acquire the GIS data in these GIS data formats. In addition to the GIS data and supporting GIS files, three additional files comprise a GRI digital geologic-GIS dataset or map: 1.) a readme file (calo_geology_gis_readme.pdf), 2.) the GRI ancillary map information document (.pdf) file (calo_geomorphology.pdf), which contains geologic unit descriptions, as well as other ancillary map information and graphics from the source map(s) used by the GRI in the production of the GRI digital geologic-GIS data for the park, and 3.) a user-friendly FAQ PDF version of the metadata (calo_geomorphology_metadata_faq.pdf). Please read the calo_geology_gis_readme.pdf for information pertaining to the proper extraction of the GIS data and other map files. Google Earth software is available for free at: https://www.google.com/earth/versions/. QGIS software is available for free at: https://www.qgis.org/en/site/. Users are encouraged to only use the Google Earth data for basic visualization, and to use the GIS data for any type of data analysis or investigation. The data were completed as a component of the Geologic Resources Inventory (GRI) program, a National Park Service (NPS) Inventory and Monitoring (I&M) Division funded program that is administered by the NPS Geologic Resources Division (GRD). For a complete listing of GRI products visit the GRI publications webpage: https://www.nps.gov/subjects/geology/geologic-resources-inventory-products.htm. For more information about the Geologic Resources Inventory Program visit the GRI webpage: https://www.nps.gov/subjects/geology/gri.htm. At the bottom of that webpage is a "Contact Us" link if you need additional information. You may also directly contact the program coordinator, Jason Kenworthy (jason_kenworthy@nps.gov). Source geologic maps and data used to complete this GRI digital dataset were provided by the following: North Carolina Geological Survey. Detailed information concerning the sources used and their contribution to the GRI product are listed in the Source Citation section(s) of this metadata record (calo_geomorphology_metadata.txt or calo_geomorphology_metadata_faq.pdf). Users of this data are cautioned about the locational accuracy of features within this dataset. Based on the source map scale of 1:24,000 and United States National Map Accuracy Standards, features are within (horizontally) 12.2 meters or 40 feet of their actual location as presented by this dataset. Users of this data should thus not assume the location of features is exactly where they are portrayed in Google Earth, ArcGIS Pro, QGIS or other software used to display this dataset.
All GIS and ancillary tables were produced as per the NPS GRI Geology-GIS Geodatabase Data Model v. 2.3. (available at: https://www.nps.gov/articles/gri-geodatabase-model.htm).
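For users working with the OGC geopackage distribution, a minimal Python sketch using geopandas and fiona is shown below; the geopackage file name is an assumption (the description names only the .gdb explicitly), so substitute the name of the file you receive.

```python
import fiona
import geopandas as gpd

# Assumed geopackage file name; replace with the actual file provided by the GRI.
GPKG = "calo_geomorphology.gpkg"

# Enumerate the GIS data layers and print basic information for each.
for layer in fiona.listlayers(GPKG):
    gdf = gpd.read_file(GPKG, layer=layer)
    print(layer, len(gdf), gdf.crs)
```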
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks and technical solutions from top teams.
As of February 2025, English was the most popular language for web content, with over 49.4 percent of websites using it. Spanish ranked second, with six percent of web content, while content in the German language followed, with 5.6 percent.

English as the leading online language
The United States and India, the countries with the most internet users after China, are also the world's biggest English-speaking markets. The internet user base in both countries combined, as of January 2023, was over a billion individuals. This has led to most online information being created in English. Consequently, even those who are not native speakers may use it for convenience.

Global internet usage by regions
As of October 2024, the number of internet users worldwide was 5.52 billion. In the same period, Northern Europe and North America were leading in terms of internet penetration rates worldwide, with around 97 percent of their populations accessing the internet.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve both as a current landscape analysis and as a baseline for future studies of ag research data.
Purpose
As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:
- establish where agricultural researchers in the United States -- land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies -- currently publish their data, including general research data repositories, domain-specific databases, and the top journals
- compare how much data is in institutional vs. domain-specific vs. federal platforms
- determine which repositories are recommended by top journals that require or recommend the publication of supporting data
- ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data
Approach
The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.
Search methods
We first compiled a list of known domain specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” /“ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects.
We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag specific university repositories and general university repositories that housed a portion of agricultural data. Ag specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.
Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGEA, Protein Data Bank, and Zenodo.
Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. We compiled extensive lists of journals in which USDA published in 2012 and 2016, combining search results from ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources and Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories.
Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required?, and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.
Evaluation
We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results.
We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.
Results
A summary of the major findings from our data review:
Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.
See included README file for descriptions of each individual data file in this dataset. Resources in this dataset:
- Resource Title: Journals. File Name: Journals.csv
- Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
- Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
- Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
- Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
- Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
- Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
Quantifying Iconicity - Zenodo
This dataset contains the material collected for the article "Distant reading 940,000 online circulations of 26 iconic photographs" (to be) published in New Media & Society (DOI: 10.1177/14614448211049459). We identified 26 iconic photographs based on earlier work (Van der Hoeven, 2019). The Google Cloud Vision (GCV) API was subsequently used to identify webpages that host a reproduction of the iconic image. The GCV API uses computer vision methods and the Google index to retrieve these reproductions. The code for calling the API and parsing the data can be found on GitHub: https://github.com/rubenros1795/ReACT_GCV.
The core dataset consists of .tsv-files with the URLs that refer to the webpages. Other metadata provided by the GCV API is also found in the file and manually generated metadata. This includes:
- the URL that refers specifically to the image. This can be a URL that refers to a full match or a partial match
- the title of the page
- the iteration number. Because the GCV API puts a limit on its output, we had to reupload the identified images to the API to extend our search. We continued these iterations until no more new unique URLs were found
- the language found by the langid Python module, along with the normalized score
- the labels associated with the image by Google
- the scrape date
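A minimal pandas sketch for loading one of the .tsv files is shown below; the file name and column names are assumptions based on the fields listed above, so adjust them to the actual headers in the files.

```python
import pandas as pd

# File name is illustrative; the column names used below ("page_url",
# "language") are assumptions derived from the metadata description above.
df = pd.read_csv("che_guevara.tsv", sep="\t")

print(df.columns.tolist())              # inspect the real header first
print(df["page_url"].nunique())         # assumed column: unique hosting webpages
print(df["language"].value_counts())    # assumed column: languages detected by langid
```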
Alongside the .tsv-files, there are several other elements in the following folder structure:
├── data
│   ├── embeddings
│   │   ├── doc2vec
│   │   ├── input-text
│   │   ├── metadata
│   │   └── umap
│   ├── evaluation
│   ├── results
│   │   ├── diachronic-plots
│   │   └── top-words
│   └── tsv
The /embeddings folder contains the doc2vec models, the training input for the models, the metadata (id, URL, date) and the UMAP embeddings used in the GMM clustering. Please note that the date parser was not able to find dates for all webpages, and for this reason not all training texts have associated metadata. The /evaluation folder contains the AIC and BIC scores for GMM clustering with different numbers of clusters. The /results folder contains the top words associated with the clusters and the diachronic cluster prominence plots.

Our pipeline contained several interventions to prevent noise in the data. First, in between the iterations we manually checked the scraped photos for relevance. We did so because reuploading an iconic image that is paired with another, irrelevant, one results in reproductions of the irrelevant one in the next iteration. Because we did not catch all noise, we used Scale Invariant Feature Transform (SIFT), a basic computer vision algorithm, to remove images that did not meet a threshold of ten keypoints. By doing so we removed completely unrelated photographs, but left room for variations of the original (such as painted versions of Che Guevara, or cropped versions of the Napalm Girl image). Another issue was the parsing of webpage texts. After experimenting with different webpage parsers that aim to extract 'relevant' text, it proved too difficult to use one solution for all our webpages. Therefore we simply parsed all the text contained in commonly used html-tags, such as <p>, <h1>, etc.
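A minimal OpenCV sketch of the keypoint threshold described above; note that the original pipeline may also have matched keypoints against the reference photograph, which would require an additional descriptor-matching step.

```python
import cv2

# Keep an image only if SIFT detects at least ten keypoints; images with fewer
# keypoints are treated as noise, as in the filtering step described above.
sift = cv2.SIFT_create()

def passes_keypoint_threshold(path, min_keypoints=10):
    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        return False                      # unreadable file, discard
    keypoints = sift.detect(image, None)
    return len(keypoints) >= min_keypoints
```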
https://www.icpsr.umich.edu/web/ICPSR/studies/27681/terms
The BJS Census of State and Local Law Enforcement Agencies (CSLLEA) is conducted every 4 years to provide a complete enumeration of agencies and their employees. Employment data are reported by agencies for sworn and nonsworn (civilian) personnel and, within these categories, by full-time or part-time status. The pay period that included September 30, 2008, was the reference date for all personnel data. Agencies also complete a checklist of functions they regularly perform, or have primary responsibility for, within the following areas: patrol and response, criminal investigation, traffic and vehicle-related functions, detention-related functions, court-related functions, special public safety functions (e.g., animal control), task force participation, and specialized functions (e.g., search and rescue). The CSLLEA provides national data on the number of state and local law enforcement agencies and employees for local police departments, sheriffs' offices, state law enforcement agencies, and special jurisdiction agencies. It also serves as the sampling frame for BJS surveys of law enforcement agencies.
The Digital Geologic-GIS Map of Sagamore Hill National Historic Site and Vicinity, New York is composed of GIS data layers and GIS tables, and is available in the following GRI-supported GIS data formats: 1.) a 10.1 file geodatabase (sahi_geology.gdb), 2.) an Open Geospatial Consortium (OGC) geopackage, and 3.) a 2.2 KMZ/KML file for use in Google Earth; however, this format version of the map is limited in the data layers presented and in access to GRI ancillary table information. The file geodatabase format is supported with a 1.) ArcGIS Pro map file (.mapx) file (sahi_geology.mapx) and individual Pro layer (.lyrx) files (for each GIS data layer), as well as with a 2.) 10.1 ArcMap (.mxd) map document (sahi_geology.mxd) and individual 10.1 layer (.lyr) files (for each GIS data layer). The OGC geopackage is supported with a QGIS project (.qgz) file. Upon request, the GIS data is also available in ESRI 10.1 shapefile format. Contact Stephanie O'Meara (see contact information below) to acquire the GIS data in these GIS data formats. In addition to the GIS data and supporting GIS files, three additional files comprise a GRI digital geologic-GIS dataset or map: 1.) a GIS readme file (sahi_geology_gis_readme.pdf), 2.) the GRI ancillary map information document (.pdf) file (sahi_geology.pdf), which contains geologic unit descriptions, as well as other ancillary map information and graphics from the source map(s) used by the GRI in the production of the GRI digital geologic-GIS data for the park, and 3.) a user-friendly FAQ PDF version of the metadata (sahi_geology_metadata_faq.pdf). Please read the sahi_geology_gis_readme.pdf for information pertaining to the proper extraction of the GIS data and other map files. Google Earth software is available for free at: https://www.google.com/earth/versions/. QGIS software is available for free at: https://www.qgis.org/en/site/. Users are encouraged to only use the Google Earth data for basic visualization, and to use the GIS data for any type of data analysis or investigation. The data were completed as a component of the Geologic Resources Inventory (GRI) program, a National Park Service (NPS) Inventory and Monitoring (I&M) Division funded program that is administered by the NPS Geologic Resources Division (GRD). For a complete listing of GRI products visit the GRI publications webpage: https://www.nps.gov/subjects/geology/geologic-resources-inventory-products.htm. For more information about the Geologic Resources Inventory Program visit the GRI webpage: https://www.nps.gov/subjects/geology/gri.htm. At the bottom of that webpage is a "Contact Us" link if you need additional information. You may also directly contact the program coordinator, Jason Kenworthy (jason_kenworthy@nps.gov). Source geologic maps and data used to complete this GRI digital dataset were provided by the following: U.S. Geological Survey. Detailed information concerning the sources used and their contribution to the GRI product are listed in the Source Citation section(s) of this metadata record (sahi_geology_metadata.txt or sahi_geology_metadata_faq.pdf). Users of this data are cautioned about the locational accuracy of features within this dataset. Based on the source map scale of 1:62,500 and United States National Map Accuracy Standards, features are within (horizontally) 31.8 meters or 104.2 feet of their actual location as presented by this dataset.
Users of this data should thus not assume the location of features is exactly where they are portrayed in Google Earth, ArcGIS, QGIS or other software used to display this dataset. All GIS and ancillary tables were produced as per the NPS GRI Geology-GIS Geodatabase Data Model v. 2.3. (available at: https://www.nps.gov/articles/gri-geodatabase-model.htm).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of around 2,000 HTML pages: these web pages contain the search results obtained in response to queries for different products, searched for by a set of synthetic users surfing Google Shopping (US version) from different locations in July 2016.
Each file in the collection has a name that indicates the location from which the search was done, the userID, and the searched product: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html
The locations are Philippines (PHI), United States (US), India (IN). The userIDs: 26 to 30 for users searching from Philippines, 1 to 5 from US, 11 to 15 from India.
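A small Python sketch for recovering the location, userID, and product from a file name, assuming an underscore separates LOCATION and USERID in the pattern above; the example file name is illustrative only.

```python
import re

# Pattern derived from no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html;
# the underscore between LOCATION and USERID is an assumption.
PATTERN = re.compile(
    r"no_email_(?P<location>[A-Z]+)_(?P<user_id>\d+)\."
    r"(?P<product>.+?)\.shopping_testing\.(?P<num>\d+)\.html$"
)

name = "no_email_PHI_26.mp3 player.shopping_testing.1.html"  # illustrative example
match = PATTERN.match(name)
if match:
    print(match.group("location"), match.group("user_id"), match.group("product"))
```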
Products have been chosen following 130 keywords (e.g., MP3 player, MP4 Watch, Personal organizer, Television, etc.).
In the following, we describe how the search results have been collected.
Each user has a fresh profile. Creating a new profile corresponds to launching a new, isolated web browser client instance and opening the Google Shopping US web page.
To mimic real users, the synthetic users can browse, scroll pages, stay on a page, and click on links.
A fully-fledged web browser is used to get the correct desktop version of the website under investigation. This is because websites could be designed to behave according to user agents, as witnessed by the differences between the mobile and desktop versions of the same website.
The prices are the retail ones displayed by Google Shopping in US dollars (thus, excluding shipping fees).
Several frameworks have been proposed for interacting with web browsers and analysing results from search engines. This research adopts OpenWPM. OpenWPM is automated with Selenium to efficiently create and manage different users with isolated Firefox and Chrome client instances, each of them with their own associated cookies.
The experiments run, on average, for 24 hours. In each of them, the software runs on our local server, but the browser's traffic is redirected to the designated remote servers (e.g., to India) via tunneling through SOCKS proxies. This way, all commands are simultaneously distributed over all proxies. The experiments adopt the Mozilla Firefox browser (version 45.0) for the web browsing tasks and run under Ubuntu 14.04. Also, for each query, we consider the first page of results, counting 40 products. Among them, the focus of the experiments is mostly on the top 10 and top 3 results.
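As an illustration of the proxy setup, the sketch below points a Selenium-driven Firefox instance at a SOCKS proxy; the host, port, and target URL are placeholders, and the original experiments used OpenWPM rather than bare Selenium.

```python
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# Route all Firefox traffic through a SOCKS proxy (placeholder host/port),
# so requests appear to originate from the proxy's vantage point.
options = Options()
options.set_preference("network.proxy.type", 1)            # manual proxy configuration
options.set_preference("network.proxy.socks", "127.0.0.1")
options.set_preference("network.proxy.socks_port", 9050)
options.set_preference("network.proxy.socks_remote_dns", True)

driver = webdriver.Firefox(options=options)
driver.get("https://www.google.com/shopping")              # illustrative target page
print(driver.title)
driver.quit()
```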
Due to connection errors, one of the Philippine profiles has no associated results. Also, for the Philippines, a few keywords did not lead to any results: videocassette recorders, totes, umbrellas. Similarly, for the US, no results were returned for totes and umbrellas.
The search results have been analyzed in order to check whether there was evidence of price steering based on users' location.
One term of usage applies:
In any research product whose findings are based on this dataset, please cite
@inproceedings{DBLP:conf/ircdl/CozzaHPN19,
  author = {Vittoria Cozza and Van Tien Hoang and Marinella Petrocchi and Rocco {De Nicola}},
  title = {Transparency in Keyword Faceted Search: An Investigation on Google Shopping},
  booktitle = {Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, {IRCDL} 2019, Pisa, Italy, January 31 - February 1, 2019, Proceedings},
  pages = {29--43},
  year = {2019},
  crossref = {DBLP:conf/ircdl/2019},
  url = {https://doi.org/10.1007/978-3-030-11226-4_3},
  doi = {10.1007/978-3-030-11226-4_3},
  timestamp = {Fri, 18 Jan 2019 23:22:50 +0100},
  biburl = {https://dblp.org/rec/bib/conf/ircdl/CozzaHPN19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
The Digital Geologic-GIS Map of Joshua Tree National Park, California is composed of GIS data layers and GIS tables, and is available in the following GRI-supported GIS data formats: 1.) a 10.1 file geodatabase (jotr_geology.gdb), 2.) an Open Geospatial Consortium (OGC) geopackage, and 3.) a 2.2 KMZ/KML file for use in Google Earth; however, this format version of the map is limited in the data layers presented and in access to GRI ancillary table information. The file geodatabase format is supported with a 1.) ArcGIS Pro map file (.mapx) file (jotr_geology.mapx) and individual Pro layer (.lyrx) files (for each GIS data layer), as well as with a 2.) 10.1 ArcMap (.mxd) map document (jotr_geology.mxd) and individual 10.1 layer (.lyr) files (for each GIS data layer). The OGC geopackage is supported with a QGIS project (.qgz) file. Upon request, the GIS data is also available in ESRI 10.1 shapefile format. Contact Stephanie O'Meara (see contact information below) to acquire the GIS data in these GIS data formats. In addition to the GIS data and supporting GIS files, three additional files comprise a GRI digital geologic-GIS dataset or map: 1.) a readme file (jotr_geology_gis_readme.pdf), 2.) the GRI ancillary map information document (.pdf) file (jotr_geology.pdf), which contains geologic unit descriptions, as well as other ancillary map information and graphics from the source map(s) used by the GRI in the production of the GRI digital geologic-GIS data for the park, and 3.) a user-friendly FAQ PDF version of the metadata (jotr_geology_metadata_faq.pdf). Please read the jotr_geology_gis_readme.pdf for information pertaining to the proper extraction of the GIS data and other map files. Google Earth software is available for free at: https://www.google.com/earth/versions/. QGIS software is available for free at: https://www.qgis.org/en/site/. Users are encouraged to only use the Google Earth data for basic visualization, and to use the GIS data for any type of data analysis or investigation. The data were completed as a component of the Geologic Resources Inventory (GRI) program, a National Park Service (NPS) Inventory and Monitoring (I&M) Division funded program that is administered by the NPS Geologic Resources Division (GRD). For a complete listing of GRI products visit the GRI publications webpage: https://www.nps.gov/subjects/geology/geologic-resources-inventory-products.htm. For more information about the Geologic Resources Inventory Program visit the GRI webpage: https://www.nps.gov/subjects/geology/gri.htm. At the bottom of that webpage is a "Contact Us" link if you need additional information. You may also directly contact the program coordinator, Jason Kenworthy (jason_kenworthy@nps.gov). Source geologic maps and data used to complete this GRI digital dataset were provided by the following: U.S. Geological Survey and ESRI. Detailed information concerning the sources used and their contribution to the GRI product are listed in the Source Citation section(s) of this metadata record (jotr_geology_metadata.txt or jotr_geology_metadata_faq.pdf). Users of this data are cautioned about the locational accuracy of features within this dataset. Based on the source map scale of 1:100,000 and United States National Map Accuracy Standards, features are within (horizontally) 50.8 meters or 166.7 feet of their actual location as presented by this dataset.
Users of this data should thus not assume the location of features is exactly where they are portrayed in Google Earth, ArcGIS, QGIS or other software used to display this dataset. All GIS and ancillary tables were produced as per the NPS GRI Geology-GIS Geodatabase Data Model v. 2.3. (available at: https://www.nps.gov/articles/gri-geodatabase-model.htm).
OpenStreetMap (openstreetmap.org) is a global collaborative mapping project, which offers maps and map data released with an open license, encouraging free re-use and re-distribution. The data is created by a large community of volunteers who use a variety of simple on-the-ground surveying techniques, and wiki-style editing tools to collaborate as they create the maps, in a process which is open to everyone. The project originated in London, and an active community of mappers and developers are based here. Mapping work in London is ongoing (and you can help!) but the coverage is already good enough for many uses.
Browse the map of London on OpenStreetMap.org
The whole of England updated daily:
For more details of downloads available from OpenStreetMap, including downloading the whole planet, see 'planet.osm' on the wiki.
Download small areas of the map by bounding-box. For example this URL requests the data around Trafalgar Square:
http://api.openstreetmap.org/api/0.6/map?bbox=-0.13062,51.5065,-0.12557,51.50969
Data filtered by "tag". For example this URL returns all elements in London tagged shop=supermarket:
http://www.informationfreeway.org/api/0.6/*[shop=supermarket][bbox=-0.48,51.30,0.21,51.70]
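A small Python sketch for fetching and parsing the Trafalgar Square bounding-box example above via the main API; only the api.openstreetmap.org endpoint is shown here, since the tag-filtered URL is served by a separate third-party service.

```python
import requests
import xml.etree.ElementTree as ET

# Request the raw OSM XML for a small bounding box around Trafalgar Square.
url = "https://api.openstreetmap.org/api/0.6/map"
params = {"bbox": "-0.13062,51.5065,-0.12557,51.50969"}

response = requests.get(url, params=params, timeout=60)
response.raise_for_status()

root = ET.fromstring(response.content)
nodes = root.findall("node")
ways = root.findall("way")
print(len(nodes), "nodes,", len(ways), "ways")

# Each element carries name=value pairs ("tags") describing its properties.
for way in ways[:5]:
    tags = {tag.get("k"): tag.get("v") for tag in way.findall("tag")}
    print(way.get("id"), tags)
```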
The format of the data is a raw XML representation of all the elements making up the map. OpenStreetMap is composed of interconnected "nodes" and "ways" (and sometimes "relations"), each with a set of name=value pairs called "tags". These classify and describe properties of the elements, and ultimately influence how they get drawn on the map. To understand more about tags, and different ways of working with this data format, refer to the following pages on the OpenStreetMap wiki.
Rather than working with raw map data, you may prefer to embed maps from OpenStreetMap on your website with a simple bit of JavaScript. You can also present overlays of other data, in a manner very similar to working with Google Maps. In fact you can even use the Google Maps API to do this. See OSM on your own website for details and links to various JavaScript map libraries.
The OpenStreetMap project aims to attract large numbers of contributors who all chip in a little bit to help build the map. Although the map editing tools take a little while to learn, they are designed to be as simple as possible, so that everyone can get involved. This project offers an exciting means of allowing local London communities to take ownership of their part of the map.
Read about how to Get Involved and see the London page for details of OpenStreetMap community events.
The Digital Quaternary Surficial Geologic-GIS Map of Santa Barbara Island, California is composed of GIS data layers and GIS tables, and is available in the following GRI-supported GIS data formats: 1.) a 10.1 file geodatabase (saba_surficial_geology.gdb), 2.) an Open Geospatial Consortium (OGC) geopackage, and 3.) a 2.2 KMZ/KML file for use in Google Earth; however, this format version of the map is limited in the data layers presented and in access to GRI ancillary table information. The file geodatabase format is supported with a 1.) ArcGIS Pro map file (.mapx) file (saba_surficial_geology.mapx) and individual Pro layer (.lyrx) files (for each GIS data layer), as well as with a 2.) 10.1 ArcMap (.mxd) map document (saba_surficial_geology.mxd) and individual 10.1 layer (.lyr) files (for each GIS data layer). The OGC geopackage is supported with a QGIS project (.qgz) file. Upon request, the GIS data is also available in ESRI 10.1 shapefile format. Contact Stephanie O'Meara (see contact information below) to acquire the GIS data in these GIS data formats. In addition to the GIS data and supporting GIS files, three additional files comprise a GRI digital geologic-GIS dataset or map: 1.) a readme file (chis_geology_gis_readme.pdf), 2.) the GRI ancillary map information document (.pdf) file (chis_geology.pdf), which contains geologic unit descriptions, as well as other ancillary map information and graphics from the source map(s) used by the GRI in the production of the GRI digital geologic-GIS data for the park, and 3.) a user-friendly FAQ PDF version of the metadata (saba_surficial_geology_metadata_faq.pdf). Please read the chis_geology_gis_readme.pdf for information pertaining to the proper extraction of the GIS data and other map files. Google Earth software is available for free at: https://www.google.com/earth/versions/. QGIS software is available for free at: https://www.qgis.org/en/site/. Users are encouraged to only use the Google Earth data for basic visualization, and to use the GIS data for any type of data analysis or investigation. The data were completed as a component of the Geologic Resources Inventory (GRI) program, a National Park Service (NPS) Inventory and Monitoring (I&M) Division funded program that is administered by the NPS Geologic Resources Division (GRD). For a complete listing of GRI products visit the GRI publications webpage: https://www.nps.gov/subjects/geology/geologic-resources-inventory-products.htm. For more information about the Geologic Resources Inventory Program visit the GRI webpage: https://www.nps.gov/subjects/geology/gri.htm. At the bottom of that webpage is a "Contact Us" link if you need additional information. You may also directly contact the program coordinator, Jason Kenworthy (jason_kenworthy@nps.gov). Source geologic maps and data used to complete this GRI digital dataset were provided by the following: U.S. Geological Survey. Detailed information concerning the sources used and their contribution to the GRI product are listed in the Source Citation section(s) of this metadata record (saba_surficial_geology_metadata.txt or saba_surficial_geology_metadata_faq.pdf). Users of this data are cautioned about the locational accuracy of features within this dataset. Based on the source map scale of 1:12,000 and United States National Map Accuracy Standards, features are within (horizontally) 6.1 meters or 20 feet of their actual location as presented by this dataset.
Users of this data should thus not assume the location of features is exactly where they are portrayed in Google Earth, ArcGIS, QGIS or other software used to display this dataset. All GIS and ancillary tables were produced as per the NPS GRI Geology-GIS Geodatabase Data Model v. 2.3. (available at: https://www.nps.gov/articles/gri-geodatabase-model.htm).
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Below you'll find a month by month breakdown of traffic on the australia.gov.au website along the following lines: Pageviews, Visits, Pages per visit, Average time on page, and Devices. This data is generated using Google Analytics. Please note: this is an initial version of the data only. We're looking forward to hearing your feedback on what other metrics are of interest to you. Please let us know by sending an email to data@digital.gov.au.
https://choosealicense.com/licenses/other/
Dataset Card for Conceptual Captions
Dataset Summary
Conceptual Captions is a dataset consisting of ~3.3M images annotated with captions. In contrast with the curated style of other image caption annotations, Conceptual Caption images and their raw descriptions are harvested from the web, and therefore represent a wider variety of styles. More precisely, the raw descriptions are harvested from the Alt-text HTML attribute associated with web images. To arrive at… See the full description on the dataset page: https://huggingface.co/datasets/google-research-datasets/conceptual_captions.
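For quick experimentation, the captions can be loaded with the Hugging Face datasets library; the config name and field names below ("unlabeled", "image_url", "caption") are assumptions to verify against the dataset page linked above.

```python
from datasets import load_dataset

# Minimal sketch; config and field names are assumptions taken from the
# dataset card and should be checked against the linked page.
dataset = load_dataset(
    "google-research-datasets/conceptual_captions",
    "unlabeled",
    split="train",
)

print(len(dataset))   # on the order of ~3.3M caption/URL pairs
print(dataset[0])     # e.g. {"image_url": "...", "caption": "..."}
```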