100+ datasets found

Open Data Website Traffic
catalog.data.gov
data.lacity.org
Updated Jun 21, 2025
data.lacity.org (2025). Open Data Website Traffic [Dataset]. https://catalog.data.gov/dataset/open-data-website-traffic
Dataset updated
Jun 21, 2025
Dataset provided by
data.lacity.org
Description
Daily utilization metrics for data.lacity.org and geohub.lacity.org. Updated monthly.
Company Datasets for Business Profiling
datarade.ai
Updated Feb 23, 2017
Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
Available download formats: .json, .xml, .csv, .xls
Dataset updated
Feb 23, 2017
Dataset authored and provided by
Oxylabs
Area covered
Canada, Isle of Man, Northern Mariana Islands, Taiwan, British Indian Ocean Territory, Andorra, Bangladesh, Nepal, Moldova (Republic of), Tunisia
Description
Company Datasets for valuable business insights!

Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

Owler: Gain valuable business insights and competitive intelligence.

AngelList: Receive fresh startup data transformed into actionable insights.

CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.

Craft.co: Make data-informed business decisions with Craft.co's company datasets.

Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

Company name;

Size;

Founding date;

Location;

Industry;

Revenue;

Employee count;

Competitors.

You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

With Oxylabs Datasets, you can count on:

Fresh and accurate data collected and parsed by our expert web scraping team.

Time and resource savings, allowing you to focus on data analysis and achieving your business goals.

A customized approach tailored to your specific business needs.

Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
Website Analytics
catalog.data.gov
data.brla.gov
Updated Jul 12, 2025
data.brla.gov (2025). Website Analytics [Dataset]. https://catalog.data.gov/dataset/website-analytics-89ba5
Dataset updated
Jul 12, 2025
Dataset provided by
data.brla.gov
Description
Web traffic statistics for several City-Parish websites, including brla.gov, city.brla.gov, Red Stick Ready, GIS, and Open Data. Information provided by Google Analytics.
Website Traffic Dataset
gts.ai
json
Updated Aug 23, 2024
GTS (2024). Website Traffic Dataset [Dataset]. https://gts.ai/dataset-download/website-traffic-dataset/
Available download formats: json
Dataset updated
Aug 23, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Explore our detailed website traffic dataset featuring key metrics like page views, session duration, bounce rate, traffic source, and conversion rates.
WEB-FORUM-52 Dataset
paperswithcode.com
Updated Feb 16, 2021
Albert Weichselbraun; Adrian M. P. Brasoveanu; Roger Waldvogel; Fabian Odoni (2021). WEB-FORUM-52 Dataset [Dataset]. https://paperswithcode.com/dataset/web-forum-52
Dataset updated
Feb 16, 2021
Authors
Albert Weichselbraun; Adrian M. P. Brasoveanu; Roger Waldvogel; Fabian Odoni
Description
The WEB-FORUM-52 gold standard comprises (i) 13 web forums from the health domain, (ii) 15 forums obtained from a Wikipedia list of popular forums (https://en.wikipedia.org/wiki/List_of_Internet_forums), (iii) 13 forums mentioned on a list of popular German Web forums (https://www.beliebte-foren.de), (iv) nine forums obtained from WPressBlog (https://www.wpressblog.com/free-forum-posting-sites-list/) and (v) two additional forums. For most forums two web pages (from different threads) were used and stored together with gold standard annotations that have been manually created by domain experts and describe the post text, post date, post user and direct URL to the post.
PDMX
cseweb.ucsd.edu
json
UCSD CSE Research Project, PDMX [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
We introduce PDMX: a Public Domain MusicXML dataset for symbolic music processing, including over 250k musical scores in MusicXML format. PDMX is the largest publicly available, copyright-free MusicXML dataset in existence. PDMX includes genre, tag, description, and popularity metadata for every file.
Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and...
ieee-dataport.org
Updated Oct 21, 2024
Mohamad Amar Irsyad Mohd Aminuddin (2024). Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and Mobile Webpages [Dataset]. https://ieee-dataport.org/documents/website-fingerprinting-dataset-browsing-network-traffic-desktop-and-mobile-webpages
Dataset updated
Oct 21, 2024
Authors
Mohamad Amar Irsyad Mohd Aminuddin
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This is a dataset of Tor cell files extracted from browsing simulations using Tor Browser. The simulations cover both desktop and mobile webpages. The data were collected using the WFP-Collector tool (https://github.com/irsyadpage/WFP-Collector). All the necessary configuration to perform the simulation is detailed in the tool repository. The webpage URLs are the first 100 websites listed at https://dataforseo.com/free-seo-stats/top-1000-websites. Each webpage URL is visited 90 times in each of the desktop and mobile browsing modes.
Datasets for figures and tables
catalog.data.gov
datasets.ai
Updated Nov 12, 2020
U.S. EPA Office of Research and Development (ORD) (2020). Datasets for figures and tables [Dataset]. https://catalog.data.gov/dataset/datasets-for-figures-and-tables
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
Software: Model simulations were conducted using WRF version 3.8.1 (available at https://github.com/NCAR/WRFV3) and CMAQ version 5.2.1 (available at https://github.com/USEPA/CMAQ). The meteorological and concentration fields created using these models are too large to archive on ScienceHub, approximately 1 TB, and are archived on EPA’s high performance computing archival system (ASM) at /asm/MOD3APP/pcc/02.NOAH.v.CLM.v.PX/.

Figures: Figures 1 – 6 and Figure 8 were created using NCAR Command Language (NCL) scripts (https://www.ncl.ucar.edu/get_started.shtml). NCL code can be downloaded from the NCAR website (https://www.ncl.ucar.edu/Download/) at no cost. The data used for these figures are archived on EPA’s ASM system and are available upon request. Figures 7, 8b-c, 8e-f, 8h-i, and 9 were created using the AMET utility developed by U.S. EPA/ORD. AMET can be freely downloaded and used at https://github.com/USEPA/AMET. The modeled data paired in space and time provided in this archive can be used to recreate these figures.

The data contained in the compressed zip files are organized in comma delimited files with descriptive headers or space delimited files that match tabular data in the manuscript. The data dictionary provides additional information about the files and their contents.

This dataset is associated with the following publication: Campbell, P., J. Bash, and T. Spero. Updates to the Noah Land Surface Model in WRF‐CMAQ to Improve Simulated Meteorology, Air Quality, and Deposition. Journal of Advances in Modeling Earth Systems. John Wiley & Sons, Inc., Hoboken, NJ, USA, 11(1): 231-256, (2019).
Dataset Alerts - Open and Monitoring
datasf.org
data.sfgov.org
application/rdfxml +5
Updated Jun 20, 2025
(2025). Dataset Alerts - Open and Monitoring [Dataset]. https://datasf.org/opendata/
Available download formats: json, application/rssxml, csv, tsv, xml, application/rdfxml
Dataset updated
Jun 20, 2025
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
License information was derived automatically
Description
A log of dataset alerts open, monitored or resolved on the open data portal. Alerts can include issues as well as deprecation or discontinuation notices.
Machine Learning Dataset
brightdata.com
.json, .csv, .xlsx
Updated Dec 23, 2024
Bright Data (2024). Machine Learning Dataset [Dataset]. https://brightdata.com/products/datasets/machine-learning
Available download formats: .json, .csv, .xlsx
Dataset updated
Dec 23, 2024
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Utilize our machine learning datasets to develop and validate your models. Our datasets are designed to support a variety of machine learning applications, from image recognition to natural language processing and recommendation systems. You can access a comprehensive dataset or tailor a subset to fit your specific requirements, using data from a combination of various sources and websites, including custom ones. Popular use cases include model training and validation, where the dataset can be used to ensure robust performance across different applications. Additionally, the dataset helps in algorithm benchmarking by providing extensive data to test and compare various machine learning algorithms, identifying the most effective ones for tasks such as fraud detection, sentiment analysis, and predictive maintenance. Furthermore, it supports feature engineering by allowing you to uncover significant data attributes, enhancing the predictive accuracy of your machine learning models for applications like customer segmentation, personalized marketing, and financial forecasting.
Free Food & Meal Sites
catalog.data.gov
Updated Mar 31, 2025
City of Philadelphia (2025). Free Food & Meal Sites [Dataset]. https://catalog.data.gov/dataset/free-food-meal-sites
Dataset updated
Mar 31, 2025
Dataset provided by
City of Philadelphia
Description
This dataset and app provide the locations of sites where the public can access free food, nutrition services, and public benefits.
Dataset of free cash flow and website of public companies for Fiserv
workwithdata.com
Updated Nov 27, 2024
Work With Data (2024). Dataset of free cash flow and website of public companies for Fiserv [Dataset]. https://www.workwithdata.com/datasets/public-companies?col=company%2Cfree_cash_flow%2Cwebsite&f=1&fcol0=company&fop0=%3D&fval0=Fiserv
Dataset updated
Nov 27, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset is about companies. It has 1 row and is filtered where the company is Fiserv. It features 3 columns: company, website, and free cash flow.
NYC Free Tax Prep Sites
catalog.data.gov
data.cityofnewyork.us
Updated Sep 2, 2023
data.cityofnewyork.us (2023). NYC Free Tax Prep Sites [Dataset]. https://catalog.data.gov/dataset/nyc-free-tax-prep-sites
Dataset updated
Sep 2, 2023
Dataset provided by
data.cityofnewyork.us
Area covered
New York
Description
This dataset provides a machine-readable format for the data that populates the "NYC Free Tax Preparation Site Finder" map hosted on DCA's website. The dataset includes the name and address of the service provider, its hours of operation, services available, and the required geospatial data elements used by the map. DCA's Office of Financial Empowerment (OFE) coordinates the City’s Annual Tax Season Initiative, which offers free tax preparation services to qualifying New Yorkers. NYC Free Tax Prep sites are displayed on a map at nyc.gov/taxprep (https://www1.nyc.gov/assets/dca/TaxMap/index.html). The map is updated whenever a new site is added or an existing site changes its hours of operation or services provided. For more information about Free Tax Preparation Sites, visit the DCA website (https://www1.nyc.gov/site/dca/consumers/file-your-taxes-faqs.page).
Coursera Courses Uncleaned Dataset to Practice
kaggle.com
Updated May 2, 2024
Janak Pariyar (2024). Coursera Courses Uncleaned Dataset to Practice [Dataset]. https://www.kaggle.com/datasets/janakpariyar/coursera-courses-uncleaned-dataset-to-practice/data
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 2, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Janak Pariyar
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The dataset was web-scraped from the Coursera website. The data is static. It consists of 7 columns with various unstructured data, which might help you on your learning curve of Data Science and Data Analytics. Feel free to play around. Happy Digging :)
Dataset of free cash flow and website of public companies for Netflix
workwithdata.com
Updated Nov 27, 2024
Work With Data (2024). Dataset of free cash flow and website of public companies for Netflix [Dataset]. https://www.workwithdata.com/datasets/public-companies?col=company%2Cfree_cash_flow%2Cwebsite&f=1&fcol0=company&fop0=%3D&fval0=Netflix
Dataset updated
Nov 27, 2024
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset is about companies. It has 1 row and is filtered where the company is Netflix. It features 3 columns: company, website, and free cash flow.
A Dataset on Online Learning-based Web Behavior from Different Countries...
ieee-dataport.org
Updated Apr 27, 2022
Saumick Pradhan (2022). A Dataset on Online Learning-based Web Behavior from Different Countries Before and After COVID-19 [Dataset]. https://ieee-dataport.org/open-access/dataset-online-learning-based-web-behavior-different-countries-and-after-covid-19
Dataset updated
Apr 27, 2022
Authors
Saumick Pradhan
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
2022
1k_Website_Screenshots_and_Metadata
huggingface.co
Updated Apr 13, 2023
Silatus (2023). 1k_Website_Screenshots_and_Metadata [Dataset]. https://huggingface.co/datasets/silatus/1k_Website_Screenshots_and_Metadata
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 13, 2023
Dataset authored and provided by
Silatus
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
Dataset Card for 1000 Website Screenshots with Metadata

Dataset Summary

Silatus is sharing, for free, a segment of a dataset that we are using to train a generative AI model for text-to-mockup conversions. This dataset was collected in December 2022 and early January 2023, so it contains very recent data from 1,000 of the world's most popular websites. You can get our larger 10,000 website dataset for free at: https://silatus.com/datasets This dataset includes: High-res… See the full description on the dataset page: https://huggingface.co/datasets/silatus/1k_Website_Screenshots_and_Metadata.
Top rated TV series extracted from TMDB website
kaggle.com
Updated Jul 15, 2024
Aditi Sable (2024). Top rated TV series extracted from TMDB website [Dataset]. https://www.kaggle.com/datasets/aditis4ble/top-rated-tv-series-extracted-from-tmdb-website/code
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 15, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Aditi Sable
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Aditi Sable

Released under CC0: Public Domain

Contents
Pinterest Fashion Compatibility
cseweb.ucsd.edu
beta.data.urbandatacentre.ca
json
UCSD CSE Research Project, Pinterest Fashion Compatibility [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
Available download formats: json
Dataset authored and provided by
UCSD CSE Research Project
Description
This dataset contains images (scenes) containing fashion products, which are labeled with bounding boxes and links to the corresponding products.

Metadata includes

product IDs

bounding boxes

Basic Statistics:

Scenes: 47,739

Products: 38,111

Scene-Product Pairs: 93,274
Labeled Image Datasets for AI & Computer Vision
images.cv
Updated Apr 26, 2024
Images.cv (2024). Labeled Image Datasets for AI & Computer Vision [Dataset]. https://images.cv/
Dataset updated
Apr 26, 2024
Dataset provided by
Images.cv
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Explore and download labeled image datasets for AI, ML, and computer vision. Find datasets for object detection, image classification, and image segmentation.