46 datasets found

Google Map Data, Google Map Data Scraper, Business location Data- Scrape All...
apiscrapy.mydatastorefront.com
Updated May 23, 2022
Cite
APISCRAPY (2022). Google Map Data, Google Map Data Scraper, Business location Data- Scrape All Publicly Available Data From Google Map & Other Platforms [Dataset]. https://apiscrapy.mydatastorefront.com/products/google-map-data-google-map-data-scraper-business-location-d-apiscrapy
Explore at:
Dataset updated
May 23, 2022
Dataset authored and provided by
APISCRAPY
Area covered
Greece, Moldova, Iceland, Lithuania, Latvia, United States Minor Outlying Islands, Romania, Luxembourg, Germany, Liechtenstein
Description
Explore APISCRAPY, your AI-powered Google Map Data Scraper. Easily extract Business Location Data from Google Maps and other platforms. Seamlessly access and utilize publicly available map data for your business needs. Scrape All Publicly Available Data From Google Maps & Other Platforms.
Google Maps Dataset
brightdata.com
.json, .csv, .xlsx
Updated Jan 8, 2023
Cite
Bright Data (2023). Google Maps Dataset [Dataset]. https://brightdata.com/products/datasets/google-maps
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset updated
Jan 8, 2023
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
The Google Maps dataset is ideal for getting extensive information on businesses anywhere in the world. Easily filter by location, business type, and other factors to get the exact data you need. The Google Maps dataset includes all major data points: timestamp, name, category, address, description, open website, phone number, open_hours, open_hours_updated, reviews_count, rating, main_image, reviews, url, lat, lon, place_id, country, and more.
Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus...
figshare.com
txt
Updated May 30, 2023
Cite
Richard Ferrers; Speedtest Global Index (2023). Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus ALC - 2020, 2022 [Dataset]. http://doi.org/10.6084/m9.figshare.13621169.v24
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.13621169.v24
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Richard Ferrers; Speedtest Global Index
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset compares FIXED-line broadband internet speeds in four international cities, plus Alice Springs:
- Melbourne, AU
- Bangkok, TH
- Shanghai, CN
- Los Angeles, US
- Alice Springs, AU

ERRATA:
1. Data is for Q3 2020, but some files are labelled incorrectly as 02-20 or June 20. They all should read Sept 20, or 09-20 as Q3 20, rather than Q2. Will rename and reload. Amended in v7.

2. LAX file was named 0320 when it should be Q320. Amended in v8.

*lines of data for each geojson file; a line equates to a 600m^2 location, inc total tests, devices used, and average upload and download speed
- MEL 16181 locations/lines => 0.85M speedtests (16.7 tests per 100 people)
- SHG 31745 lines => 0.65M speedtests (2.5/100pp)
- BKK 29296 lines => 1.5M speedtests (14.3/100pp)
- LAX 15899 lines => 1.3M speedtests (10.4/100pp)
- ALC 76 lines => 500 speedtests (2/100pp)

Geojsons of these 2* by 2* extracts for MEL, BKK, SHG now added, and LAX added v6. Alice Springs added v15.

This dataset unpacks, geospatially, data summaries provided in Speedtest Global Index (linked below). See Jupyter Notebook (*.ipynb) to interrogate geo data. See link to install Jupyter.

** To Do
Will add Google Map versions so everyone can see the data without installing Jupyter.
- Link to Google Map (BKK) added below. Key: Green > 100Mbps (Superfast). Black > 500Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook.
- Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melb has 20% at or above 100Mbps. Suggest plotting the Top 20% on a map for the community. Google Map link now added (and tweet).

** Python
melb = au_tiles.cx[144:146, -39:-37]  # Lon, Lat extract
shg = tiles.cx[120:122, 30:32]  # Lon, Lat extract
bkk = tiles.cx[100:102, 13:15]  # Lon, Lat extract
lax = tiles.cx[-118:-120, 33:35]  # Lon, Lat extract
ALC = tiles.cx[132:134, -22:-24]  # Lon, Lat extract
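The extracts above use the GeoPandas `.cx` coordinate indexer, which takes the x (longitude) range first and the y (latitude) range second. As a rough illustration of the same 2-degree by 2-degree bounding-box idea without GeoPandas, here is a minimal standard-library sketch. It works on plain (lon, lat) points rather than tile polygons, the function name `bbox_filter` is hypothetical, and the sample coordinates are approximate city-centre values chosen only for illustration.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def bbox_filter(points: Iterable[Point],
                lon_range: Tuple[float, float],
                lat_range: Tuple[float, float]) -> List[Point]:
    """Keep points that fall inside the given longitude/latitude box.

    Ranges may be given in either order (e.g. -118 to -120), so they are
    sorted before comparison.
    """
    lon_lo, lon_hi = sorted(lon_range)
    lat_lo, lat_hi = sorted(lat_range)
    return [
        (lon, lat)
        for lon, lat in points
        if lon_lo <= lon <= lon_hi and lat_lo <= lat <= lat_hi
    ]


# Approximate, illustrative coordinates (not taken from the dataset).
sample = [
    (144.96, -37.81),  # roughly Melbourne
    (151.21, -33.87),  # roughly Sydney
    (100.50, 13.76),   # roughly Bangkok
]

melb = bbox_filter(sample, (144, 146), (-39, -37))
bkk = bbox_filter(sample, (100, 102), (13, 15))
```

In the real notebooks, `.cx` selects tile geometries that intersect the box, so tiles straddling the boundary may be included; the point-based sketch does not reproduce that detail.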

Histograms (v9) and data visualisations (v3, 5, 9, 11) will be provided.

Data source: this is an extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).

**VERSIONS
v.24 Add tweet and google map of Top 20% (over 100Mbps locations) in Mel Q322. Add v.1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below).
v23. Add graph of 2022 Broadband distribution, and compare 2020 - 2022. Updated v1.4 Jupyter notebook.
v22. Add Import ipynb; workflow-import-4cities.
v21. Add Q3 2022 data; five cities inc ALC. Geojson files. (2020: 4.3M tests; 2022: 2.9M tests)

Melb 14784 lines Avg download speed 69.4M Tests 0.39M

SHG 31207 lines Avg 233.7M Tests 0.56M

ALC 113 lines Avg 51.5M Test 1092

BKK 29684 lines Avg 215.9M Tests 1.2M

LAX 15505 lines Avg 218.5M Tests 0.74M
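As a quick sanity check on the v21 figures above, the per-city test counts can be summed and compared with the 2.9M total quoted for 2022. This is only a sketch: the counts are the rounded values as listed here, not values recomputed from the geojson files.

```python
# Per-city test counts for Q3 2022, copied from the listing above
# (rounded figures as published; ALC is given exactly as 1092).
tests_2022 = {
    "MEL": 390_000,
    "SHG": 560_000,
    "ALC": 1_092,
    "BKK": 1_200_000,
    "LAX": 740_000,
}

total_tests = sum(tests_2022.values())
total_millions = round(total_tests / 1_000_000, 1)

print(f"Total tests: {total_tests} (~{total_millions}M)")
```

With these rounded inputs the sum comes to about 2.9M, which agrees with the total stated in the v21 note.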

v20. Speedtest - Five Cities inc ALC.
v19. Add ALC2.ipynb.
v18. Add ALC line graph.
v17. Added ipynb for ALC. Added ALC to title.
v16. Load Alice Springs Data Q221 - csv. Added Google Map link of ALC.
v15. Load Melb Q1 2021 data - csv.
v14. Added Melb Q1 2021 data - geojson.
v13. Added Twitter link to pics.
v12 Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb).
v11 Add Line-Compare pic, plotting Four Cities on a graph.
v10 Add Four Histograms in one pic.
v9 Add Histogram for Four Cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook).
v8 Renamed LAX file to Q3, rather than 03.
v7 Amended file names of BKK files to correctly label as Q3, not Q2 or 06.
v6 Added LAX file.
v5 Add screenshot of BKK Google Map.
v4 Add BKK Google map (link below), and BKK csv mapping files.
v3 Replaced MEL map with big key version. Prev key was very tiny in top right corner.
v2 Uploaded MEL, SHG, BKK data and Jupyter Notebook.
v1 Metadata record.

** LICENCE
AWS data licence on Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be:
- non-commercial (NC)
- reuse must be share-alike (SA) (add same licence).
This restricts the standard CC-BY Figshare licence.

** Other uses of Speedtest Open Data; - see link at Speedtest below.
IRS Form 990 Data
kaggle.com
zip
Updated Mar 20, 2019
Cite
Internal Revenue Service (2019). IRS Form 990 Data [Dataset]. https://www.kaggle.com/datasets/irs/irs-990
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2019
Dataset provided by
Internal Revenue Service, irs.gov (http://www.irs.gov/)
Authors
Internal Revenue Service
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Form 990 (officially, the "Return of Organization Exempt From Income Tax") is a United States Internal Revenue Service form that provides the public with financial information about a nonprofit organization. It is often the only source of such information. It is also used by government agencies to prevent organizations from abusing their tax-exempt status. Source: https://en.wikipedia.org/wiki/Form_990

Content

Form 990 is used by the United States Internal Revenue Service to gather financial information about nonprofit/exempt organizations. This BigQuery dataset can be used to perform research and analysis of organizations that have electronically filed Forms 990, 990-EZ and 990-PF. For a complete description of data variables available in this dataset, see the IRS’s extract documentation: https://www.irs.gov/uac/soi-tax-stats-annual-extract-of-tax-exempt-organization-financial-data.

Update Frequency: Annual

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:irs_990

https://cloud.google.com/bigquery/public-data/irs-990

Dataset Source: U.S. Internal Revenue Service. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @rawpixel from Unsplash.

Inspiration

What organizations filed tax exempt status in 2015?

What was the revenue of the American Red Cross in 2017?
Company Datasets for Business Profiling
datarade.ai
Updated Feb 23, 2017
Cite
Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
Explore at:
Available download formats: .json, .xml, .csv, .xls
Dataset updated
Feb 23, 2017
Dataset authored and provided by
Oxylabs
Area covered
Moldova (Republic of), Tunisia, Isle of Man, Canada, Taiwan, Bangladesh, Andorra, Northern Mariana Islands, British Indian Ocean Territory, Nepal
Description
Company Datasets for valuable business insights!

Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

- Owler: Gain valuable business insights and competitive intelligence.
- AngelList: Receive fresh startup data transformed into actionable insights.
- CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.
- Craft.co: Make data-informed business decisions with Craft.co's company datasets.
- Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

Company name;

Size;

Founding date;

Location;

Industry;

Revenue;

Employee count;

Competitors.

You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

With Oxylabs Datasets, you can count on:

Fresh and accurate data collected and parsed by our expert web scraping team.

Time and resource savings, allowing you to focus on data analysis and achieving your business goals.

A customized approach tailored to your specific business needs.

Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
Day & night temperatures, 50yrs, 1666ws, TFRecord
kaggle.com
zip
Updated Nov 9, 2019
Cite
Martin Görner (2019). Day & night temperatures, 50yrs, 1666ws, TFRecord [Dataset]. https://www.kaggle.com/datasets/mgorner/day-night-temperatures-50yrs-1666ws-tfrecord
Explore at:
Available download formats: zip (160157825 bytes)
Dataset updated
Nov 9, 2019
Authors
Martin Görner
License
https://www.usa.gov/government-works/
Description
This dataset is a cleaned-up extract from the following public BigQuery dataset: https://console.cloud.google.com/marketplace/details/noaa-public/ghcn-d

The dataset contains daily min/max temperatures from a selection of 1666 weather stations. The data spans exactly 50 years. Missing values have been interpolated and are marked as such.

This dataset is in TFRecord format.

About the original dataset: NOAA’s Global Historical Climatology Network (GHCN) is an integrated database of climate summaries from land surface stations across the globe that have been subjected to a common suite of quality assurance reviews. The data are obtained from more than 20 sources. The GHCN-Daily is an integrated database of daily climate summaries from land surface stations across the globe, and is comprised of daily climate records from over 100,000 stations in 180 countries and territories, and includes some data from every year since 1763.
Job Postings Dataset for Labour Market Research and Insights
datarade.ai
Updated Sep 20, 2023
Cite
Oxylabs (2023). Job Postings Dataset for Labour Market Research and Insights [Dataset]. https://datarade.ai/data-products/job-postings-dataset-for-labour-market-research-and-insights-oxylabs
Explore at:
Available download formats: .json, .xml, .csv, .xls
Dataset updated
Sep 20, 2023
Dataset authored and provided by
Oxylabs
Area covered
Jamaica, Zambia, Togo, Luxembourg, Tajikistan, British Indian Ocean Territory, Sierra Leone, Switzerland, Kyrgyzstan, Anguilla
Description
Introducing Job Posting Datasets: Uncover labor market insights!

Elevate your recruitment strategies, forecast future labor industry trends, and unearth investment opportunities with Job Posting Datasets.

Job Posting Datasets Source:

Indeed: Access datasets from Indeed, a leading employment website known for its comprehensive job listings.

Glassdoor: Receive ready-to-use employee reviews, salary ranges, and job openings from Glassdoor.

StackShare: Access StackShare datasets to make data-driven technology decisions.

Job Posting Datasets provide meticulously acquired and parsed data, freeing you to focus on analysis. You'll receive clean, structured, ready-to-use job posting data, including job titles, company names, seniority levels, industries, locations, salaries, and employment types.

Choose your preferred dataset delivery options for convenience:

Receive datasets in various formats, including CSV, JSON, and more. Opt for storage solutions such as AWS S3, Google Cloud Storage, and more. Customize data delivery frequencies, whether one-time or per your agreed schedule.

Why Choose Oxylabs Job Posting Datasets:

Fresh and accurate data: Access clean and structured job posting datasets collected by our seasoned web scraping professionals, enabling you to dive into analysis.

Time and resource savings: Focus on data analysis and your core business objectives while we efficiently handle the data extraction process cost-effectively.

Customized solutions: Tailor our approach to your business needs, ensuring your goals are met.

Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA best practices.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Effortlessly access fresh job posting data with Oxylabs Job Posting Datasets.
Automatically Extracted Buildings
open.canada.ca
catalogue.arctic-sdi.org
+2 more
fgdb/gdb, html, kmz +3
Updated Oct 23, 2025
Cite
Natural Resources Canada (2025). Automatically Extracted Buildings [Dataset]. https://open.canada.ca/data/en/dataset/7a5cda52-c7df-427f-9ced-26f19a8a64d6
Explore at:
Available download formats: pdf, html, wms, fgdb/gdb, kmz, shp
Dataset updated
Oct 23, 2025
Dataset provided by
Natural Resources Canada
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Description
“Automatically Extracted Buildings” is a raw digital product in vector format created by NRCan. It consists of a single topographical feature class that delineates polygonal building footprints automatically extracted from airborne Lidar data, high-resolution optical imagery or other sources.
Public electrophysiological datasets collected in the Buzsaki Lab
zenodo.org
Updated Jul 22, 2024
+ more versions
Cite
Peter Christian Petersen; Michelle Hernandez; György Buzsáki (2024). Public electrophysiological datasets collected in the Buzsaki Lab [Dataset]. http://doi.org/10.5281/zenodo.3629881
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.3629881
Dataset updated
Jul 22, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Peter Christian Petersen; Michelle Hernandez; György Buzsáki
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Buzsaki Lab is proud to present a large selection of experimental data available for public access: https://buzsakilab.com/wp/database/. We publicly share more than a thousand sessions (about 40TB of raw and spike- and LFP-processed data) via our public data repository. The datasets are from freely moving rodents and include sleep-task-sleep sessions (3 to 24 hrs continuous recording sessions) in various brain structures, including metadata. We are happy to assist you in using the data. Our goal is that by sharing these data, other users can provide new insights, extend, contradict, or clarify our conclusions.

The databank contains electrophysiological recordings performed in freely moving rats and mice collected by investigators in the Buzsaki Lab over several years (a subset from head-fixed mice). Sessions have been collected with extracellular electrodes using high-channel-count silicon probes, with spike sorted single units, and intracellular and juxtacellular combined with extracellular electrodes. Several sessions include physiologically and optogenetically identified units. The sessions have been collected from various brain region pairs: the hippocampus, thalamus, amygdala, post-subiculum, septal region, and the entorhinal cortex, and various neocortical regions. In most behavioral tasks, the animals performed spatial behaviors (linear mazes and open fields), preceded and followed by long sleep sessions. Brain state classification is provided.

Getting started

The top menu “Databank” serves as a navigational menu to the databank. The metadata describing the experiments is stored in a relational database which means that there are many entry points for exploring the data. The databank is organized by projects, animal subjects, and sessions.

Accessing and downloading the datasets

We share the data through two services: our public Globus.org endpoint and our webshare: buzsakilab.nyumc.org. A subset of the datasets is also available at CRCNS.org. If you have an interest in a dataset that is not listed or is lacking information, please contact us. We pledge to make our data available immediately after publication.

Support

For support, please use our Buzsaki Databank google group. If you have an interest in a dataset that is not listed or is lacking information, please send us a request. Feel free to contact us, if you need more details on a given dataset or if a dataset is missing.
github-r-repos
huggingface.co
Updated Jun 6, 2023
Cite
Daniel Falbel (2023). github-r-repos [Dataset]. https://huggingface.co/datasets/dfalbel/github-r-repos
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 6, 2023
Authors
Daniel Falbel
License
https://choosealicense.com/licenses/other/
Description
GitHub R repositories dataset

R source files from GitHub. This dataset has been created using the public GitHub datasets from Google BigQuery. This is the actual query that has been used to export the data:
EXPORT DATA OPTIONS (
  uri = 'gs://your-bucket/gh-r/*.parquet',
  format = 'PARQUET') as (
select f.id, f.repo_name, f.path, c.content, c.size
from (
  SELECT distinct id, repo_name, path
  FROM bigquery-public-data.github_repos.files
  where ends_with(path…
See the full description on the dataset page: https://huggingface.co/datasets/dfalbel/github-r-repos.
Demographics
health.google.com
Updated Oct 7, 2021
+ more versions
Cite
(2021). Demographics [Dataset]. https://health.google.com/covid-19/open-data/raw-data
Explore at:
Dataset updated
Oct 7, 2021
Variables measured
key, population, population_male, rural_population, urban_population, population_female, population_density, clustered_population, population_age_00_09, population_age_10_19, and 11 more
Description
Various population statistics, including structured demographics data.
Speedtest Open Data - Australia 2020 Q2, Q3, Q4 extract
figshare.com
txt
Updated Oct 24, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Richard Ferrers; Speedtest Global Index (2025). Speedtest Open Data - Australia 2020 Q2, Q3, Q4 extract [Dataset]. http://doi.org/10.6084/m9.figshare.13370504.v17
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.13370504.v17
Dataset updated
Oct 24, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Richard Ferrers; Speedtest Global Index
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Australia
Description
This is an Australian extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).

AWS data licence is "CC BY-NC-SA 4.0", so use of this data must be:
- non-commercial (NC)
- reuse must be share-alike (SA) (add same licence).
This restricts the standard CC-BY Figshare licence.

A world speedtest open data file was downloaded (>400Mb, 7M lines of data). An extract of Australia's location (lat, long) revealed 88,000 lines of data (attached as csv).

A Jupyter notebook of the extract process is attached. A link to a Twitter thread of outputs is provided. A link to a Data tutorial is provided (GitHub), including a Jupyter Notebook to analyse World Speedtest data, selecting one US State.

Data Shows: (Q2)
- 3.1M speedtests
- 762,000 devices
- 88,000 grid locations (600m * 600m), summarised as a point
- average speed 33.7Mbps (down), 12.4M (up)
- Max speed 724Mbps
- data is for 600m * 600m grids, showing average speed up/down, number of tests, and number of users (IP). Added centroid, and now lat/long.
See tweet of image of centroids also attached.

Versions:
v15/16. Add Hist comparing Q1-21 vs Q2-20. Inc ipynb (incHistQ121, v.1.3-Q121) to calc.
v14 - Add AUS Speedtest Q1 2021 geojson. (79k lines avg d/l 45.4Mbps)
v13 - Added three colour MELB map (less than 20Mbps, over 90Mbps, 20-90Mbps)
v12 - Added AUS - Syd - Mel Line Chart Q320.
v11 - Add line chart compare Q2, Q3, Q4 plus Melb - result virtually indistinguishable. Add line chart to compare Syd - Melb Q3. Also virtually indistinguishable. Add HIST compare Syd - Melb Q3. Add new Jupyter with graph calcs (nbn-AUS-v1.3). Some ERRATA document in Notebook. Issue with resorting table, and graphing only part of table. Not an issue if all lines of table graphed.
v10 - Load AURIN sample pics. Speedtest data loaded to AURIN geo-analytic platform; requires edu.au login.
v9 - Add comparative Q2, Q3, Q4 Hist pic.
v8 - Added Q4 data geojson. Add Q3, Q4 Hist pic.
v7 - Rename to include Q2, Q3 in Title.
v6 - Add Q3 20 data. Rename geojson AUS data as Q2. Add comparative Histogram. Calc in International.ipynb.
v5 - Add Jupyter Notebook inc Histograms. Hist is count of geo-locations avg download speed (unweighted by tests).
v4 - Added Melb choropleth (png 50Mpix) inc legend. (To do - add Melb.geojson). Posted Link to AURIN description of Speedtest data.
v3 - Add super fast data (>100Mbps) less than 1% of data - 697 lines. Includes png of superfast.plot(). Link below to Google Maps version of superfast data points. Also Google map of first 100 data points - sample data. Geojson format for loading into GeoPandas, per Jupyter Notebook. New version of Jupyter Notebook, v.1.1.
v2 - Add centroids image.
v1 - Initial data load.

** Future Work
- combine Speedtest data with NBN Technology by location data (national map.gov.au); https://www.data.gov.au/dataset/national-broadband-network-connections-by-technology-type
- combine Speedtest data with SEIFA data - socioeconomic categories - to discuss with AURIN.
- Further international comparisons
- discussed collaboration with Assoc Prof Tooran Alizadeh, USyd.
Product Review Datasets for User Sentiment Analysis
datarade.ai
Updated Sep 28, 2018
Cite
Oxylabs (2018). Product Review Datasets for User Sentiment Analysis [Dataset]. https://datarade.ai/data-products/product-review-datasets-for-user-sentiment-analysis-oxylabs
Explore at:
Available download formats: .json, .xml, .csv, .xls
Dataset updated
Sep 28, 2018
Dataset authored and provided by
Oxylabs
Area covered
Egypt, Libya, Hong Kong, Canada, South Africa, Sudan, Italy, Argentina, Antigua and Barbuda, Barbados
Description
Product Review Datasets: Uncover user sentiment

Harness the power of Product Review Datasets to understand user sentiment and insights deeply. These datasets are designed to elevate your brand and product feature analysis, help you evaluate your competitive stance, and assess investment risks.

Data sources:

Trustpilot: datasets encompassing general consumer reviews and ratings across various businesses, products, and services.

Leave the data collection challenges to us and dive straight into market insights with clean, structured, and actionable data, including:

Product name;

Product category;

Number of ratings;

Ratings average;

Review title;

Review body;

Choose from multiple data delivery options to suit your needs:

Receive data in easy-to-read formats like spreadsheets or structured JSON files.

Select your preferred data storage solutions, including SFTP, Webhooks, Google Cloud Storage, AWS S3, and Microsoft Azure Storage.

Tailor data delivery frequencies, whether on-demand or per your agreed schedule.

Why choose Oxylabs?

Fresh and accurate data: Access organized, structured, and comprehensive data collected by our leading web scraping professionals.

Time and resource savings: Concentrate on your core business goals while we efficiently handle the data extraction process at an affordable cost.

Adaptable solutions: Share your specific data requirements, and we'll craft a customized data collection approach to meet your objectives.

Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA standards.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Join the ranks of satisfied customers who appreciate our meticulous attention to detail and personalized support. Experience the power of Product Review Datasets today to uncover valuable insights and enhance decision-making.
Novel Covid-19 Dataset
kaggle.com
Updated Sep 18, 2025
+ more versions
Cite
GHOST5612 (2025). Novel Covid-19 Dataset [Dataset]. https://www.kaggle.com/datasets/ghost5612/novel-covid-19-dataset
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 18, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
GHOST5612
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Context:

From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.

Johns Hopkins University has made an excellent dashboard using the affected cases data. Data is extracted from the associated Google Sheets and made available here.

Edited:

Now data is available as csv files in the Johns Hopkins Github repository. Please refer to the github repository for the Terms of Use details. Uploading it here for using it in Kaggle kernels and getting insights from the broader DS community.

Content

2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

This dataset has daily level information on the number of affected cases, deaths and recovery from 2019 novel coronavirus. Please note that this is a time series data and so the number of cases on any given day is the cumulative number.

The data is available from 22 Jan, 2020.


Dataset Description

This dataset contains time-series and case-level records of the COVID-19 pandemic. The primary file is covid_19_data.csv, with supporting files for earlier records and individual-level line list data.

Files and Columns

1. covid_19_data.csv (Main File)

This is the primary dataset and contains aggregated COVID-19 statistics by location and date.

Sno – Serial number of the record

ObservationDate – Date of the observation (MM/DD/YYYY)

Province/State – Province or state of the observation (may be missing for some entries)

Country/Region – Country of the observation

Last Update – Timestamp (UTC) when the record was last updated (not standardized, requires cleaning before use)

Confirmed – Cumulative number of confirmed cases on that date

Deaths – Cumulative number of deaths on that date

Recovered – Cumulative number of recoveries on that date
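
Because Confirmed, Deaths and Recovered are cumulative, daily counts have to be derived by differencing consecutive dates per location. The sketch below shows one way to do that with the Python standard library. The sample rows are made up for illustration and only follow the column layout listed above; the function name is hypothetical and not part of the dataset.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows in the column layout described above (values are illustrative).
SAMPLE = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,Hubei,Mainland China,1/22/2020 17:00,444.0,17.0,28.0
2,01/23/2020,Hubei,Mainland China,1/23/20 17:00,549.0,24.0,31.0
3,01/24/2020,Hubei,Mainland China,1/24/20 17:00,761.0,40.0,32.0
4,01/22/2020,,Japan,1/22/2020 17:00,2.0,0.0,0.0
5,01/23/2020,,Japan,1/23/20 17:00,2.0,0.0,0.0
"""


def daily_new_confirmed(csv_text):
    """Turn cumulative Confirmed counts into per-day increases for each location."""
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # ObservationDate uses MM/DD/YYYY; Last Update is not standardized, so it is ignored here.
        day = datetime.strptime(row["ObservationDate"], "%m/%d/%Y").date()
        # Province/State may be missing; represent that as None.
        key = (row["Province/State"] or None, row["Country/Region"])
        series[key].append((day, int(float(row["Confirmed"]))))

    result = {}
    for key, points in series.items():
        points.sort()  # chronological order
        previous = 0
        daily = []
        for day, cumulative in points:
            # The first observed day keeps its full cumulative value.
            daily.append((day.isoformat(), cumulative - previous))
            previous = cumulative
        result[key] = daily
    return result


new_cases = daily_new_confirmed(SAMPLE)
```

Note that the first observation for each location carries the whole cumulative total, so it should not be read as a single-day increase.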

2. 2019_ncov_data.csv (Legacy File)

This file contains earlier COVID-19 records. It is no longer updated and is provided only for historical reference. For current analysis, please use covid_19_data.csv.

3. COVID_open_line_list_data.csv

This file provides individual-level case information, obtained from an open data source. It includes patient demographics, travel history, and case outcomes.

4. COVID19_line_list_data.csv

Another individual-level case dataset, also obtained from public sources, with detailed patient-level information useful for micro-level epidemiological analysis.

✅ Use covid_19_data.csv for up-to-date aggregated global trends.

✅ Use the line list datasets for detailed, individual-level case analysis.

Country level datasets:

If you are interested in country-level data, please refer to the following Kaggle datasets:

India - https://www.kaggle.com/sudalairajkumar/covid19-in-india

South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset

Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy

Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil

USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa

Switzerland - https://www.kaggle.com/daenuprobst/covid19-cases-switzerland

Indonesia - https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases

Acknowledgements:

Johns Hopkins University for making the data available for educational and academic research purposes

MoBS lab - https://www.mobs-lab.org/2019ncov.html

World Health Organization (WHO): https://www.who.int/

DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.

BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/

National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml

China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm

Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html

Macau Government: https://www.ssm.gov.mo/portal/

Taiwan CDC: https://sites.google....
IRS 990
console.cloud.google.com
Updated Nov 23, 2023
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Internal%20Revenue%20Service&hl=pt (2023). IRS 990 [Dataset]. https://console.cloud.google.com/marketplace/product/internal-revenue-service/irs-990?hl=pt
Dataset updated
Nov 23, 2023
Dataset provided by
Google (http://google.com/)
Description
Form 990 is used by the United States Internal Revenue Service to gather financial information about nonprofit/exempt organizations. This BigQuery dataset can be used to perform research and analysis of organizations that have electronically filed Forms 990, 990-EZ and 990-PF. For a complete description of data variables available in this dataset, see the IRS’s extract documentation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
Economic Indicators
health.google.com
Updated Oct 7, 2021
Cite
(2021). Economic Indicators [Dataset]. https://health.google.com/covid-19/open-data/raw-data
Dataset updated
Oct 7, 2021
Variables measured
gdp, key, gdp_per_capita, human_capital_index
Description
Various economic indicators.
Data_Sheet_1_Public interest in different types of masks and its...
frontiersin.figshare.com
docx
Updated Jun 8, 2023
Cite
Andy Wai Kan Yeung; Emil D. Parvanov; Jarosław Olav Horbańczuk; Maria Kletecka-Pulker; Oliver Kimberger; Harald Willschke; Atanas G. Atanasov (2023). Data_Sheet_1_Public interest in different types of masks and its relationship with pandemic and policy measures during the COVID-19 pandemic: a study using Google Trends data.docx [Dataset]. http://doi.org/10.3389/fpubh.2023.1010674.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fpubh.2023.1010674.s001
Dataset updated
Jun 8, 2023
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Andy Wai Kan Yeung; Emil D. Parvanov; Jarosław Olav Horbańczuk; Maria Kletecka-Pulker; Oliver Kimberger; Harald Willschke; Atanas G. Atanasov
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Google Trends data have been used to investigate various themes in online information seeking. It was unclear whether populations in different parts of the world paid the same amount of attention to different mask types during the COVID-19 pandemic. This study aimed to reveal which types of masks were frequently searched by the public in different countries, and evaluated whether public attention to masks could be related to mandatory policy, stringency of the policy, and transmission rate of COVID-19. By referring to an open dataset hosted at the online database Our World in Data, the 10 countries with the highest total number of COVID-19 cases as of 9th of February 2022 were identified. For each of these countries, the weekly new cases per million population, reproduction rate (of COVID-19), stringency index, and face covering policy score were computed from the raw daily data. Google Trends was queried to extract the relative search volume (RSV) for different types of masks from each of these countries. The results showed that Google searches for N95 masks were predominant in India, whereas surgical masks were predominant in Russia, FFP2 masks were predominant in Spain, and cloth masks were predominant in both France and the United Kingdom. The United States, Brazil, Germany, and Turkey each had two predominant types of mask. Online searching behavior for masks varied markedly across countries. For most of the surveyed countries, online searching for masks peaked during the first wave of the COVID-19 pandemic, before the government implemented mandatory mask wearing. The search for masks positively correlated with the government response stringency index but not with the COVID-19 reproduction rate or the new cases per million.
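
The aggregation and correlation steps described above can be illustrated with a small sketch. It is not the authors' code: the 7-day grouping into consecutive blocks, the use of Pearson correlation, the function names and all numbers are assumptions made for illustration only.

```python
import math


def weekly_per_million(daily_new_cases, population):
    """Sum daily new cases over consecutive 7-day blocks and scale to cases per million people.

    Assumption: weeks are simple consecutive blocks of 7 days; a trailing partial week is dropped.
    """
    weeks = []
    for start in range(0, len(daily_new_cases) - 6, 7):
        weekly_total = sum(daily_new_cases[start:start + 7])
        weeks.append(weekly_total / (population / 1_000_000))
    return weeks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative inputs: 14 days of 10 new cases in a population of 2 million.
weekly = weekly_per_million([10] * 14, 2_000_000)

# Illustrative weekly relative search volume (RSV) against a weekly stringency index.
rsv = [20.0, 45.0, 80.0, 60.0]
stringency = [10.0, 40.0, 75.0, 70.0]
r = pearson(rsv, stringency)
```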
Cloud-Based Data Analytics Platform Report
datainsightsmarket.com
doc, pdf, ppt
Updated Oct 25, 2025
Cite
Data Insights Market (2025). Cloud-Based Data Analytics Platform Report [Dataset]. https://www.datainsightsmarket.com/reports/cloud-based-data-analytics-platform-499252
Available download formats: ppt, pdf, doc
Dataset updated
Oct 25, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global Cloud-Based Data Analytics Platform market is poised for significant expansion, projected to reach a substantial market size of $150 billion by 2025, exhibiting a robust Compound Annual Growth Rate (CAGR) of 18% throughout the forecast period of 2025-2033. This impressive growth trajectory is fueled by an increasing reliance on data-driven decision-making across all industries. Key drivers include the escalating volume and complexity of data, the growing demand for real-time insights to gain a competitive edge, and the inherent scalability and cost-effectiveness offered by cloud platforms compared to on-premise solutions. Businesses are increasingly leveraging these platforms to extract actionable intelligence from their data, enabling them to optimize operations, enhance customer experiences, and identify new revenue streams. The democratization of data analytics tools, with user-friendly interfaces and advanced AI/ML capabilities, is further accelerating adoption among small and medium-sized enterprises, broadening the market's reach and impact.

The market landscape is characterized by a dynamic interplay of technological advancements and evolving business needs. Major trends include the proliferation of hybrid and multi-cloud strategies, offering organizations greater flexibility and control over their data. Advancements in AI and machine learning are deeply integrated into these platforms, enabling more sophisticated predictive analytics, natural language processing for query simplification, and automated insights. The emphasis on data governance, security, and compliance in cloud environments is also a critical consideration, with vendors investing heavily in robust security features. While the market experiences immense growth, potential restraints such as data privacy concerns, vendor lock-in anxieties, and the need for skilled personnel to manage and interpret complex data sets present challenges. However, the overwhelming benefits of enhanced agility, improved collaboration, and reduced IT infrastructure costs continue to drive strong market momentum, with platforms like those offered by industry leaders such as Amazon, Google, Microsoft, and Snowflake dominating the competitive arena.

This comprehensive report provides an in-depth analysis of the global Cloud-Based Data Analytics Platform market, forecasting its trajectory from 2019 to 2033, with a base year of 2025. The study delves into the market's intricate dynamics, exploring its growth drivers, challenges, and emerging trends, while also providing valuable insights into its competitive landscape and key regional contributions. The estimated market size is expected to reach $XX million by 2025, with significant growth projected during the forecast period.
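
For readers who want to see what the quoted figures imply, the sketch below applies the standard compound-growth formula, value = base * (1 + CAGR)^years, to the $150 billion base and 18% CAGR stated above. It takes those two numbers at face value and is only an arithmetic illustration, not a figure from the report; the function name is hypothetical.

```python
def project_market_size(base_value, cagr, years):
    """Compound a base value forward by a constant annual growth rate."""
    return base_value * (1.0 + cagr) ** years


# Figures quoted in the description: $150 billion in 2025, 18% CAGR, forecast to 2033.
BASE_2025_BILLION = 150.0
CAGR = 0.18

projected_2033 = project_market_size(BASE_2025_BILLION, CAGR, 2033 - 2025)
print(f"Implied 2033 market size: ${projected_2033:.1f} billion")
```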
Speedtest Open Data - Australia 2020-04-01 extract
figshare.com
txt
Updated Oct 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Richard Ferrers; Speedtest Global Index (2025). Speedtest Open Data - Australia 2020-04-01 extract [Dataset]. http://doi.org/10.6084/m9.figshare.13370504.v3
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.13370504.v3
Dataset updated
Oct 24, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Richard Ferrers; Speedtest Global Index
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Australia
Description
This is an Australian extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).

AWS data licence is "CC BY-NC-SA 4.0", so use of this data must be:
- non-commercial (NC)
- reuse must be share-alike (SA) (add same licence).
This restricts the standard CC-BY Figshare licence.

World speedtest open data was downloaded (>400Mb, 7M lines of data). An extract of Australia's location (lat, long) revealed 88,000 lines of data (attached as csv).

A Jupyter notebook of the extract process is attached.
A link to a Twitter thread of outputs is provided.
A link to a data tutorial is provided (GitHub), including a Jupyter Notebook to analyse World Speedtest data, selecting one US State.

Data shows:
- 3.1M speedtests
- 762,000 devices
- 88,000 grid locations (600m * 600m), summarised as a point
- average speed 33.7Mbps (down), 12.4Mbps (up)
- Max speed 724Mbps
- data is for 600m * 600m grids, showing average speed up/down, number of tests, and number of users (IP). Added centroid, and now lat/long.

See tweet of image of centroids also attached.

Versions:
v3 - Add super fast data (>100Mbps), less than 1% of data - 697 lines. Includes png of superfast.plot(). Link below to Google Maps version of superfast data points. Also Google map of first 100 data points - sample data. Geojson format for loading into GeoPandas, per Jupyter Notebook. New version of Jupyter Notebook, v.1.1.
v2 - add centroids image.
v1 - initial data load.
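
The extract step (selecting rows by latitude and longitude) can be sketched as a bounding-box filter over tile centroids, followed by a test-weighted average speed. This is not the attached notebook: the bounding box is a rough assumption, the field names (lat, lon, avg_d_kbps, tests) and sample rows are hypothetical, and the real extract may have used a different method.

```python
# Rough bounding box for Australia (an assumption for illustration, not taken from the dataset).
AUSTRALIA_BBOX = {"min_lat": -44.0, "max_lat": -10.0, "min_lon": 112.0, "max_lon": 154.0}

# Hypothetical tile rows: centroid coordinates, average download speed in kbps, number of tests.
TILES = [
    {"lat": -37.81, "lon": 144.96, "avg_d_kbps": 40000, "tests": 30},   # Melbourne area
    {"lat": -33.87, "lon": 151.21, "avg_d_kbps": 20000, "tests": 10},   # Sydney area
    {"lat": 34.05, "lon": -118.24, "avg_d_kbps": 90000, "tests": 50},   # Los Angeles area
]


def in_bbox(lat, lon, bbox):
    """Return True when the point lies inside the bounding box (edges included)."""
    return (bbox["min_lat"] <= lat <= bbox["max_lat"]
            and bbox["min_lon"] <= lon <= bbox["max_lon"])


def extract_region(rows, bbox):
    """Keep only the rows whose centroid falls inside the bounding box."""
    return [row for row in rows if in_bbox(row["lat"], row["lon"], bbox)]


def weighted_avg_download_mbps(rows):
    """Average download speed in Mbps, weighting each tile by its number of tests."""
    total_tests = sum(row["tests"] for row in rows)
    weighted_kbps = sum(row["avg_d_kbps"] * row["tests"] for row in rows)
    return weighted_kbps / total_tests / 1000.0


australia = extract_region(TILES, AUSTRALIA_BBOX)
avg_mbps = weighted_avg_download_mbps(australia)
```

Weighting by test count is one reasonable choice; an unweighted mean over tiles would give a different figure, and the description does not say which was used.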
Build better LibGuides: A dataset of Political Science, Public Affairs, and...
data.niaid.nih.gov
search.dataone.org
zip
Updated May 30, 2024
Cite
Annelise Sklar (2024). Build better LibGuides: A dataset of Political Science, Public Affairs, and International Studies LibGuides [Dataset]. http://doi.org/10.5061/dryad.prr4xgxvk
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.prr4xgxvk
Dataset updated
May 30, 2024
Dataset provided by
University of California, San Diego
Authors
Annelise Sklar
License
https://spdx.org/licenses/CC0-1.0.html
Description
This dataset accompanies the "Build Better LibGuides" chapter of Teaching Information Literacy in Political Science, Public Affairs, and International Studies. This dataset was created to compare current practices in Political Science, Public Affairs, and International Studies (PSPAIS) LibGuides with recommended best practices using a sample that represents a variety of academic institutions. Members of the ACRL Politics, Policy, and International Relations Section (PPIRS) were identified as the librarians most likely to be actively engaged with these specific subjects, so the dataset was scoped by identifying the institutions associated with the most active PPIRS members and then locating the LibGuides in these and related disciplines. The resulting dataset includes 101 guides at 46 institutions, for a total of 887 LibGuide tabs.

Methods

This dataset was created to compare current practices in Political Science, Public Affairs, and International Studies (PSPAIS) LibGuides with recommended best practices using a sample that represents a variety of academic institutions. Members of the ACRL Politics, Policy, and International Relations Section (PPIRS) were identified as the librarians most likely to be actively engaged with these specific subjects, so the dataset was scoped by identifying the institutions associated with the most active PPIRS members and then locating the LibGuides in these and related disciplines. Specifically, a student assistant collected the names and institutional affiliations of each member serving on a PPIRS committee as of July 1, 2021, 2022, and 2023. The student then removed the individual librarian names from the list and located the links to the Political Science or Government; Public Policy, Public Affairs, or Public Administration; and International Studies or International Relations LibGuides at each institution.

The chapter author then confirmed and, in a few cases, added to the student's work and copied and pasted the tab names from each guide (which conveniently were also hyperlinked) into a Google Sheet. The resulting dataset included 101 guides at 46 institutions, for a total of 887 LibGuide tabs. A Google Apps script was used to extract the hyperlinks from the collected tab names and then a Python script was used to scrape the names of links included on each of the tabs. LibGuides from two institutions returned errors during the link name scraping process and were excluded from this part of the analysis.
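
The author's scraping script is not included in this description. As a rough idea of what scraping link names from a tab can involve, the sketch below collects the visible text and target of every hyperlink in an HTML fragment using Python's standard html.parser module. The class name, function name and sample HTML are hypothetical, and fetching the pages (and handling errors such as those mentioned above) is left out.

```python
from html.parser import HTMLParser


class LinkNameParser(HTMLParser):
    """Collect (link text, href) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so names compare cleanly.
            name = " ".join("".join(self._text).split())
            self.links.append((name, self._href))
            self._href = None


def extract_link_names(html):
    """Return the (name, href) pairs found in an HTML string."""
    parser = LinkNameParser()
    parser.feed(html)
    return parser.links


# Hypothetical fragment standing in for the content of one LibGuide tab.
SAMPLE_TAB = (
    '<ul>'
    '<li><a href="https://example.org/jstor">JSTOR</a></li>'
    '<li><a href="https://example.org/pq">ProQuest  <b>Political</b> Science</a></li>'
    '<li><a name="top">no link</a></li>'
    '</ul>'
)

links = extract_link_names(SAMPLE_TAB)
```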

