46 datasets found
  1. Google Map Data, Google Map Data Scraper, Business location Data- Scrape All...

    • apiscrapy.mydatastorefront.com
    Updated May 23, 2022
    Cite
    APISCRAPY (2022). Google Map Data, Google Map Data Scraper, Business location Data- Scrape All Publicly Available Data From Google Map & Other Platforms [Dataset]. https://apiscrapy.mydatastorefront.com/products/google-map-data-google-map-data-scraper-business-location-d-apiscrapy
    Explore at:
    Dataset updated
    May 23, 2022
    Dataset authored and provided by
    APISCRAPY
    Area covered
    Greece, Moldova, Iceland, Lithuania, Latvia, United States Minor Outlying Islands, Romania, Luxembourg, Germany, Liechtenstein
    Description

    Explore APISCRAPY, your AI-powered Google Map Data Scraper. Easily extract Business Location Data from Google Maps and other platforms. Seamlessly access and utilize publicly available map data for your business needs. Scrape All Publicly Available Data From Google Maps & Other Platforms.

  2. Google Maps Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Jan 8, 2023
    Cite
    Bright Data (2023). Google Maps Dataset [Dataset]. https://brightdata.com/products/datasets/google-maps
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Jan 8, 2023
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    The Google Maps dataset is ideal for getting extensive information on businesses anywhere in the world. Easily filter by location, business type, and other factors to get the exact data you need. The Google Maps dataset includes all major data points: timestamp, name, category, address, description, open website, phone number, open_hours, open_hours_updated, reviews_count, rating, main_image, reviews, url, lat, lon, place_id, country, and more.

  3. Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus...

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Richard Ferrers; Speedtest Global Index (2023). Speedtest Open Data - Four International cities - MEL, BKK, SHG, LAX plus ALC - 2020, 2022 [Dataset]. http://doi.org/10.6084/m9.figshare.13621169.v24
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Richard Ferrers; Speedtest Global Index
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset compares fixed-line broadband internet speeds in four international cities, plus Alice Springs:
    - Melbourne, AU
    - Bangkok, TH
    - Shanghai, CN
    - Los Angeles, US
    - Alice Springs, AU

    ERRATA:

    1. Data is for Q3 2020, but some files are incorrectly labelled 02-20 or June 20. They should all read Sept 20, or 09-20 (Q3 20), rather than Q2. Will rename and reload. Amended in v7.

    2. The LAX file was named 0320 when it should have been Q320. Amended in v8.

    Lines of data for each geojson file; a line equates to a 600m^2 location, inc total tests, devices used, and average upload and download speed:
    - MEL 16181 locations/lines => 0.85M speedtests (16.7 tests per 100 people)
    - SHG 31745 lines => 0.65M speedtests (2.5/100pp)
    - BKK 29296 lines => 1.5M speedtests (14.3/100pp)
    - LAX 15899 lines => 1.3M speedtests (10.4/100pp)
    - ALC 76 lines => 500 speedtests (2/100pp)
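
    As a quick sanity check on the "tests per 100 people" figures above, the sketch below backs out the population each rate implies. The totals and rates are the rounded values quoted in the description; the function name is illustrative, and the resulting populations are only as precise as that rounding allows.

```python
# Figures quoted in the description: (total speedtests, tests per 100 people).
CITY_STATS = {
    "MEL": (850_000, 16.7),
    "SHG": (650_000, 2.5),
    "BKK": (1_500_000, 14.3),
    "LAX": (1_300_000, 10.4),
    "ALC": (500, 2.0),
}


def implied_population(total_tests: float, tests_per_100: float) -> float:
    """Back out the population that a tests-per-100-people rate implies."""
    return total_tests / tests_per_100 * 100


for city, (tests, rate) in CITY_STATS.items():
    print(f"{city}: ~{implied_population(tests, rate):,.0f} people")
```

    Because the inputs are rounded, the outputs are best read as orders of magnitude rather than census figures.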

    Geojsons of these 2° by 2° extracts for MEL, BKK, SHG now added, and LAX added v6. Alice Springs added v15.

    This dataset unpacks, geospatially, data summaries provided in Speedtest Global Index (linked below). See Jupyter Notebook (*.ipynb) to interrogate geo data. See link to install Jupyter.

    ** To Do

    Will add Google Map versions so everyone can see without installing Jupyter.
    - Link to Google Map (BKK) added below. Key: Green > 100Mbps (Superfast). Black > 500Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook.
    - Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melb has 20% at or above 100Mbps. Suggest plot Top 20% on map for community. Google Map link now added (and tweet).

    ** Python

    GeoPandas .cx slices are given as x (longitude) first, then y (latitude):

        melb = au_tiles.cx[144:146, -39:-37]  # Lon/Lat extract
        shg = tiles.cx[120:122, 30:32]        # Lon/Lat extract
        bkk = tiles.cx[100:102, 13:15]        # Lon/Lat extract
        lax = tiles.cx[-118:-120, 33:35]      # Lon/Lat extract
        ALC = tiles.cx[132:134, -22:-24]      # Lon/Lat extract
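
    The 2-degree extracts above are bounding-box selections. The sketch below shows the same idea in plain Python, assuming each tile is reduced to a centroid point; the real .cx indexer tests geometry intersection, so results can differ at the box edges. The Tile class, its field names, and the sample rows are illustrative and are not taken from the notebook.

```python
from typing import NamedTuple


class Tile(NamedTuple):
    lon: float        # centroid longitude (x)
    lat: float        # centroid latitude (y)
    avg_d_kbps: int   # illustrative field: average download speed


def in_box(tile: Tile, lon_range: tuple, lat_range: tuple) -> bool:
    """True if the tile centroid lies inside the box; bounds may be given in either order."""
    lon_min, lon_max = sorted(lon_range)
    lat_min, lat_max = sorted(lat_range)
    return lon_min <= tile.lon <= lon_max and lat_min <= tile.lat <= lat_max


# Made-up sample rows, for illustration only.
tiles = [
    Tile(144.96, -37.81, 69_000),  # inside the Melbourne box
    Tile(151.21, -33.87, 65_000),  # outside (further north-east)
    Tile(145.50, -38.20, 40_000),  # inside the Melbourne box
]

melb = [t for t in tiles if in_box(t, (144, 146), (-39, -37))]
print(len(melb), "of", len(tiles), "tiles fall in the Melbourne extract")
```

    Sorting the bounds lets a reversed range such as (-118, -120) select the same box as (-120, -118).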

    Histograms (v9), and data visualisations (v3,5,9,11) will be provided. Data Sourced from - This is an extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).

    ** VERSIONS

    v24. Add tweet and google map of Top 20% (over 100Mbps locations) in Mel Q322. Add v.1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below).
    v23. Add graph of 2022 Broadband distribution, and compare 2020 - 2022. Updated v1.4 Jupyter notebook.
    v22. Add Import ipynb; workflow-import-4cities.
    v21. Add Q3 2022 data; five cities inc ALC. Geojson files. (2020: 4.3M tests; 2022: 2.9M tests)

        City   Lines   Avg download speed   Tests
        Melb   14784   69.4M                0.39M
        SHG    31207   233.7M               0.56M
        ALC    113     51.5M                1092
        BKK    29684   215.9M               1.2M
        LAX    15505   218.5M               0.74M

    v20. Speedtest - Five Cities inc ALC.
    v19. Add ALC2.ipynb.
    v18. Add ALC line graph.
    v17. Added ipynb for ALC. Added ALC to title.
    v16. Load Alice Springs Data Q221 - csv. Added Google Map link of ALC.
    v15. Load Melb Q1 2021 data - csv.
    v14. Added Melb Q1 2021 data - geojson.
    v13. Added Twitter link to pics.
    v12. Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb).
    v11. Add Line-Compare pic, plotting Four Cities on a graph.
    v10. Add Four Histograms in one pic.
    v9. Add Histogram for Four Cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook).
    v8. Renamed LAX file to Q3, rather than 03.
    v7. Amended file names of BKK files to correctly label as Q3, not Q2 or 06.
    v6. Added LAX file.
    v5. Add screenshot of BKK Google Map.
    v4. Add BKK Google map (link below), and BKK csv mapping files.
    v3. Replaced MEL map with big key version. Prev key was very tiny in top right corner.
    v2. Uploaded MEL, SHG, BKK data and Jupyter Notebook.
    v1. Metadata record.

    ** LICENCE

    AWS data licence on Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be:
    - non-commercial (NC)
    - reuse must be share-alike (SA) (add same licence).
    This restricts the standard CC-BY Figshare licence.

    ** Other uses of Speedtest Open Data; - see link at Speedtest below.

  4. IRS Form 990 Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    Internal Revenue Service (2019). IRS Form 990 Data [Dataset]. https://www.kaggle.com/datasets/irs/irs-990
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset provided by
    irs.gov (http://www.irs.gov/)
    Internal Revenue Service (http://www.irs.gov/)
    Authors
    Internal Revenue Service
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Form 990 (officially, the "Return of Organization Exempt From Income Tax") is a United States Internal Revenue Service form that provides the public with financial information about a nonprofit organization. It is often the only source of such information. It is also used by government agencies to prevent organizations from abusing their tax-exempt status. Source: https://en.wikipedia.org/wiki/Form_990

    Content

    Form 990 is used by the United States Internal Revenue Service to gather financial information about nonprofit/exempt organizations. This BigQuery dataset can be used to perform research and analysis of organizations that have electronically filed Forms 990, 990-EZ and 990-PF. For a complete description of data variables available in this dataset, see the IRS’s extract documentation: https://www.irs.gov/uac/soi-tax-stats-annual-extract-of-tax-exempt-organization-financial-data.

    Update Frequency: Annual

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:irs_990

    https://cloud.google.com/bigquery/public-data/irs-990

    Dataset Source: U.S. Internal Revenue Service. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @rawpixel from Unsplash.

    Inspiration

    What organizations filed tax exempt status in 2015?

    What was the revenue of the American Red Cross in 2017?

  5. Company Datasets for Business Profiling

    • datarade.ai
    Updated Feb 23, 2017
    Cite
    Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
    Explore at:
    Available download formats: .json, .xml, .csv, .xls
    Dataset updated
    Feb 23, 2017
    Dataset authored and provided by
    Oxylabs
    Area covered
    Moldova (Republic of), Tunisia, Isle of Man, Canada, Taiwan, Bangladesh, Andorra, Northern Mariana Islands, British Indian Ocean Territory, Nepal
    Description

    Company Datasets for valuable business insights!

    Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

    These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

    • Owler: Gain valuable business insights and competitive intelligence.
    • AngelList: Receive fresh startup data transformed into actionable insights.
    • CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies.
    • Craft.co: Make data-informed business decisions with Craft.co's company datasets.
    • Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

    We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

    • Company name;
    • Size;
    • Founding date;
    • Location;
    • Industry;
    • Revenue;
    • Employee count;
    • Competitors.

    You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

    Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

    With Oxylabs Datasets, you can count on:

    • Fresh and accurate data collected and parsed by our expert web scraping team.
    • Time and resource savings, allowing you to focus on data analysis and achieving your business goals.
    • A customized approach tailored to your specific business needs.
    • Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

  6. Day & night temperatures, 50yrs, 1666ws, TFRecord

    • kaggle.com
    zip
    Updated Nov 9, 2019
    Cite
    Martin Görner (2019). Day & night temperatures, 50yrs, 1666ws, TFRecord [Dataset]. https://www.kaggle.com/datasets/mgorner/day-night-temperatures-50yrs-1666ws-tfrecord
    Explore at:
    Available download formats: zip (160157825 bytes)
    Dataset updated
    Nov 9, 2019
    Authors
    Martin Görner
    License

    https://www.usa.gov/government-works/

    Description

    This dataset is a cleaned-up extract from the following public BigQuery dataset: https://console.cloud.google.com/marketplace/details/noaa-public/ghcn-d

    The dataset contains daily min/max temperatures from a selection of 1666 weather stations. The data spans exactly 50 years. Missing values have been interpolated and are marked as such.
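
    The description does not say how missing values were interpolated or how they are flagged. As one plausible reading only, the sketch below fills gaps linearly between the nearest known readings and returns a parallel list marking which entries were filled; the function name and the sample series are hypothetical and do not reflect the dataset's actual schema.

```python
def interpolate_with_flags(values):
    """Linearly fill None gaps; return (filled_values, was_interpolated_flags)."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("need at least one known value")
    filled = list(values)
    flags = [v is None for v in values]
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            # Leading gap: carry the first known value backwards.
            filled[i] = values[right]
        elif right is None:
            # Trailing gap: carry the last known value forwards.
            filled[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled, flags


# Made-up daily maximum temperatures with gaps.
tmax = [10.0, None, None, 16.0, None]
filled, flags = interpolate_with_flags(tmax)
print(filled)
print(flags)
```

    Keeping the flags alongside the values lets a downstream model down-weight or exclude interpolated days.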

    This dataset is in TFRecord format.

    About the original dataset: NOAA’s Global Historical Climatology Network (GHCN) is an integrated database of climate summaries from land surface stations across the globe that have been subjected to a common suite of quality assurance reviews. The data are obtained from more than 20 sources. The GHCN-Daily is an integrated database of daily climate summaries from land surface stations across the globe, and is comprised of daily climate records from over 100,000 stations in 180 countries and territories, and includes some data from every year since 1763.

  7. Job Postings Dataset for Labour Market Research and Insights

    • datarade.ai
    Updated Sep 20, 2023
    Cite
    Oxylabs (2023). Job Postings Dataset for Labour Market Research and Insights [Dataset]. https://datarade.ai/data-products/job-postings-dataset-for-labour-market-research-and-insights-oxylabs
    Explore at:
    Available download formats: .json, .xml, .csv, .xls
    Dataset updated
    Sep 20, 2023
    Dataset authored and provided by
    Oxylabs
    Area covered
    Jamaica, Zambia, Togo, Luxembourg, Tajikistan, British Indian Ocean Territory, Sierra Leone, Switzerland, Kyrgyzstan, Anguilla
    Description

    Introducing Job Posting Datasets: Uncover labor market insights!

    Elevate your recruitment strategies, forecast future labor industry trends, and unearth investment opportunities with Job Posting Datasets.

    Job Posting Datasets Source:

    1. Indeed: Access datasets from Indeed, a leading employment website known for its comprehensive job listings.

    2. Glassdoor: Receive ready-to-use employee reviews, salary ranges, and job openings from Glassdoor.

    3. StackShare: Access StackShare datasets to make data-driven technology decisions.

    Job Posting Datasets provide meticulously acquired and parsed data, freeing you to focus on analysis. You'll receive clean, structured, ready-to-use job posting data, including job titles, company names, seniority levels, industries, locations, salaries, and employment types.

    Choose your preferred dataset delivery options for convenience:

    Receive datasets in various formats, including CSV, JSON, and more. Opt for storage solutions such as AWS S3, Google Cloud Storage, and more. Customize data delivery frequencies, whether one-time or per your agreed schedule.

    Why Choose Oxylabs Job Posting Datasets:

    1. Fresh and accurate data: Access clean and structured job posting datasets collected by our seasoned web scraping professionals, enabling you to dive into analysis.

    2. Time and resource savings: Focus on data analysis and your core business objectives while we efficiently handle the data extraction process cost-effectively.

    3. Customized solutions: Tailor our approach to your business needs, ensuring your goals are met.

    4. Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA best practices.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Effortlessly access fresh job posting data with Oxylabs Job Posting Datasets.

  8. Automatically Extracted Buildings

    • open.canada.ca
    • catalogue.arctic-sdi.org
    • +2more
    fgdb/gdb, html, kmz +3
    Updated Oct 23, 2025
    Cite
    Natural Resources Canada (2025). Automatically Extracted Buildings [Dataset]. https://open.canada.ca/data/en/dataset/7a5cda52-c7df-427f-9ced-26f19a8a64d6
    Explore at:
    Available download formats: pdf, html, wms, fgdb/gdb, kmz, shp
    Dataset updated
    Oct 23, 2025
    Dataset provided by
    Natural Resources Canada
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    “Automatically Extracted Buildings” is a raw digital product in vector format created by NRCan. It consists of a single topographical feature class that delineates polygonal building footprints automatically extracted from airborne Lidar data, high-resolution optical imagery or other sources.

  9. Public electrophysiological datasets collected in the Buzsaki Lab

    • zenodo.org
    Updated Jul 22, 2024
    + more versions
    Cite
    Peter Christian Petersen; Michelle Hernandez; György Buzsáki (2024). Public electrophysiological datasets collected in the Buzsaki Lab [Dataset]. http://doi.org/10.5281/zenodo.3629881
    Explore at:
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Peter Christian Petersen; Michelle Hernandez; György Buzsáki
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Buzsaki Lab is proud to present a large selection of experimental data available for public access: https://buzsakilab.com/wp/database/. We publicly share more than a thousand sessions (about 40TB of raw and spike- and LFP-processed data) via our public data repository. The datasets are from freely moving rodents and include sleep-task-sleep sessions (3 to 24 hrs continuous recording sessions) in various brain structures, including metadata. We are happy to assist you in using the data. Our goal is that by sharing these data, other users can provide new insights, extend, contradict, or clarify our conclusions.

    The databank contains electrophysiological recordings performed in freely moving rats and mice collected by investigators in the Buzsaki Lab over several years (a subset from head-fixed mice). Sessions have been collected with extracellular electrodes using high-channel-count silicon probes, with spike sorted single units, and intracellular and juxtacellular combined with extracellular electrodes. Several sessions include physiologically and optogenetically identified units. The sessions have been collected from various brain region pairs: the hippocampus, thalamus, amygdala, post-subiculum, septal region, and the entorhinal cortex, and various neocortical regions. In most behavioral tasks, the animals performed spatial behaviors (linear mazes and open fields), preceded and followed by long sleep sessions. Brain state classification is provided.

    Getting started

    The top menu “Databank” serves as a navigational menu to the databank. The metadata describing the experiments is stored in a relational database which means that there are many entry points for exploring the data. The databank is organized by projects, animal subjects, and sessions.

    Accessing and downloading the datasets

    We share the data through two services: our public Globus.org endpoint and our webshare: buzsakilab.nyumc.org. A subset of the datasets is also available at CRCNS.org. If you have an interest in a dataset that is not listed or is lacking information, please contact us. We pledge to make our data available immediately after publication.

    Support

    For support, please use our Buzsaki Databank google group. If you have an interest in a dataset that is not listed or is lacking information, please send us a request. Feel free to contact us, if you need more details on a given dataset or if a dataset is missing.

  10. github-r-repos

    • huggingface.co
    Updated Jun 6, 2023
    Cite
    Daniel Falbel (2023). github-r-repos [Dataset]. https://huggingface.co/datasets/dfalbel/github-r-repos
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 6, 2023
    Authors
    Daniel Falbel
    License

    https://choosealicense.com/licenses/other/

    Description

    GitHub R repositories dataset

    R source files from GitHub. This dataset has been created using the public GitHub datasets from Google BigQuery. This is the actual query that has been used to export the data: EXPORT DATA OPTIONS ( uri = 'gs://your-bucket/gh-r/*.parquet', format = 'PARQUET') as ( select f.id, f.repo_name, f.path, c.content, c.size from ( SELECT distinct id, repo_name, path FROM bigquery-public-data.github_repos.files where ends_with(path… See the full description on the dataset page: https://huggingface.co/datasets/dfalbel/github-r-repos.

  11. Demographics

    • health.google.com
    Updated Oct 7, 2021
    + more versions
    Cite
    (2021). Demographics [Dataset]. https://health.google.com/covid-19/open-data/raw-data
    Explore at:
    Dataset updated
    Oct 7, 2021
    Variables measured
    key, population, population_male, rural_population, urban_population, population_female, population_density, clustered_population, population_age_00_09, population_age_10_19, and 11 more
    Description

    Various population statistics, including structured demographics data.

  12. Speedtest Open Data - Australia 2020 Q2, Q3, Q4 extract

    • figshare.com
    txt
    Updated Oct 24, 2025
    + more versions
    Cite
    Richard Ferrers; Speedtest Global Index (2025). Speedtest Open Data - Australia 2020 Q2, Q3, Q4 extract [Dataset]. http://doi.org/10.6084/m9.figshare.13370504.v17
    Explore at:
    Available download formats: txt
    Dataset updated
    Oct 24, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Richard Ferrers; Speedtest Global Index
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Australia
    Description

    This is an Australian extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).

    AWS data licence is "CC BY-NC-SA 4.0", so use of this data must be:
    - non-commercial (NC)
    - reuse must be share-alike (SA) (add same licence).
    This restricts the standard CC-BY Figshare licence.

    The world Speedtest open data was downloaded (>400Mb, 7M lines of data). An extract of Australia's location (lat, long) revealed 88,000 lines of data (attached as csv).

    A Jupyter notebook of the extract process is attached.
    A link to a Twitter thread of outputs is provided.
    A link to a Data tutorial is provided (GitHub), including a Jupyter Notebook to analyse World Speedtest data, selecting one US State.

    Data Shows (Q2):
    - 3.1M speedtests
    - 762,000 devices
    - 88,000 grid locations (600m * 600m), summarised as a point
    - average speed 33.7Mbps (down), 12.4M (up)
    - Max speed 724Mbps
    - data is for 600m * 600m grids, showing average speed up/down, number of tests, and number of users (IP). Added centroid, and now lat/long.

    See tweet of image of centroids also attached.

    Versions:
    v15/16 - Add Hist comparing Q1-21 vs Q2-20. Inc ipynb (incHistQ121, v.1.3-Q121) to calc.
    v14 - Add AUS Speedtest Q1 2021 geojson (79k lines, avg d/l 45.4Mbps).
    v13 - Added three colour MELB map (less than 20Mbps, over 90Mbps, 20-90Mbps).
    v12 - Added AUS - Syd - Mel Line Chart Q320.
    v11 - Add line chart compare Q2, Q3, Q4 plus Melb - result virtually indistinguishable. Add line chart to compare Syd - Melb Q3. Also virtually indistinguishable. Add HIST compare Syd - Melb Q3. Add new Jupyter with graph calcs (nbn-AUS-v1.3). Some ERRATA document in Notebook. Issue with resorting table, and graphing only part of table. Not an issue if all lines of table graphed.
    v10 - Load AURIN sample pics. Speedtest data loaded to AURIN geo-analytic platform; requires edu.au login.
    v9 - Add comparative Q2, Q3, Q4 Hist pic.
    v8 - Added Q4 data geojson. Add Q3, Q4 Hist pic.
    v7 - Rename to include Q2, Q3 in Title.
    v6 - Add Q3 20 data. Rename geojson AUS data as Q2. Add comparative Histogram. Calc in International.ipynb.
    v5 - Add Jupyter Notebook inc Histograms. Hist is count of geo-locations avg download speed (unweighted by tests).
    v4 - Added Melb choropleth (png 50Mpix) inc legend. (To do - add Melb.geojson). Posted Link to AURIN description of Speedtest data.
    v3 - Add super fast data (>100Mbps) less than 1% of data - 697 lines. Includes png of superfast.plot(). Link below to Google Maps version of superfast data points. Also Google map of first 100 data points - sample data. Geojson format for loading into GeoPandas, per Jupyter Notebook. New version of Jupyter Notebook, v.1.1.
    v2 - Add centroids image.
    v1 - Initial data load.

    ** Future Work
    - combine Speedtest data with NBN Technology by location data (national map.gov.au); https://www.data.gov.au/dataset/national-broadband-network-connections-by-technology-type
    - combine Speedtest data with SEIFA data - socioeconomic categories - to discuss with AURIN.
    - Further international comparisons
    - discussed collaboration with Assoc Prof Tooran Alizadeh, USyd.
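
    The v5 note says the histogram counts grid locations by average download speed, "unweighted by tests". The sketch below illustrates why that matters: averaging per-location speeds treats a tile with 10 tests the same as one with 100, while a test-weighted mean does not. The sample rows and function names are made up for illustration and are not drawn from the dataset.

```python
def unweighted_mean(tiles):
    """Mean of per-tile average speeds; every tile counts equally."""
    return sum(speed for speed, _ in tiles) / len(tiles)


def weighted_mean_by_tests(tiles):
    """Mean of per-tile average speeds, weighted by each tile's test count."""
    total_tests = sum(tests for _, tests in tiles)
    return sum(speed * tests for speed, tests in tiles) / total_tests


# (average download speed in Mbps, number of tests); illustrative values only.
tiles = [(20.0, 100), (40.0, 50), (100.0, 10)]

print(f"unweighted:    {unweighted_mean(tiles):.2f} Mbps")
print(f"test-weighted: {weighted_mean_by_tests(tiles):.2f} Mbps")
```

    In this toy case a single fast but rarely tested tile pulls the unweighted figure well above the test-weighted one.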

  13. Product Review Datasets for User Sentiment Analysis

    • datarade.ai
    Updated Sep 28, 2018
    Cite
    Oxylabs (2018). Product Review Datasets for User Sentiment Analysis [Dataset]. https://datarade.ai/data-products/product-review-datasets-for-user-sentiment-analysis-oxylabs
    Explore at:
    Available download formats: .json, .xml, .csv, .xls
    Dataset updated
    Sep 28, 2018
    Dataset authored and provided by
    Oxylabs
    Area covered
    Egypt, Libya, Hong Kong, Canada, South Africa, Sudan, Italy, Argentina, Antigua and Barbuda, Barbados
    Description

    Product Review Datasets: Uncover user sentiment

    Harness the power of Product Review Datasets to understand user sentiment and insights deeply. These datasets are designed to elevate your brand and product feature analysis, help you evaluate your competitive stance, and assess investment risks.

    Data sources:

    • Trustpilot: datasets encompassing general consumer reviews and ratings across various businesses, products, and services.

    Leave the data collection challenges to us and dive straight into market insights with clean, structured, and actionable data, including:

    • Product name;
    • Product category;
    • Number of ratings;
    • Ratings average;
    • Review title;
    • Review body;

    Choose from multiple data delivery options to suit your needs:

    1. Receive data in easy-to-read formats like spreadsheets or structured JSON files.
    2. Select your preferred data storage solutions, including SFTP, Webhooks, Google Cloud Storage, AWS S3, and Microsoft Azure Storage.
    3. Tailor data delivery frequencies, whether on-demand or per your agreed schedule.

    Why choose Oxylabs?

    1. Fresh and accurate data: Access organized, structured, and comprehensive data collected by our leading web scraping professionals.

    2. Time and resource savings: Concentrate on your core business goals while we efficiently handle the data extraction process at an affordable cost.

    3. Adaptable solutions: Share your specific data requirements, and we'll craft a customized data collection approach to meet your objectives.

    4. Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA standards.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Join the ranks of satisfied customers who appreciate our meticulous attention to detail and personalized support. Experience the power of Product Review Datasets today to uncover valuable insights and enhance decision-making.

  14. Novel Covid-19 Dataset

    • kaggle.com
    Updated Sep 18, 2025
    + more versions
    Cite
    GHOST5612 (2025). Novel Covid-19 Dataset [Dataset]. https://www.kaggle.com/datasets/ghost5612/novel-covid-19-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 18, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    GHOST5612
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Context:

    From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

    So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.

    Johns Hopkins University has made an excellent dashboard using the affected cases data. Data is extracted from the associated Google Sheets and made available here.

    Edited:

    Now data is available as csv files in the Johns Hopkins Github repository. Please refer to the github repository for the Terms of Use details. Uploading it here for using it in Kaggle kernels and getting insights from the broader DS community.

    Content

    2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

    This dataset has daily level information on the number of affected cases, deaths and recoveries from the 2019 novel coronavirus. Please note that this is time-series data, so the number of cases on any given day is the cumulative number.
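Because the counts are cumulative, per-day figures have to be derived by differencing consecutive days. The following is a minimal sketch of that step; the function name and the sample numbers are made up for illustration and are not taken from the dataset.

```python
from typing import List


def daily_increments(cumulative: List[int]) -> List[int]:
    """Convert a cumulative series into per-day increments.

    The first day's increment is taken to be its cumulative value,
    because no earlier observation is available.
    """
    increments = []
    previous = 0
    for total in cumulative:
        increments.append(total - previous)
        previous = total
    return increments


# Made-up cumulative confirmed counts for five consecutive days.
confirmed = [10, 14, 14, 21, 30]
new_cases = daily_increments(confirmed)
print(new_cases)
```

If a cumulative count ever drops from one day to the next, this sketch reports a negative increment for that day rather than hiding it.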

    The data is available from 22 Jan, 2020.


    Dataset Description

    This dataset contains time-series and case-level records of the COVID-19 pandemic. The primary file is covid_19_data.csv, with supporting files for earlier records and individual-level line list data.

    Files and Columns

    1. covid_19_data.csv (Main File)

    This is the primary dataset and contains aggregated COVID-19 statistics by location and date.

    • Sno – Serial number of the record
    • ObservationDate – Date of the observation (MM/DD/YYYY)
    • Province/State – Province or state of the observation (may be missing for some entries)
    • Country/Region – Country of the observation
    • Last Update – Timestamp (UTC) when the record was last updated (not standardized, requires cleaning before use)
    • Confirmed – Cumulative number of confirmed cases on that date
    • Deaths – Cumulative number of deaths on that date
    • Recovered – Cumulative number of recoveries on that date
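The column list above can be exercised with a short sketch that parses rows and sums the cumulative counts over provinces for each country and date. The sample rows are invented, the header spelling follows the list above (the actual file may differ), and the function name is hypothetical. The unstandardized Last Update column is deliberately left unparsed.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Made-up sample rows that follow the column list above; not real data.
SAMPLE = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,Region A,Country X,1/22/2020 17:00,5,0,0
2,01/22/2020,Region B,Country X,1/22/2020 17:00,3,0,1
3,01/22/2020,,Country Y,1/22/2020 17:00,2,0,0
4,01/23/2020,Region A,Country X,1/23/20 17:00,9,1,0
"""


def totals_by_country_and_date(csv_text):
    """Sum cumulative counts over provinces for each (country, date)."""
    totals = defaultdict(lambda: {"Confirmed": 0, "Deaths": 0, "Recovered": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        # ObservationDate is documented as MM/DD/YYYY.
        day = datetime.strptime(row["ObservationDate"], "%m/%d/%Y").date()
        key = (row["Country/Region"], day)
        for field in ("Confirmed", "Deaths", "Recovered"):
            # float() first, in case counts are stored as "5.0".
            totals[key][field] += int(float(row[field]))
    return dict(totals)


totals = totals_by_country_and_date(SAMPLE)
for (country, day), counts in sorted(totals.items()):
    print(country, day.isoformat(), counts)
```

Rows with a missing Province/State are kept and simply contribute to their country's total.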

    2. 2019_ncov_data.csv (Legacy File)

    This file contains earlier COVID-19 records. It is no longer updated and is provided only for historical reference. For current analysis, please use covid_19_data.csv.

    3. COVID_open_line_list_data.csv

    This file provides individual-level case information, obtained from an open data source. It includes patient demographics, travel history, and case outcomes.

    4. COVID19_line_list_data.csv

    Another individual-level case dataset, also obtained from public sources, with detailed patient-level information useful for micro-level epidemiological analysis.

    ✅ Use covid_19_data.csv for up-to-date aggregated global trends.

    ✅ Use the line list datasets for detailed, individual-level case analysis.

    Country level datasets:

    If you are interested in knowing country level data, please refer to the following Kaggle datasets:

    India - https://www.kaggle.com/sudalairajkumar/covid19-in-india

    South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset

    Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy

    Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil

    USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa

    Switzerland - https://www.kaggle.com/daenuprobst/covid19-cases-switzerland

    Indonesia - https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases

    Acknowledgements :

    Johns Hopkins University for making the data available for educational and academic research purposes

    MoBS lab - https://www.mobs-lab.org/2019ncov.html

    World Health Organization (WHO): https://www.who.int/

    DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.

    BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/

    National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml

    China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm

    Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html

    Macau Government: https://www.ssm.gov.mo/portal/

    Taiwan CDC: https://sites.google....

  15. IRS 990

    • console.cloud.google.com
    Updated Nov 23, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Internal%20Revenue%20Service&hl=pt (2023). IRS 990 [Dataset]. https://console.cloud.google.com/marketplace/product/internal-revenue-service/irs-990?hl=pt
    Explore at:
    Dataset updated
    Nov 23, 2023
    Dataset provided by
    Google (http://google.com/)
    Description

    Form 990 is used by the United States Internal Revenue Service to gather financial information about nonprofit/exempt organizations. This BigQuery dataset can be used to perform research and analysis of organizations that have electronically filed Forms 990, 990-EZ and 990-PF. For a complete description of data variables available in this dataset, see the IRS’s extract documentation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  16. Economic Indicators

    • health.google.com
    Updated Oct 7, 2021
    Cite
    (2021). Economic Indicators [Dataset]. https://health.google.com/covid-19/open-data/raw-data
    Explore at:
    Dataset updated
    Oct 7, 2021
    Variables measured
    gdp, key, gdp_per_capita, human_capital_index
    Description

    Various economic indicators.

  17. Data_Sheet_1_Public interest in different types of masks and its...

    • frontiersin.figshare.com
    docx
    Updated Jun 8, 2023
    + more versions
    Cite
    Andy Wai Kan Yeung; Emil D. Parvanov; Jarosław Olav Horbańczuk; Maria Kletecka-Pulker; Oliver Kimberger; Harald Willschke; Atanas G. Atanasov (2023). Data_Sheet_1_Public interest in different types of masks and its relationship with pandemic and policy measures during the COVID-19 pandemic: a study using Google Trends data.docx [Dataset]. http://doi.org/10.3389/fpubh.2023.1010674.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Andy Wai Kan Yeung; Emil D. Parvanov; Jarosław Olav Horbańczuk; Maria Kletecka-Pulker; Oliver Kimberger; Harald Willschke; Atanas G. Atanasov
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Google Trends data have been used to investigate various themes on online information seeking. It was unclear if the population from different parts of the world shared the same amount of attention to different mask types during the COVID-19 pandemic. This study aimed to reveal which types of masks were frequently searched by the public in different countries, and evaluated if public attention to masks could be related to mandatory policy, stringency of the policy, and transmission rate of COVID-19. By referring to an open dataset hosted at the online database Our World in Data, the 10 countries with the highest total number of COVID-19 cases as of 9th of February 2022 were identified. For each of these countries, the weekly new cases per million population, reproduction rate (of COVID-19), stringency index, and face covering policy score were computed from the raw daily data. Google Trends were queried to extract the relative search volume (RSV) for different types of masks from each of these countries. Results found that Google searches for N95 masks were predominant in India, whereas surgical masks were predominant in Russia, FFP2 masks were predominant in Spain, and cloth masks were predominant in both France and United Kingdom. The United States, Brazil, Germany, and Turkey had two predominant types of mask. The online searching behavior for masks markedly varied across countries. For most of the surveyed countries, the online searching for masks peaked during the first wave of COVID-19 pandemic before the government implemented mandatory mask wearing. The search for masks positively correlated with the government response stringency index but not with the COVID-19 reproduction rate or the new cases per million.
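The abstract describes two computations: turning raw daily data into weekly figures per million population, and correlating search interest with policy indicators. The sketch below shows one plausible way to do both. The abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here, and all function names and numbers are made up.

```python
from math import sqrt


def weekly_sums(daily, days_per_week=7):
    """Sum consecutive blocks of daily values; a trailing partial week is dropped."""
    full = len(daily) - len(daily) % days_per_week
    return [sum(daily[i:i + days_per_week]) for i in range(0, full, days_per_week)]


def per_million(values, population):
    """Scale raw counts to counts per million inhabitants."""
    return [v * 1_000_000 / population for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up daily new cases for three weeks in a hypothetical country of 2M people.
daily_cases = [100, 120, 90, 110, 130, 80, 70] + [200] * 7 + [150] * 7
weekly_per_million = per_million(weekly_sums(daily_cases), population=2_000_000)

# Made-up weekly relative search volume (RSV) for the same three weeks.
weekly_rsv = [20, 55, 40]
print(weekly_per_million, pearson(weekly_per_million, weekly_rsv))
```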

  18. Cloud-Based Data Analytics Platform Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Oct 25, 2025
    Cite
    Data Insights Market (2025). Cloud-Based Data Analytics Platform Report [Dataset]. https://www.datainsightsmarket.com/reports/cloud-based-data-analytics-platform-499252
    Explore at:
    Available download formats: ppt, pdf, doc
    Dataset updated
    Oct 25, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Cloud-Based Data Analytics Platform market is poised for significant expansion, projected to reach a substantial market size of $150 billion by 2025, exhibiting a robust Compound Annual Growth Rate (CAGR) of 18% throughout the forecast period of 2025-2033. This impressive growth trajectory is fueled by an increasing reliance on data-driven decision-making across all industries. Key drivers include the escalating volume and complexity of data, the growing demand for real-time insights to gain a competitive edge, and the inherent scalability and cost-effectiveness offered by cloud platforms compared to on-premise solutions. Businesses are increasingly leveraging these platforms to extract actionable intelligence from their data, enabling them to optimize operations, enhance customer experiences, and identify new revenue streams. The democratization of data analytics tools, with user-friendly interfaces and advanced AI/ML capabilities, is further accelerating adoption among small and medium-sized enterprises, broadening the market's reach and impact.

    The market landscape is characterized by a dynamic interplay of technological advancements and evolving business needs. Major trends include the proliferation of hybrid and multi-cloud strategies, offering organizations greater flexibility and control over their data. Advancements in AI and machine learning are deeply integrated into these platforms, enabling more sophisticated predictive analytics, natural language processing for query simplification, and automated insights. The emphasis on data governance, security, and compliance in cloud environments is also a critical consideration, with vendors investing heavily in robust security features. While the market experiences immense growth, potential restraints such as data privacy concerns, vendor lock-in anxieties, and the need for skilled personnel to manage and interpret complex data sets present challenges. However, the overwhelming benefits of enhanced agility, improved collaboration, and reduced IT infrastructure costs continue to drive strong market momentum, with platforms like those offered by industry leaders such as Amazon, Google, Microsoft, and Snowflake dominating the competitive arena.

    This comprehensive report provides an in-depth analysis of the global Cloud-Based Data Analytics Platform market, forecasting its trajectory from 2019 to 2033, with a base year of 2025. The study delves into the market's intricate dynamics, exploring its growth drivers, challenges, and emerging trends, while also providing valuable insights into its competitive landscape and key regional contributions. The estimated market size is expected to reach $XX million by 2025, with significant growth projected during the forecast period.
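To see what an 18% CAGR implies, the compound-growth formula value = base × (1 + rate)^years can be applied to the $150 billion figure quoted for 2025. This is only an arithmetic sketch: it assumes annual compounding from the 2025 base through 2033, the function name is hypothetical, and the description itself does not state any of the projected yearly values.

```python
def project_with_cagr(base_value: float, cagr: float, years: int) -> float:
    """Compound a base value at a constant annual growth rate."""
    return base_value * (1 + cagr) ** years


base_2025 = 150.0  # USD billions, the 2025 figure quoted in the description
cagr = 0.18        # 18% compound annual growth rate, as quoted

# Print the implied value for each year of the 2025-2033 forecast period.
for year in range(2025, 2034):
    implied = project_with_cagr(base_2025, cagr, year - 2025)
    print(year, round(implied, 1))
```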

  19. Speedtest Open Data - Australia 2020-04-01 extract

    • figshare.com
    txt
    Updated Oct 24, 2025
    Cite
    Richard Ferrers; Speedtest Global Index (2025). Speedtest Open Data - Australia 2020-04-01 extract [Dataset]. http://doi.org/10.6084/m9.figshare.13370504.v3
    Explore at:
    Available download formats: txt
    Dataset updated
    Oct 24, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Richard Ferrers; Speedtest Global Index
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Australia
    Description

    This is an Australian extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).

    The AWS data licence is "CC BY-NC-SA 4.0", so use of this data must be:
    - non-commercial (NC)
    - share-alike on reuse (SA) (add same licence).
    This restricts the standard CC-BY Figshare licence.

    A world speedtest open data file was downloaded (>400Mb, 7M lines of data). An extract of Australia's location (lat, long) revealed 88,000 lines of data (attached as csv). A Jupyter notebook of the extract process is attached. A link to a Twitter thread of outputs is provided. A link to a data tutorial is provided (GitHub), including a Jupyter Notebook to analyse World Speedtest data, selecting one US State.

    Data shows:
    - 3.1M speedtests
    - 762,000 devices
    - 88,000 grid locations (600m * 600m), summarised as a point
    - average speed 33.7Mbps (down), 12.4Mbps (up)
    - max speed 724Mbps
    - data is for 600m * 600m grids, showing average speed up/down, number of tests, and number of users (IP). Added centroid, and now lat/long.

    See tweet of image of centroids also attached.

    Versions:
    v3 - Add super fast data (>100Mbps), less than 1% of data - 697 lines. Includes png of superfast.plot(). Link below to Google Maps version of superfast data points. Also Google map of first 100 data points - sample data. Geojson format for loading into GeoPandas, per Jupyter Notebook. New version of Jupyter Notebook, v.1.1.
    v2 - add centroids image.
    v1 - initial data load.
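The extract described above involves two filters: keeping only grid points that fall inside Australia, and then selecting the "super fast" points above 100Mbps. The attached notebook is not reproduced here, so the following is only a sketch of the idea. The bounding box is a rough assumption, and the field names (lat, lon, down_mbps) and sample points are hypothetical.

```python
# Rough bounding box for Australia; these bounds are an assumption for illustration.
AUS_BOUNDS = {"lat_min": -44.0, "lat_max": -10.0, "lon_min": 112.0, "lon_max": 154.0}


def in_bounds(lat, lon, bounds=AUS_BOUNDS):
    """Return True when a point lies inside the bounding box."""
    return (bounds["lat_min"] <= lat <= bounds["lat_max"]
            and bounds["lon_min"] <= lon <= bounds["lon_max"])


def extract_region(rows, bounds=AUS_BOUNDS):
    """Keep only rows whose centroid falls inside the bounding box."""
    return [r for r in rows if in_bounds(r["lat"], r["lon"], bounds)]


def superfast(rows, threshold_mbps=100.0):
    """Keep only rows whose average download speed exceeds the threshold."""
    return [r for r in rows if r["down_mbps"] > threshold_mbps]


# Made-up grid centroids with hypothetical field names.
rows = [
    {"lat": -33.87, "lon": 151.21, "down_mbps": 120.5},  # near Sydney
    {"lat": -37.81, "lon": 144.96, "down_mbps": 33.7},   # near Melbourne
    {"lat": 40.71, "lon": -74.01, "down_mbps": 250.0},   # near New York
]

australia = extract_region(rows)
print(len(australia), len(superfast(australia)))
```

A bounding box is a coarse filter and can include points outside the country's actual borders, so a polygon test would be needed for a precise extract.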

  20. Build better LibGuides: A dataset of Political Science, Public Affairs, and...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated May 30, 2024
    Cite
    Annelise Sklar (2024). Build better LibGuides: A dataset of Political Science, Public Affairs, and International Studies LibGuides [Dataset]. http://doi.org/10.5061/dryad.prr4xgxvk
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2024
    Dataset provided by
    University of California, San Diego
    Authors
    Annelise Sklar
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The dataset that accompanies the "Build Better LibGuides" chapter of Teaching Information Literacy in Political Science, Public Affairs, and International Studies. This dataset was created to compare current practices in Political Science, Public Affairs, and International Studies (PSPAIS) LibGuides with recommended best practices using a sample that represents a variety of academic institutions. Members of the ACRL Politics, Policy, and International Relations Section (PPIRS) were identified as the librarians most likely to be actively engaged with these specific subjects, so the dataset was scoped by identifying the institutions associated with the most active PPIRS members and then locating the LibGuides in these and related disciplines. The resulting dataset includes 101 guides at 46 institutions, for a total of 887 LibGuide tabs.

    Methods

    This dataset was created to compare current practices in Political Science, Public Affairs, and International Studies (PSPAIS) LibGuides with recommended best practices using a sample that represents a variety of academic institutions. Members of the ACRL Politics, Policy, and International Relations Section (PPIRS) were identified as the librarians most likely to be actively engaged with these specific subjects, so the dataset was scoped by identifying the institutions associated with the most active PPIRS members and then locating the LibGuides in these and related disciplines.

    Specifically, a student assistant collected the names and institutional affiliations of each member serving on a PPIRS committee as of July 1, 2021, 2022, and 2023. The student then removed the individual librarian names from the list and located the links to the Political Science or Government; Public Policy, Public Affairs, or Public Administration; and International Studies or International Relations LibGuides at each institution. The chapter author then confirmed and, in a few cases, added to the student's work and copied and pasted the tab names from each guide (which conveniently were also hyperlinked) into a Google Sheet. The resulting dataset included 101 guides at 46 institutions, for a total of 887 LibGuide tabs.

    A Google Apps script was used to extract the hyperlinks from the collected tab names and then a Python script was used to scrape the names of links included on each of the tabs. LibGuides from two institutions returned errors during the link name scraping process and were excluded in this part of the analysis.
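The author's Python script is not included in this description, so the following is only a sketch of what scraping link names from a page can look like, using the standard library's html.parser. The class name, function name, and sample HTML are hypothetical, and fetching the pages themselves is left out.

```python
from html.parser import HTMLParser


class LinkNameParser(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the anchor currently being read
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only record text while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str):
    """Return the (href, link name) pairs found in an HTML string."""
    parser = LinkNameParser()
    parser.feed(html)
    return parser.links


# Made-up fragment standing in for one LibGuide tab.
sample = ('<ul><li><a href="https://example.org/a">Database A</a></li>'
          '<li><a href="/b"> Guide <b>B</b></a></li></ul>')
print(extract_links(sample))
```

Pages that fail to download or parse would need their own error handling, which matters here because two institutions' guides returned errors during the original scraping.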
