100+ datasets found
  1. Google Ads Transparency Center

    • console.cloud.google.com
    Updated Sep 6, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Data&hl=de (2023). Google Ads Transparency Center [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-data/google-ads-transparency-center?hl=de
    Explore at:
    Dataset updated
    Sep 6, 2023
    Dataset provided by
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Description

    This dataset contains two tables: creative_stats and removed_creative_stats.

    The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad-specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first-shown and last-shown dates, the criteria used in audience selection, the ad format, the ad topic, and whether the ad is funded by the Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided.

    The removed_creative_stats table contains information about ads served in the European Economic Area that Google removed: where and why they were removed, and per-region information on when they served. It also contains a link to the Google Ads Transparency Center for the removed ad.

    Data for both tables updates periodically and may lag behind what appears on the Google Ads Transparency Center website.

    About BigQuery

    This data is hosted in Google BigQuery so that users can query it with SQL. To use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.

    Download Dataset

    This public dataset is also hosted in Google Cloud Storage and is free to use. The raw data is provided in JSON format, sharded across multiple files to make the large dataset easier to download. A README file that describes the data structure and the Terms of Service is included with the dataset. You can also download the results of a custom query.

    Signed-out users can download the full dataset with the gcloud CLI. After installing the gcloud CLI, remove the login requirement by running "$ gcloud config set auth/disable_credentials True", then download the dataset by running "$ gcloud storage cp gs://ads-transparency-center/* . -R".
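
As a rough illustration of working with the sharded JSON download described above, the sketch below reads every shard in a local directory and yields one record at a time. It assumes the shards are newline-delimited JSON and that they have already been copied into a local folder; the listing confirms neither, so the README shipped with the dataset remains the authority. The function name `iter_records` and the `*.json` glob pattern are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_records(shard_dir: str, pattern: str = "*.json") -> Iterator[dict]:
    """Yield one parsed record per non-empty line across all shards.

    Assumption: each shard is newline-delimited JSON (one object per line).
    """
    # Sort the shard paths so records come back in a stable order.
    for shard in sorted(Path(shard_dir).glob(pattern)):
        with shard.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines between records
                    yield json.loads(line)
```

Streaming one record at a time avoids loading the whole download into memory, which matters for a dataset large enough to be sharded.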

  2. Data from: Google Analytics & Twitter dataset from a movies, TV series and...

    • figshare.com
    • portalcientificovalencia.univeuropea.com
    txt
    Updated Feb 7, 2024
    Cite
    Víctor Yeste (2024). Google Analytics & Twitter dataset from a movies, TV series and videogames website [Dataset]. http://doi.org/10.6084/m9.figshare.16553061.v4
    Explore at:
    txt (available download formats)
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Víctor Yeste
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Víctor Yeste, Universitat Politècnica de València.

    The object of this study is the design of a cybermetric methodology to measure the success of content published in online media and, where possible, to predict the selected success variables.

    Because data from two separate areas had to be integrated (web publishing, and the analysis of shares and related topics on Twitter), the data was collected programmatically through both the Google Analytics v4 reporting API and the Twitter Standard API, always respecting their limits.

    The website analyzed is hellofriki.com, an online media outlet that publishes a large volume of daily news, as well as analysis, reports, interviews, and many other information formats. All of this content falls under the sections of cinema, series, video games, literature, and comics.

    This dataset contributed to the following PhD thesis:

    Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009

    Data were obtained for each breaking-news article published online, according to the indicators described in the doctoral thesis. All related data are stored in a database, divided into the following tables:

    tesis_followers: user ID list of the media account's followers.

    tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.

    • status_id: tweet ID
    • created_at: date of publication
    • text: content of the tweet
    • path: URL extracted after processing the shortened URL in text
    • post_shared: ID of the WordPress article being shared
    • retweet_count: number of retweets
    • favorite_count: number of favorites

    tesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web (other typologies, automatic Facebook shares, custom tweets without a link to an article, etc.). It has the same fields as tesis_hometimeline.

    tesis_posts: data on articles published by the website and processed for some analysis.

    • stats_id: analysis ID
    • post_id: article ID in WordPress
    • post_date: article publication date in WordPress
    • post_title: title of the article
    • path: URL of the article on the media website
    • tags: tag IDs or WordPress tags related to the article
    • uniquepageviews: unique page views
    • entrancerate: entrance rate
    • avgtimeonpage: average time on page
    • exitrate: exit rate
    • pageviewspersession: page views per session
    • adsense_adunitsviewed: number of ads viewed by users
    • adsense_viewableimpressionpercent: ad display ratio
    • adsense_ctr: ad click ratio
    • adsense_ecpm: estimated ad revenue per 1000 page views

    tesis_stats: data from a particular analysis, performed for each published breaking-news item. Fields with statistical values can be computed from the data in the other tables, but totals and averages are stored for faster and easier further processing.

    • id: ID of the analysis
    • phase: phase of the thesis in which the analysis was carried out (currently all are 1)
    • time: "0" if at the time of publication, "1" if 14 days later
    • start_date: date and time of measurement on the day of publication
    • end_date: date and time of the measurement made 14 days later
    • main_post_id: ID of the published article to be analyzed
    • main_post_theme: main section of the published article to be analyzed
    • superheroes_theme: "1" if about superheroes, "0" if not
    • trailer_theme: "1" if trailer, "0" if not
    • name: empty field, available for adding a custom name manually
    • notes: empty field, available for adding personalized notes manually, for example if a tag was removed manually for being considered too generic even though the editor had added it
    • num_articles: number of articles analyzed
    • num_articles_with_traffic: number of articles analyzed with traffic (which are taken into account for traffic analysis)
    • num_articles_with_tw_data: number of articles with data from when they were shared on the media’s Twitter account
    • num_terms: number of terms analyzed
    • uniquepageviews_total: total page views
    • uniquepageviews_mean: average page views
    • entrancerate_mean: average entrance rate
    • avgtimeonpage_mean: average duration of visits
    • exitrate_mean: average exit rate
    • pageviewspersession_mean: average page views per session
    • total: total of ads viewed
    • adsense_adunitsviewed_mean: average of ads viewed
    • adsense_viewableimpressionpercent_mean: average ad display ratio
    • adsense_ctr_mean: average ad click ratio
    • adsense_ecpm_mean: estimated ad revenue per 1000 page views
    • Total: total income
    • retweet_count_mean: average income
    • favorite_count_total: total of favorites
    • favorite_count_mean: average of favorites
    • terms_ini_num_tweets: total tweets on the terms on the day of publication
    • terms_ini_retweet_count_total: total retweets on the terms on the day of publication
    • terms_ini_retweet_count_mean: average retweets on the terms on the day of publication
    • terms_ini_favorite_count_total: total of favorites on the terms on the day of publication
    • terms_ini_favorite_count_mean: average of favorites on the terms on the day of publication
    • terms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publication
    • terms_ini_user_num_followers_mean: average followers of users who have spoken of the terms on the day of publication
    • terms_ini_user_num_tweets_mean: average number of tweets published by users who spoke about the terms on the day of publication
    • terms_ini_user_age_mean: average age in days of users who have spoken of the terms on the day of publication
    • terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about the terms on the day of publication
    • terms_end_num_tweets: total tweets on the terms 14 days after publication
    • terms_ini_retweet_count_total: total retweets on the terms 14 days after publication
    • terms_ini_retweet_count_mean: average retweets on the terms 14 days after publication
    • terms_ini_favorite_count_total: total of favorites on the terms 14 days after publication
    • terms_ini_favorite_count_mean: average of favorites on the terms 14 days after publication
    • terms_ini_followers_talking_rate: ratio of media Twitter account followers who have recently posted a tweet talking about the terms 14 days after publication
    • terms_ini_user_num_followers_mean: average followers of users who have spoken of the terms 14 days after publication
    • terms_ini_user_num_tweets_mean: average number of tweets published by users who have spoken about the terms 14 days after publication
    • terms_ini_user_age_mean: average age in days of users who have spoken of the terms 14 days after publication
    • terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about the terms 14 days after publication

    tesis_terms: data on the terms (tags) related to the processed articles.

    • stats_id: analysis ID
    • time: "0" if at the time of publication, "1" if 14 days later
    • term_id: term ID (tag) in WordPress
    • name: name of the term
    • slug: URL of the term
    • num_tweets: number of tweets
    • retweet_count_total: total retweets
    • retweet_count_mean: average retweets
    • favorite_count_total: total of favorites
    • favorite_count_mean: average of favorites
    • followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the term
    • user_num_followers_mean: average followers of users who were talking about the term
    • user_num_tweets_mean: average number of tweets published by users who were talking about the term
    • user_age_mean: average age in days of users who were talking about the term
    • url_inclusion_rate: URL inclusion ratio
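
The description notes that the statistical fields in tesis_stats can be recomputed from the other tables. A small sketch of that idea follows. It assumes rows of tesis_posts are available as dictionaries keyed by the field names given in the description, and that the total and mean are taken over all posts sharing a stats_id; the thesis may apply further filtering (for example, only articles with traffic). The function name `summarize_pageviews` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_pageviews(posts: list[dict]) -> dict[int, dict[str, float]]:
    """Group tesis_posts rows by stats_id and derive total/mean page views."""
    by_analysis: dict[int, list[float]] = defaultdict(list)
    for row in posts:
        by_analysis[row["stats_id"]].append(float(row["uniquepageviews"]))
    return {
        stats_id: {
            "uniquepageviews_total": sum(values),
            "uniquepageviews_mean": mean(values),
        }
        for stats_id, values in by_analysis.items()
    }


# Hypothetical sample rows, not taken from the dataset.
sample_posts = [
    {"stats_id": 1, "post_id": 10, "uniquepageviews": 120},
    {"stats_id": 1, "post_id": 11, "uniquepageviews": 80},
    {"stats_id": 2, "post_id": 12, "uniquepageviews": 50},
]
summary = summarize_pageviews(sample_posts)
```

The same grouping pattern would apply to the other `_total` and `_mean` fields, if they are derived the same way.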

  3. Google Trends

    • console.cloud.google.com
    Updated May 15, 2022
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=it (2022). Google Trends [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-search-trends?hl=it
    Explore at:
    Dataset updated
    May 15, 2022
    Dataset provided by
    Google Search (http://google.com/)
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Description

    The Google Trends dataset provides signals that individual users and businesses alike can use to make better data-driven decisions. It removes the need to interact manually with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. The dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends, made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 Rising expires after 30 days and is accompanied by a rolling five-year window of historical data in 210 distinct locations in the United States. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.

  4. Dataplex: Google Reviews & Ratings Dataset | Track Consumer Sentiment &...

    • datarade.ai
    .json, .csv
    Updated Feb 3, 2025
    + more versions
    Cite
    Dataplex (2025). Dataplex: Google Reviews & Ratings Dataset | Track Consumer Sentiment & Location-Based Insights [Dataset]. https://datarade.ai/data-products/dataplex-google-reviews-ratings-dataset-track-consumer-s-dataplex
    Explore at:
    .json, .csv (available download formats)
    Dataset updated
    Feb 3, 2025
    Dataset authored and provided by
    Dataplex
    Area covered
    Grenada, Ethiopia, Guinea, British Indian Ocean Territory, South Georgia and the South Sandwich Islands, Bhutan, French Polynesia, Palau, Korea (Democratic People's Republic of), Sweden
    Description

    The Google Reviews & Ratings Dataset provides businesses with structured insights into customer sentiment, satisfaction, and trends based on reviews from Google. Unlike broad review datasets, this product is location-specific—businesses provide the locations they want to track, and we retrieve as much historical data as possible, with daily updates moving forward.

    This dataset enables businesses to monitor brand reputation, analyze consumer feedback, and enhance decision-making with real-world insights. For deeper analysis, optional AI-driven sentiment analysis and review summaries are available on a weekly, monthly, or yearly basis.

    Dataset Highlights

    • Location-Specific Reviews – Reviews and ratings for the locations you provide.
    • Daily Updates – New reviews and rating changes updated automatically.
    • Historical Data Access – Retrieve past reviews where available.
    • AI Sentiment Analysis (Optional) – Summarized insights by week, month, or year.
    • Competitive Benchmarking – Compare performance across selected locations.

    Use Cases

    • Franchise & Retail Chains – Monitor brand reputation and performance across locations.
    • Hospitality & Restaurants – Track guest sentiment and service trends.
    • Healthcare & Medical Facilities – Understand patient feedback for specific locations.
    • Real Estate & Property Management – Analyze tenant and customer experiences through reviews.
    • Market Research & Consumer Insights – Identify trends and analyze feedback patterns across industries.

    Data Updates & Delivery

    • Update Frequency: Daily
    • Data Format: CSV for easy integration
    • Delivery: Secure file transfer (SFTP or cloud storage)

    Data Fields Include:

    • Business Name
    • Location Details
    • Star Ratings
    • Review Text
    • Timestamps
    • Reviewer Metadata

    Optional Add-Ons:

    • AI Sentiment Analysis – Aggregate trends by week, month, or year.
    • Custom Location Tracking – Tailor the dataset to fit your specific business needs.

    Ideal for

    • Marketing Teams – Leverage real-world consumer feedback to optimize brand strategy.
    • Business Analysts – Use structured review data to track customer sentiment over time.
    • Operations & Customer Experience Teams – Identify service issues and opportunities for improvement.
    • Competitive Intelligence – Compare locations and benchmark against industry competitors.

    Why Choose This Dataset?

    • Accurate & Up-to-Date – Daily updates ensure fresh, reliable data.
    • Scalable & Customizable – Track only the locations that matter to you.
    • Actionable Insights – AI-driven summaries for quick decision-making.
    • Easy Integration – Delivered in a structured format for seamless analysis.

    By leveraging Google Reviews & Ratings Data, businesses can gain valuable insights into customer sentiment, enhance reputation management, and stay ahead of the competition.

  5. Google Landmarks Dataset v2

    • github.com
    • opendatalab.com
    Updated Sep 27, 2019
    Cite
    Google (2019). Google Landmarks Dataset v2 [Dataset]. https://github.com/cvdfoundation/google-landmark
    Explore at:
    Dataset updated
    Sep 27, 2019
    Dataset provided by
    Google (http://google.com/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version contains approximately 5 million images, split into 3 sets: train, index, and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. The dataset was associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks, and technical solutions from top teams.

  6. AI Financial Market Data

    • kaggle.com
    Updated Aug 6, 2025
    Cite
    Data Science Lovers (2025). AI Financial Market Data [Dataset]. https://www.kaggle.com/datasets/rohitgrewal/ai-financial-and-market-data
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Data Science Lovers
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    📹Project Video available on YouTube - https://youtu.be/WmJYHz_qn5s

    Realistic Synthetic - AI Financial & Market Data for Gemini (Google), ChatGPT (OpenAI), Llama (Meta)

    This dataset provides a synthetic, daily record of financial market activity for companies involved in Artificial Intelligence (AI). It includes key financial metrics and events that could influence a company's stock performance, such as the launch of Llama by Meta, GPT by OpenAI, and Gemini by Google. It records how much each company spends on R&D for its AI products and services and how much revenue it generates. The data covers January 1, 2015, to December 31, 2024, for three companies: OpenAI, Google, and Meta.

    This data is available as a CSV file. We analyze this dataset using a Pandas DataFrame.

    This analysis will be helpful for those working in the finance or stock market domain.

    From this dataset, we extract various insights using Python in our project:

    1) How much the companies spent on R&D

    2) Revenue Earned by the companies

    3) Date-wise Impact on the Stock

    4) Events when Maximum Stock Impact was observed

    5) AI Revenue Growth of the companies

    6) Correlation between the columns

    7) Expenditure vs Revenue year-by-year

    8) Event Impact Analysis

    9) Change in the index with respect to year and company

    These are the main Features/Columns available in the dataset :

    1) Date: This column indicates the specific calendar day for which the financial and AI-related data is recorded. It allows for time-series analysis of the trends and impacts.

    2) Company: This column specifies the name of the company to which the data in that particular row belongs. Examples include "OpenAI" and "Meta".

    3) R&D_Spending_USD_Mn: This column represents the Research and Development (R&D) spending of the company, measured in Millions of USD. It serves as an indicator of a company's investment in innovation and future growth, particularly in the AI sector.

    4) AI_Revenue_USD_Mn: This column denotes the revenue generated specifically from AI-related products or services, also measured in Millions of USD. This metric highlights the direct financial success derived from AI initiatives.

    5) AI_Revenue_Growth_%: This column shows the percentage growth of AI-related revenue for the company on a daily basis. It indicates the pace at which a company's AI business is expanding or contracting.

    6) Event: This column captures any significant events or announcements made by the company that could potentially influence its financial performance or market perception. Examples include "Cloud AI launch," "AI partnership deal," "AI ethics policy update," and "AI speech recognition release." These events are crucial for understanding sudden shifts in stock impact.

    7) Stock_Impact_%: This column quantifies the percentage change in the company's stock price on a given day, likely in response to the recorded financial metrics or events. It serves as a direct measure of market reaction.
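
To make the column descriptions concrete, here is a minimal sketch that answers two of the questions listed above: R&D spending per company, and the event with the largest stock impact. The dataset description mentions Pandas; the standard-library `csv` module is used here only to keep the example self-contained. The sample rows are invented for illustration and the helper names `load_rows`, `total_rd_by_company`, and `max_impact_event` are hypothetical; only the column names come from the description.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows that follow the column layout described above.
SAMPLE_CSV = """Date,Company,R&D_Spending_USD_Mn,AI_Revenue_USD_Mn,AI_Revenue_Growth_%,Event,Stock_Impact_%
2015-01-01,OpenAI,5.0,1.0,0.5,,0.1
2015-01-02,OpenAI,6.0,1.2,20.0,AI partnership deal,2.5
2015-01-01,Meta,50.0,10.0,1.0,,-0.3
2015-01-02,Meta,55.0,11.0,10.0,Cloud AI launch,-4.0
"""


def load_rows(text: str) -> list[dict]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def total_rd_by_company(rows: list[dict]) -> dict[str, float]:
    """Sum R&D spending (millions of USD) per company."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["Company"]] += float(row["R&D_Spending_USD_Mn"])
    return dict(totals)


def max_impact_event(rows: list[dict]) -> dict:
    """Return the row with an event whose stock impact is largest in magnitude."""
    with_event = [row for row in rows if row["Event"]]
    return max(with_event, key=lambda row: abs(float(row["Stock_Impact_%"])))


rows = load_rows(SAMPLE_CSV)
rd_totals = total_rd_by_company(rows)
top_event = max_impact_event(rows)
```

Ranking by absolute value treats large drops and large gains alike; ranking by the signed value instead would surface only the largest gains.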

  7. How to make google plus posts private - Dataset - openAFRICA

    • open.africa
    Updated Jan 4, 2018
    + more versions
    Cite
    (2018). How to make google plus posts private - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/how-to-make-google-plus-posts-private
    Explore at:
    Dataset updated
    Jan 4, 2018
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    so if you have to have a G+ account (for YouTube, location services, or other reasons) - here's how you can make it totally private! No one will be able to add you, send you spammy links, or otherwise annoy you. You need to visit the "Audience Settings" page - https://plus.google.com/u/0/settings/audience You can then set a "custom audience" - usually you would use this to restrict your account to people from a specific geographic location, or within a specific age range. In this case, we're going to choose a custom audience of "No-one" Check the box and hit save. Now, when people try to visit your Google+ profile - they'll see this "restricted" message. You can visit my G+ Profile if you want to see this working. (https://plus.google.com/114725651137252000986) If you are not able to understand you can follow this website : http://www.livehuntz.com/google-plus/support-phone-number

  8. Google Analytics Sample

    • console.cloud.google.com
    Updated Jul 15, 2017
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Obfuscated%20Google%20Analytics%20360%20data&hl=de (2017). Google Analytics Sample [Dataset]. https://console.cloud.google.com/marketplace/product/obfuscated-ga360-data/obfuscated-ga360-data?hl=de
    Explore at:
    Dataset updated
    Jul 15, 2017
    Dataset provided by
    Google (http://google.com/)
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It is a good way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.

    The data is typical of what an ecommerce website would see and includes the following information:

    • Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic.
    • Content data: information about the behavior of users on the site, such as the URLs of pages that visitors look at and how they interact with content.
    • Transactional data: information about the transactions on the Google Merchandise Store website.

    Limitations: All users have view access to the dataset. This means you can query the dataset and generate reports, but you cannot complete administrative tasks. Data for some fields is obfuscated, such as fullVisitorId, or removed, such as clientId, adWordsClickInfo, and geoNetwork. “Not available in demo dataset” is returned for STRING values and “null” is returned for INTEGER values when querying fields that contain no data.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
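
Because removed STRING fields come back as a placeholder string instead of an empty value, downstream code usually needs to normalize them. The sketch below shows one possible way to do that once query results are in memory. It assumes each result row has been flattened into a plain dictionary (real Analytics 360 rows are nested) and that a missing INTEGER arrives as `None`; the helper names and the sample row are hypothetical.

```python
from typing import Any

# Placeholder string quoted in the dataset description for empty STRING fields.
PLACEHOLDER = "Not available in demo dataset"


def clean_value(value: Any) -> Any:
    """Map the demo-dataset placeholder string to None; pass other values through."""
    if isinstance(value, str) and value.strip() == PLACEHOLDER:
        return None
    return value


def clean_row(row: dict) -> dict:
    """Apply clean_value to every field of a flat row dictionary."""
    return {key: clean_value(value) for key, value in row.items()}


# Hypothetical flat row, not taken from the dataset.
example = clean_row({"fullVisitorId": "123", "city": PLACEHOLDER, "visits": None})
```

After this step, missing strings and missing integers are both represented as `None`, so a single null check covers both cases.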

  9. Google's Audioset: Reformatted

    • zenodo.org
    • data.niaid.nih.gov
    tsv
    Updated Sep 21, 2022
    Cite
    Bakhtin; Bakhtin (2022). Google's Audioset: Reformatted [Dataset]. http://doi.org/10.5281/zenodo.7096702
    Explore at:
    tsv (available download formats)
    Dataset updated
    Sep 21, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bakhtin; Bakhtin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    Google's AudioSet consistently reformatted
    
    During my work with Google's AudioSet (https://research.google.com/audioset/index.html)
    I encountered some problems because the Weak (https://research.google.com/audioset/download.html) and
    Strong (https://research.google.com/audioset/download_strong.html) versions of the dataset use different csv formatting for the data, and because the labels used in the two datasets are different (https://github.com/audioset/ontology/issues/9) and are also presented in files with different formatting.
    
    This dataset reformatting aims to unify the formats of the datasets so that it is possible
    to analyse them in the same pipelines, and also make the dataset files compatible
    with psds_eval, dcase_util and sed_eval Python packages used in Audio Processing.
    
    For better formatted documentation and source code of reformatting refer to https://github.com/bakhtos/GoogleAudioSetReformatted 
    
    -Changes in dataset
    
    All files are converted to tab-separated `*.tsv` files (i.e. `csv` files with `\t`
    as a separator). All files have a header as the first line.
    
    -New fields and filenames
    
    Fields are renamed according to the following table, to be compatible with psds_eval:
    
    Old field -> New field
    YTID -> filename
    segment_id -> filename
    start_seconds -> onset
    start_time_seconds -> onset
    end_seconds -> offset
    end_time_seconds -> offset
    positive_labels -> event_label
    label -> event_label
    present -> present
    
    For class label files, `id` is now the name for the `mid` label (e.g. `/m/09xor`)
    and `label` for the human-readable label (e.g. `Speech`). The index of the label given
    for Weak dataset labels (`index` field in `class_labels_indices.csv`) is not used.
    
    Files are renamed according to the following table to ensure consistent naming
    of the form `audioset_[weak|strong]_[train|eval]_[balanced|unbalanced|posneg]*.tsv`:
    
    Old name -> New name
    balanced_train_segments.csv -> audioset_weak_train_balanced.tsv
    unbalanced_train_segments.csv -> audioset_weak_train_unbalanced.tsv
    eval_segments.csv -> audioset_weak_eval.tsv
    audioset_train_strong.tsv -> audioset_strong_train.tsv
    audioset_eval_strong.tsv -> audioset_strong_eval.tsv
    audioset_eval_strong_framed_posneg.tsv -> audioset_strong_eval_posneg.tsv
    class_labels_indices.csv -> class_labels.tsv (merged with mid_to_display_name.tsv)
    mid_to_display_name.tsv -> class_labels.tsv (merged with class_labels_indices.csv)
    
    -Strong dataset changes
    
    The only changes to the Strong dataset are the renaming of fields and the reordering of columns,
    so that both the Weak and Strong versions have `filename` and `event_label` as the first
    two columns.
    
    -Weak dataset changes
    
    -- Labels are given one per line, instead of comma-separated and quoted list
    
    -- To make sure that `filename` format is the same as in Strong version, the following
    format change is made:
    The value of the `start_seconds` field is converted to milliseconds and appended to the `filename` with an underscore. Since all files in the dataset are assumed to be 10 seconds long, this unifies the format of `filename` with the Strong version and makes `end_seconds` also redundant.
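
The two Weak dataset changes described above can be sketched in a few lines. This is an illustrative reading of the description, not the code from convert.py: the function name `weak_row_to_events` and the sample row are hypothetical, and it assumes `positive_labels` arrives as a single comma-separated string with the surrounding quotes already removed.

```python
def weak_row_to_events(
    ytid: str, start_seconds: float, positive_labels: str
) -> list[tuple[str, str]]:
    """Turn one Weak-dataset row into (filename, event_label) pairs.

    The start time is converted to whole milliseconds and appended to the
    YouTube ID with an underscore; each label gets its own output row.
    """
    start_ms = int(round(start_seconds * 1000))
    filename = f"{ytid}_{start_ms}"
    labels = [label.strip() for label in positive_labels.split(",") if label.strip()]
    return [(filename, label) for label in labels]


# Hypothetical input row, not taken from the dataset.
events = weak_row_to_events("abc123", 30.0, "/m/09xor,/m/04rlf")
```

Emitting one tuple per label mirrors the one-label-per-line layout, with `filename` and `event_label` as the first two columns.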
    
    -Class labels changes
    
    Class labels from both datasets are merged into one file and given in alphabetical order of `id`s. Since the same `id`s are present in both datasets, but sometimes with different human-readable labels, labels from the Strong dataset overwrite those from the Weak dataset. It is possible to regenerate `class_labels.tsv` while giving priority to the Weak version of the labels by calling `convert_labels(False)` from convert.py in the GitHub repository.
    
    -License
    
    Google's AudioSet was published in two stages - first the Weakly labelled data (Gemmeke, Jort F., et al. "Audio set: An ontology and human-labeled dataset for audio events." 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2017.), then the strongly labelled data (Hershey, Shawn, et al. "The benefit of temporally-strong labels in audio event classification." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.)
    
    Both the original dataset and this reworked version are licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
    

    Class labels come from the AudioSet Ontology, which is licensed under CC BY-SA 4.0.

  10. Google energy consumption 2011-2023

    • statista.com
    Updated Oct 11, 2024
    Cite
    Statista (2024). Google energy consumption 2011-2023 [Dataset]. https://www.statista.com/statistics/788540/energy-consumption-of-google/
    Explore at:
    Dataset updated
    Oct 11, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    Google’s energy consumption has increased over the last few years, reaching 25.9 terawatt hours in 2023, up from 12.8 terawatt hours in 2019. The company has made efforts to make its data centers more efficient through customized high-performance servers, smart temperature and lighting, advanced cooling techniques, and machine learning.

    Data centers and energy

    Through its operations, Google pursues a more sustainable impact on the environment by creating efficient data centers that use less energy than the average, transitioning towards renewable energy, creating sustainable workplaces, and providing its users with the technological means towards a cleaner future for coming generations. Through its efficient data centers, Google has also managed to divert waste from its operations away from landfills.

    Reducing Google’s carbon footprint

    Google’s clean energy efforts are also related to its efforts to reduce its carbon footprint. Since its commitment to using 100 percent renewable energy, the company has met its targets largely through solar and wind energy power purchase agreements and by buying renewable power from utilities. Google is one of the largest corporate purchasers of renewable energy in the world.

  11. Project Sunroof

    • console.cloud.google.com
    Updated Aug 15, 2017
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Google%20Project%20Sunroof&hl=fr (2017). Project Sunroof [Dataset]. https://console.cloud.google.com/marketplace/product/project-sunroof/project-sunroof?hl=fr
    Explore at:
    Dataset updated
    Aug 15, 2017
    Dataset provided by
    Google (http://google.com/)
    Description

    As installing solar has become less expensive, more homeowners are turning to it as a possible option for decreasing their energy bill. We want to make installing solar panels easy and understandable for anyone. Project Sunroof puts Google's expansive data in mapping and computing resources to use, helping calculate the best solar plan for you.

    How does it work?

    When you enter your address, Project Sunroof looks up your home in Google Maps and combines that information with other databases to create your personalized roof analysis. Don’t worry, Project Sunroof doesn't give the address to anybody else. Learn more about Project Sunroof and see the tool at Project Sunroof’s site. Project Sunroof computes how much sunlight hits roofs in a year, based on shading calculations, typical meteorological data, and estimates of the size and shape of the roofs. You can see more details about how solar viability is determined by checking out the methodology.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  12. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Oct 2, 2025
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Explore at:
    Available download formats: zip (159295314323 bytes)
    Dataset updated
    Oct 2, 2025
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

    File organization

    The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; for example, folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; for example, 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have far fewer than 1 thousand files due to private and interactive sessions.
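
    As a rough sketch of this layout, the folder for a given KernelVersions id can be derived with integer arithmetic. The function name `version_folder` is hypothetical, and the sketch assumes folder names are not zero-padded, which the description does not state.

```python
def version_folder(version_id: int) -> str:
    """Return the two-level folder (top/sub) that would hold a version id.

    Hypothetical helper; assumes folder names are plain integers without
    zero padding, which is not confirmed by the dataset description.
    """
    top = version_id // 1_000_000        # e.g. 123 for ids 123,000,000 to 123,999,999
    sub = (version_id // 1_000) % 1_000  # e.g. 456 for ids 123,456,000 to 123,456,999
    return f"{top}/{sub}"


folder = version_folder(123_456_789)
```

    Under these assumptions, every id from 123,456,000 to 123,456,999 maps to `123/456`, matching the example in the description.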

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  13. Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 1, 2023
    Cite
    Haak, Fabian (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7682914
    Explore at:
    Dataset updated
    Mar 1, 2023
    Dataset provided by
    Haak, Fabian
    Schaer, Philipp
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

    Fabian Haak and Philipp Schaer. 2023. 𝑄𝑏𝑖𝑎𝑠 - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

    Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

    The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022 as presented in our publication. The AllSides balanced news features three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship: left, right, or neutral. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

    To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.

    Dataset 2: Search Query Suggestions (suggestions.csv)

    The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times, approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations, and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

    The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions at the respective positions returned by the search engines at the given time of search ("datetime"). We scraped our data from a US server; its location is saved in "location".

    We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
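
    The expansion of root terms into query inputs can be sketched as follows. This is an illustration of the description above rather than the authors' scraper: the function name `query_inputs` is hypothetical, and the sketch only builds the input strings without contacting any search engine.

```python
import string


def query_inputs(root_term: str) -> list[str]:
    """Build the root query plus its extensions by the letters a to z.

    Hypothetical helper illustrating the described expansion.
    """
    extended = [f"{root_term} {letter}" for letter in string.ascii_lowercase]
    return [root_term] + extended


inputs = query_inputs("democrats")
# One root query plus 26 extensions gives 27 inputs; at ten suggestions
# per input this yields the stated maximum per topic and search engine.
max_suggestions = len(inputs) * 10
```

    For "democrats" this produces 27 query inputs ("democrats", "democrats a", through "democrats z"), and 27 inputs at ten suggestions each gives the upper bound of 270 mentioned above.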

    AllSides Scraper

    At https://github.com/irgroup/Qbias, we provide a scraping tool that allows for the automatic retrieval of all available articles from the AllSides balanced news headlines.

    We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper that scrapes all available AllSides news articles and gathers available information. By providing the scraper, we facilitate access to a recent version of the dataset for other researchers.

  14. Public electrophysiological datasets collected in the Buzsaki Lab

    • zenodo.org
    Updated Jul 22, 2024
    Cite
    Peter Christian Petersen; Michelle Hernandez; György Buzsáki (2024). Public electrophysiological datasets collected in the Buzsaki Lab [Dataset]. http://doi.org/10.5281/zenodo.3629881
    Explore at:
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Peter Christian Petersen; Michelle Hernandez; György Buzsáki
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The Buzsaki Lab is proud to present a large selection of experimental data available for public access: https://buzsakilab.com/wp/database/. We publicly share more than a thousand sessions (about 40TB of raw and spike- and LFP-processed data) via our public data repository. The datasets are from freely moving rodents and include sleep-task-sleep sessions (3 to 24 hrs continuous recording sessions) in various brain structures, including metadata. We are happy to assist you in using the data. Our goal is that by sharing these data, other users can provide new insights, extend, contradict, or clarify our conclusions.

    The databank contains electrophysiological recordings performed in freely moving rats and mice collected by investigators in the Buzsaki Lab over several years (a subset from head-fixed mice). Sessions have been collected with extracellular electrodes using high-channel-count silicon probes, with spike sorted single units, and intracellular and juxtacellular combined with extracellular electrodes. Several sessions include physiologically and optogenetically identified units. The sessions have been collected from various brain region pairs: the hippocampus, thalamus, amygdala, post-subiculum, septal region, and the entorhinal cortex, and various neocortical regions. In most behavioral tasks, the animals performed spatial behaviors (linear mazes and open fields), preceded and followed by long sleep sessions. Brain state classification is provided.

    Getting started

    The top menu “Databank” serves as a navigational menu to the databank. The metadata describing the experiments is stored in a relational database which means that there are many entry points for exploring the data. The databank is organized by projects, animal subjects, and sessions.

    Accessing and downloading the datasets

    We share the data through two services: our public Globus.org endpoint and our webshare: buzsakilab.nyumc.org. A subset of the datasets is also available at CRCNS.org. If you have an interest in a dataset that is not listed or is lacking information, please contact us. We pledge to make our data available immediately after publication.

    Support

    For support, please use our Buzsaki Databank google group. If you have an interest in a dataset that is not listed or is lacking information, please send us a request. Feel free to contact us, if you need more details on a given dataset or if a dataset is missing.

  15. Data from: PodcastMix - a dataset for separating music and speech in...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Sep 12, 2022
    Cite
    Nicolas Schmidt; Jordi Pons; Marius Miron (2022). PodcastMix - a dataset for separating music and speech in podcasts [Dataset]. http://doi.org/10.5281/zenodo.5597047
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 12, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nicolas Schmidt; Jordi Pons; Marius Miron
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Note: due to Zenodo limitations, we host solely the metadata here. The whole dataset can be found at: https://drive.google.com/drive/u/0/folders/1tpg9WXkl4L0zU84AwLQjrFqnP-jw1t7z

    We introduce PodcastMix, a dataset formalizing the task of separating background music and foreground speech in podcasts. It contains audio files at 44.1kHz and the corresponding metadata. For further details check the following paper and the associated GitHub repository:

    This dataset contains four parts. Due to Zenodo file size limitations, we host the training dataset on Google Drive. We highlight the content of the Zenodo archives within brackets:

    • [metadata] PodcastMix-synth train: a large and diverse training set that is programmatically generated (with a validation partition). The mixtures are created programmatically with music from Jamendo and speech from the VCTK dataset.
    • [metadata] PodcastMix-synth test: a programmatically generated test set with reference stems to compute evaluation metrics. The mixtures are created programmatically with music from Jamendo and speech from the VCTK dataset.
    • [audio and metadata] PodcastMix-real with-reference: a test set of real podcasts with reference stems to compute evaluation metrics. The podcasts are recorded by one of the authors and the source of the music is the FMA dataset.
    • [audio and metadata] PodcastMix-real no-reference: a test set of real podcasts with only the podcast mixes, for subjective evaluation. The podcasts are compiled from the internet.

    The training dataset, PodcastMix-synth, may be found at our Google Drive repository: https://drive.google.com/drive/folders/1tpg9WXkl4L0zU84AwLQjrFqnP-jw1t7z?usp=sharing . The archive comprises 450GB of audio and metadata with the following structure:

    • [metadata and audio] PodcastMix-synth train: a large and diverse training set that is programmatically generated (with a validation partition). The mixtures are created programmatically with music from Jamendo and speech from the VCTK dataset.
    • [metadata and audio] PodcastMix-synth test: a programmatically generated test set with reference stems to compute evaluation metrics. The mixtures are created programmatically with music from Jamendo and speech from the VCTK dataset.

    Make sure you maintain the folder structure of the original dataset when you uncompress these files.


    This dataset is created by Nicolas Schmidt, Marius Miron, Music Technology Group - Universitat Pompeu Fabra (Barcelona) and Jordi Pons. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 Unported License (CC BY-SA 4.0).


    Please acknowledge PodcastMix in academic research. When the present dataset is used for academic research, we would highly appreciate it if authors cite the following publications:

    • N. Schmidt, J. Pons, M. Miron, "PodcastMix - a dataset for separating music and speech in podcasts", Interspeech (2022)
    • N. Schmidt, "PodcastMix - a dataset for separating music and speech in podcasts", Masters thesis, MTG, UPF (2021) https://zenodo.org/record/5554790#.YXLHvNlByWA


    The dataset and its contents are made available on an “as is” basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, the UPF is not liable for, and expressly excludes, all liability for loss or damage however and whenever caused to anyone by any use of the dataset or any part of it.


    PURPOSES. The data is processed for the general purpose of carrying out research development and innovation studies, works or projects. In particular, but without limitation, the data is processed for the purpose of communicating with Licensee regarding any administrative and legal / judicial purposes.

  16. Linked Open Data Management Services: A Comparison

    • zenodo.org
    • data.niaid.nih.gov
    Updated Sep 18, 2023
    Cite
    Robert Nasarek; Lozana Rossenova (2023). Linked Open Data Management Services: A Comparison [Dataset]. http://doi.org/10.5281/zenodo.7738424
    Explore at:
    Dataset updated
    Sep 18, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Robert Nasarek; Lozana Rossenova
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Thanks to a variety of software services, it has never been easier to produce, manage and publish Linked Open Data. But until now, there has been a lack of an accessible overview to help researchers make the right choice for their use case. This dataset release will be regularly updated to reflect the latest data published in a comparison table developed in Google Sheets [1]. The comparison table includes the most commonly used LOD management software tools from NFDI4Culture to illustrate what functionalities and features a service should offer for the long-term management of FAIR research data, including:

    • ConedaKOR
    • LinkedDataHub
    • Metaphacts
    • Omeka S
    • ResearchSpace
    • Vitro
    • Wikibase
    • WissKI

    The table presents two views based on a comparison system of categories developed iteratively during workshops with expert users and developers from the respective tool communities. First, a short overview with field values coming from controlled vocabularies and multiple-choice options; and a second sheet allowing for more descriptive free text additions. The table and corresponding dataset releases for each view mode are designed to provide a well-founded basis for evaluation when deciding on a LOD management service. The Google Sheet table will remain open to collaboration and community contribution, as well as updates with new data and potentially new tools, whereas the datasets released here are meant to provide stable reference points with version control.

    The research for the comparison table was first presented as a paper at DHd2023, Open Humanities – Open Culture, 13-17.03.2023, Trier and Luxembourg [2].

    [1] Non-editing access is available here: docs.google.com/spreadsheets/d/1FNU8857JwUNFXmXAW16lgpjLq5TkgBUuafqZF-yo8_I/edit?usp=share_link To get editing access contact the authors.

    [2] Full paper will be made available open access in the conference proceedings.

  17. civil_comments

    • tensorflow.org
    • huggingface.co
    Updated Feb 28, 2023
    Cite
    (2023). civil_comments [Dataset]. https://www.tensorflow.org/datasets/catalog/civil_comments
    Explore at:
    Dataset updated
    Feb 28, 2023
    Description

    This version of the CivilComments Dataset provides access to the primary seven labels that were annotated by crowd workers. The toxicity and other tags are values between 0 and 1 indicating the fraction of annotators that assigned these attributes to the comment text.

    The other tags are only available for a fraction of the input examples. They are currently ignored for the main dataset; the CivilCommentsIdentities set includes those labels, but only consists of the subset of the data with them. The other attributes that were part of the original CivilComments release are included only in the raw data. See the Kaggle documentation for more details about the available features.

    The comments in this dataset come from an archive of the Civil Comments platform, a commenting plugin for independent news sites. These public comments were created from 2015 - 2017 and appeared on approximately 50 English-language news sites across the world. When Civil Comments shut down in 2017, they chose to make the public comments available in a lasting open archive to enable future research. The original data, published on figshare, includes the public comment text, some associated metadata such as article IDs, publication IDs, timestamps and commenter-generated "civility" labels, but does not include user ids. Jigsaw extended this dataset by adding additional labels for toxicity, identity mentions, as well as covert offensiveness. This data set is an exact replica of the data released for the Jigsaw Unintended Bias in Toxicity Classification Kaggle challenge. This dataset is released under CC0, as is the underlying comment text.

    For comments that have a parent_id also in the civil comments data, the text of the previous comment is provided as the "parent_text" feature. Note that the splits were made without regard to this information, so using previous comments may leak some information. The annotators did not have access to the parent text when making the labels.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('civil_comments', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  18. Data from: Google Earth Engine (GEE)

    • catalog-usgs.opendata.arcgis.com
    • data.amerigeoss.org
    Updated Nov 29, 2018
    Cite
    AmeriGEOSS (2018). Google Earth Engine (GEE) [Dataset]. https://catalog-usgs.opendata.arcgis.com/datasets/amerigeoss::google-earth-engine-gee
    Explore at:
    Dataset updated
    Nov 29, 2018
    Dataset authored and provided by
    AmeriGEOSS
    Description

    Meet Earth Engine

    Google Earth Engine combines a multi-petabyte catalog of satellite imagery and geospatial datasets with planetary-scale analysis capabilities and makes it available for scientists, researchers, and developers to detect changes, map trends, and quantify differences on the Earth's surface.

    Satellite imagery + your algorithms + real world applications

    Global-scale insight

    Explore our interactive timelapse viewer to travel back in time and see how the world has changed over the past twenty-nine years. Timelapse is one example of how Earth Engine can help gain insight into petabyte-scale datasets.

    Ready-to-use datasets

    The public data archive includes more than thirty years of historical imagery and scientific datasets, updated and expanded daily. It contains over twenty petabytes of geospatial data instantly available for analysis.

    Simple, yet powerful API

    The Earth Engine API is available in Python and JavaScript, making it easy to harness the power of Google’s cloud for your own geospatial analysis.

    "Google Earth Engine has made it possible for the first time in history to rapidly and accurately process vast amounts of satellite imagery, identifying where and when tree cover change has occurred at high resolution. Global Forest Watch would not exist without it. For those who care about the future of the planet Google Earth Engine is a great blessing!" - Dr. Andrew Steer, President and CEO of the World Resources Institute.

    Convenient tools

    Use our web-based code editor for fast, interactive algorithm development with instant access to petabytes of data.

    Scientific and humanitarian impact

    Scientists and non-profits use Earth Engine for remote sensing research, predicting disease outbreaks, natural resource management, and more.

  19. Outscraper Google Maps Scraper

    • datarade.ai
    .json, .csv, .xls
    Updated Dec 9, 2021
    Cite
    (2021). Outscraper Google Maps Scraper [Dataset]. https://datarade.ai/data-products/outscraper-google-maps-scraper-outscraper
    Explore at:
    Available download formats: .json, .csv, .xls
    Dataset updated
    Dec 9, 2021
    Area covered
    United States
    Description

    Are you looking to identify B2B leads to promote your business, product, or service? Outscraper Google Maps Scraper might just be the tool you've been searching for. This powerful software enables you to extract business data directly from Google's extensive database, which spans millions of businesses across countless industries worldwide.

    Outscraper Google Maps Scraper is a tool built with advanced technology that lets you scrape a myriad of valuable information about businesses from Google's database. This information includes, but is not limited to, business names, addresses, contact information, website URLs, reviews, ratings, and operational hours.

    Whether you are a small business trying to make a mark or a large enterprise exploring new territories, the data obtained from the Outscraper Google Maps Scraper can be a treasure trove. This tool provides a cost-effective, efficient, and accurate method to generate leads and gather market insights.

    By using Outscraper, you'll gain a significant competitive edge as it allows you to analyze your market and find potential B2B leads with precision. You can use this data to understand your competitors' landscape, discover new markets, or enhance your customer database. The tool offers the flexibility to extract data based on specific parameters like business category or geographic location, helping you to target the most relevant leads for your business.

    In a world that's growing increasingly data-driven, utilizing a tool like Outscraper Google Maps Scraper could be instrumental to your business' success. If you're looking to get ahead in your market and find B2B leads in a more efficient and precise manner, Outscraper is worth considering. It streamlines the data collection process, allowing you to focus on what truly matters – using the data to grow your business.

    https://outscraper.com/google-maps-scraper/

    As a result of the Google Maps scraping, your data file will contain the following details:

    Query Name Site Type Subtypes Category Phone Full Address Borough Street City Postal Code State Us State Country Country Code Latitude Longitude Time Zone Plus Code Rating Reviews Reviews Link Reviews Per Scores Photos Count Photo Street View Working Hours Working Hours Old Format Popular Times Business Status About Range Posts Verified Owner ID Owner Title Owner Link Reservation Links Booking Appointment Link Menu Link Order Links Location Link Place ID Google ID Reviews ID

    If you want to enrich your datasets with social media accounts and other details, you can combine Google Maps Scraper with Domain Contact Scraper.

    Domain Contact Scraper can scrape these details:

    Email, Facebook, GitHub, Instagram, LinkedIn, Phone, Twitter, YouTube

  20. Google Community Mobility Reports

    • console.cloud.google.com
    Updated May 2, 2020
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&inv=1&invt=Ab48sA (2020). Google Community Mobility Reports [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19_google_mobility
    Dataset updated
    May 2, 2020
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Description

    UPDATE: The Community Mobility Reports are no longer being updated as of October 15, 2022. All historical data will remain publicly available for research purposes.

    This dataset aims to provide insights into what has changed in response to policies aimed at combating COVID-19. It reports movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential.

    This dataset is intended to help remediate the impact of COVID-19. It shouldn’t be used for medical diagnostic, prognostic, or treatment purposes. It also isn’t intended to be used for guidance on personal travel plans. To learn more about the dataset, the place categories, and how we calculate these trends and preserve privacy, visit our help center or read the data documentation.

    All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.


Google Ads Transparency Center

15 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Sep 6, 2023
Dataset provided by
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Description

This dataset contains two tables: creative_stats and removed_creative_stats.

The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad-specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first shown and last shown dates, which criteria were used in audience selection, the format of the ad, the ad topic, and whether the ad is funded by the Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided.

The removed_creative_stats table contains information about ads that served in the European Economic Area that Google removed: where and why they were removed, and per-region information on when they served. The removed_creative_stats table also contains a link to the Google Ads Transparency Center for the removed ad.

Data for both tables updates periodically and may lag behind what appears on the Google Ads Transparency Center website.

About BigQuery

This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing. Each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

Download Dataset

This public dataset is also hosted in Google Cloud Storage here and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset. A README file which describes the data structure and our Terms of Service (also listed below) is included with the dataset. You can also download the results from a custom query. See here for options and instructions.

Signed-out users can download the full dataset by using the gcloud CLI. Follow the instructions here to download and install the gcloud CLI.

To remove the login requirement, run:

$ gcloud config set auth/disable_credentials True

To download the dataset, run:

$ gcloud storage cp gs://ads-transparency-center/* . -R
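The two gcloud commands quoted in the description could be wrapped in a small script, for example to run the download on a schedule. The following is a minimal sketch, not part of the dataset's own tooling: the function names `build_download_commands` and `download` and the `dry_run` flag are hypothetical, and it assumes the gcloud CLI is already installed and on the PATH. By default it only prints the commands.

```python
import shlex
import subprocess

# Bucket path as quoted in the dataset description.
BUCKET_GLOB = "gs://ads-transparency-center/*"


def build_download_commands(dest="."):
    """Return the two gcloud invocations from the description as argument lists."""
    return [
        # Removes the login requirement for signed-out users.
        ["gcloud", "config", "set", "auth/disable_credentials", "True"],
        # Recursively copies the bucket contents into dest.
        ["gcloud", "storage", "cp", BUCKET_GLOB, dest, "-R"],
    ]


def download(dest=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_download_commands(dest):
        print(shlex.join(cmd))
        if not dry_run:
            # No shell is involved, so the wildcard is passed through to gcloud unchanged.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    download()  # dry run: shows what would be executed
```

Calling `download(dest="ads_data", dry_run=False)` would execute both commands in order and stop at the first failure. The dataset is described as large, so a dry run first is a reasonable precaution.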
