100+ datasets found

o
How to make google plus posts private - Dataset - openAFRICA
open.africa
Updated Jan 4, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2018). How to make google plus posts private - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/how-to-make-google-plus-posts-private
Explore at:
Dataset updated
Jan 4, 2018
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
so if you have to have a G+ account (for YouTube, location services, or other reasons) - here's how you can make it totally private! No one will be able to add you, send you spammy links, or otherwise annoy you. You need to visit the "Audience Settings" page - https://plus.google.com/u/0/settings/audience You can then set a "custom audience" - usually you would use this to restrict your account to people from a specific geographic location, or within a specific age range. In this case, we're going to choose a custom audience of "No-one" Check the box and hit save. Now, when people try to visit your Google+ profile - they'll see this "restricted" message. You can visit my G+ Profile if you want to see this working. (https://plus.google.com/114725651137252000986) If you are not able to understand you can follow this website : http://www.livehuntz.com/google-plus/support-phone-number
About COVID-19 Public Datasets
console.cloud.google.com
Updated Jun 19, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&inv=1&invt=Ab2YUw (2022). About COVID-19 Public Datasets [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19-public-data-program
Explore at:
Dataset updated
Jun 19, 2022
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Description
In an effort to help combat COVID-19, we created a COVID-19 Public Datasets program to make data more accessible to researchers, data scientists and analysts. The program will host a repository of public datasets that relate to the COVID-19 crisis and make them free to access and analyze. These include datasets from the New York Times, European Centre for Disease Prevention and Control, Google, Global Health Data from the World Bank, and OpenStreetMap. Free hosting and queries of COVID datasets As with all data in the Google Cloud Public Datasets Program , Google pays for storage of datasets in the program. BigQuery also provides free queries over certain COVID-related datasets to support the response to COVID-19. Queries on COVID datasets will not count against the BigQuery sandbox free tier , where you can query up to 1TB free each month. Limitations and duration Queries of COVID data are free. If, during your analysis, you join COVID datasets with non-COVID datasets, the bytes processed in the non-COVID datasets will be counted against the free tier, then charged accordingly, to prevent abuse. Queries of COVID datasets will remain free until Sept 15, 2021. The contents of these datasets are provided to the public strictly for educational and research purposes only. We are not onboarding or managing PHI or PII data as part of the COVID-19 Public Dataset Program. Google has practices & policies in place to ensure that data is handled in accordance with widely recognized patient privacy and data security policies. See the list of all datasets included in the program
Meta Kaggle Code
kaggle.com
zip
Updated Jul 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
Explore at:
zip(148301844275 bytes)Available download formats
Dataset updated
Jul 10, 2025
Dataset authored and provided by
Kagglehttp://kaggle.com/
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebooks versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

File organization

The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
Google Trends - International
console.cloud.google.com
Updated Jul 22, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&inv=1&invt=Ab2hhQ (2018). Google Trends - International [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-trends-intl
Explore at:
Dataset updated
Jul 22, 2018
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Google Searchhttp://google.com/
Description
The International Google Trends dataset will provide critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It will be made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 rising expires after 30 days, and will be accompanied by a rolling five-year window of historical data for each country and region across the globe, where data is available. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery
Google energy consumption 2011-2023
statista.com
ai-chatbox.pro
Updated Oct 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2024). Google energy consumption 2011-2023 [Dataset]. https://www.statista.com/statistics/788540/energy-consumption-of-google/
Explore at:
Dataset updated
Oct 11, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
Worldwide
Description
Google’s energy consumption has increased over the last few years, reaching 25.9 terawatt hours in 2023, up from 12.8 terawatt hours in 2019. The company has made efforts to make its data centers more efficient through customized high-performance servers, using smart temperature and lighting, advanced cooling techniques, and machine learning. Datacenters and energy Through its operations, Google pursues a more sustainable impact on the environment by creating efficient data centers that use less energy than the average, transitioning towards renewable energy, creating sustainable workplaces, and providing its users with the technological means towards a cleaner future for the future generations. Through its efficient data centers, Google has also managed to divert waste from its operations away from landfills. Reducing Google’s carbon footprint Google’s clean energy efforts is also related to their efforts to reduce their carbon footprint. Since their commitment to using 100 percent renewable energy, the company has met their targets largely through solar and wind energy power purchase agreements and buying renewable power from utilities. Google is one of the largest corporate purchasers of renewable energy in the world.

Google's Audioset: Reformatted

zenodo.org
data.niaid.nih.gov

tsv

Updated Sep 21, 2022

Facebook

Twitter

Click to copy link

Link copied

Cite

Bakhtin; Bakhtin (2022). Google's Audioset: Reformatted [Dataset]. http://doi.org/10.5281/zenodo.7096702

Explore at:

tsvAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.7096702

Dataset updated

Sep 21, 2022

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Bakhtin; Bakhtin

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Google's AudioSet consistently reformatted

During my work with Google's AudioSet(https://research.google.com/audioset/index.html)
I encountered some problems due to the fact that Weak (https://research.google.com/audioset/download.html) and
 Strong (https://research.google.com/audioset/download_strong.html) versions of the dataset used different csv formatting for the data, and that also labels used in the two datasets are different (https://github.com/audioset/ontology/issues/9) and also presented in files with different formatting.

This dataset reformatting aims to unify the formats of the datasets so that it is possible
to analyse them in the same pipelines, and also make the dataset files compatible
with psds_eval, dcase_util and sed_eval Python packages used in Audio Processing.

For better formatted documentation and source code of reformatting refer to https://github.com/bakhtos/GoogleAudioSetReformatted 

-Changes in dataset

All files are converted to tab-separated `*.tsv` files (i.e. `csv` files with `\t`
as a separator). All files have a header as the first line.

-New fields and filenames

Fields are renamed according to the following table, to be compatible with psds_eval:

Old field -> New field
YTID -> filename
segment_id -> filename
start_seconds -> onset
start_time_seconds -> onset
end_seconds -> offset
end_time_seconds -> offset
positive_labels -> event_label
label -> event_label
present -> present

For class label files, `id` is now the name for the for `mid` label (e.g. `/m/09xor`)
and `label` for the human-readable label (e.g. `Speech`). Index of label indicated
for Weak dataset labels (`index` field in `class_labels_indices.csv`) is not used.

Files are renamed according to the following table to ensure consisted naming
of the form `audioset_[weak|strong]_[train|eval]_[balanced|unbalanced|posneg]*.tsv`:

Old name -> New name
balanced_train_segments.csv -> audioset_weak_train_balanced.tsv
unbalanced_train_segments.csv -> audioset_weak_train_unbalanced.tsv
eval_segments.csv -> audioset_weak_eval.tsv
audioset_train_strong.tsv -> audioset_strong_train.tsv
audioset_eval_strong.tsv -> audioset_strong_eval.tsv
audioset_eval_strong_framed_posneg.tsv -> audioset_strong_eval_posneg.tsv
class_labels_indices.csv -> class_labels.tsv (merged with mid_to_display_name.tsv)
mid_to_display_name.tsv -> class_labels.tsv (merged with class_labels_indices.csv)

-Strong dataset changes

Only changes to the Strong dataset are renaming of fields and reordering of columns,
so that both Weak and Strong version have `filename` and `event_label` as first 
two columns.

-Weak dataset changes

-- Labels are given one per line, instead of comma-separated and quoted list

-- To make sure that `filename` format is the same as in Strong version, the following
format change is made:
The value of the `start_seconds` field is converted to milliseconds and appended to the `filename` with an underscore. Since all files in the dataset are assumed to be 10 seconds long, this unifies the format of `filename` with the Strong version and makes `end_seconds` also redundant.

-Class labels changes

Class labels from both datasets are merged into one file and given in alphabetical order of `id`s. Since same `id`s are present in both datasets, but sometimes with different human-readable labels, labels from Strong dataset overwrite those from Weak. It is possible to regenerate `class_labels.tsv` while giving priority to the Weak version of labels by calling `convert_labels(False)` from convert.py in the GitHub repository.

-License

Google's AudioSet was published in two stages - first the Weakly labelled data (Gemmeke, Jort F., et al. "Audio set: An ontology and human-labeled dataset for audio events." 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2017.), then the strongly labelled data (Hershey, Shawn, et al. "The benefit of temporally-strong labels in audio event classification." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.)

Both the original dataset and this reworked version are licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)

Class labels come from the AudioSet Ontology, which is licensed under CC BY-SA 4.0.

Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/bigquery/google-analytics-sample
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Sep 19, 2019
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc. Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc. Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
d
Outscraper Google Maps Scraper
datarade.ai
.csv, .xls, .json
Updated Dec 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2021). Outscraper Google Maps Scraper [Dataset]. https://datarade.ai/data-products/outscraper-google-maps-scraper-outscraper
Explore at:
.csv, .xls, .jsonAvailable download formats
Dataset updated
Dec 9, 2021
Area covered
Guyana, Uruguay, Mayotte, Cameroon, Sint Eustatius and Saba, Egypt, Zimbabwe, Western Sahara, United States Minor Outlying Islands, Botswana
Description
Are you looking to identify B2B leads to promote your business, product, or service? Outscraper Google Maps Scraper might just be the tool you've been searching for. This powerful software enables you to extract business data directly from Google's extensive database, which spans millions of businesses across countless industries worldwide.

Outscraper Google Maps Scraper is a tool built with advanced technology that lets you scrape a myriad of valuable information about businesses from Google's database. This information includes but is not limited to, business names, addresses, contact information, website URLs, reviews, ratings, and operational hours.

Whether you are a small business trying to make a mark or a large enterprise exploring new territories, the data obtained from the Outscraper Google Maps Scraper can be a treasure trove. This tool provides a cost-effective, efficient, and accurate method to generate leads and gather market insights.

By using Outscraper, you'll gain a significant competitive edge as it allows you to analyze your market and find potential B2B leads with precision. You can use this data to understand your competitors' landscape, discover new markets, or enhance your customer database. The tool offers the flexibility to extract data based on specific parameters like business category or geographic location, helping you to target the most relevant leads for your business.

In a world that's growing increasingly data-driven, utilizing a tool like Outscraper Google Maps Scraper could be instrumental to your business' success. If you're looking to get ahead in your market and find B2B leads in a more efficient and precise manner, Outscraper is worth considering. It streamlines the data collection process, allowing you to focus on what truly matters – using the data to grow your business.

https://outscraper.com/google-maps-scraper/

As a result of the Google Maps scraping, your data file will contain the following details:

Query Name Site Type Subtypes Category Phone Full Address Borough Street City Postal Code State Us State Country Country Code Latitude Longitude Time Zone Plus Code Rating Reviews Reviews Link Reviews Per Scores Photos Count Photo Street View Working Hours Working Hours Old Format Popular Times Business Status About Range Posts Verified Owner ID Owner Title Owner Link Reservation Links Booking Appointment Link Menu Link Order Links Location Link Place ID Google ID Reviews ID

If you want to enrich your datasets with social media accounts and many more details you could combine Google Maps Scraper with Domain Contact Scraper.

Domain Contact Scraper can scrape these details:

Email Facebook Github Instagram Linkedin Phone Twitter Youtube
Google Trends
console.cloud.google.com
Updated Jul 18, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&inv=1&invt=Ab1KDQ (2018). Google Trends [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-search-trends
Explore at:
Dataset updated
Jul 18, 2018
Dataset provided by
Googlehttp://google.com/
BigQueryhttps://cloud.google.com/bigquery
Google Searchhttp://google.com/
Description
The Google Trends dataset will provide critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It will be made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 rising expires after 30 days, and will be accompanied by a rolling five-year window of historical data in 210 distinct locations in the United States. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery
o
Google Trends And Wikipedia Page Views
explore.openaire.eu
zenodo.org
Updated Jun 25, 2015
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mitsuo Yoshida (2015). Google Trends And Wikipedia Page Views [Dataset]. http://doi.org/10.5281/zenodo.14539
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.14539
Dataset updated
Jun 25, 2015
Authors
Mitsuo Yoshida
Description
Abstract (our paper) The frequency of a web search keyword generally reflects the degree of public interest in a particular subject matter. Search logs are therefore useful resources for trend analysis. However, access to search logs is typically restricted to search engine providers. In this paper, we investigate whether search frequency can be estimated from a different resource such as Wikipedia page views of open data. We found frequently searched keywords to have remarkably high correlations with Wikipedia page views. This suggests that Wikipedia page views can be an effective tool for determining popular global web search trends. Data personal-name.txt.gz: The first column is the Wikipedia article id, the second column is the search keyword, the third column is the Wikipedia article title, and the fourth column is the total of page views from 2008 to 2014. personal-name_data_google-trends.txt.gz, personal-name_data_wikipedia.txt.gz: The first column is the period to be collected, the second column is the source (Google or Wikipedia), the third column is the Wikipedia article id, the fourth column is the search keyword, the fifth column is the date, and the sixth column is the value of search trend or page view. Publication This data set was created for our study. If you make use of this data set, please cite: Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Reflects Web Search Trend. Proceedings of the 2015 ACM Web Science Conference (WebSci '15). no.65, pp.1-2, 2015. http://dx.doi.org/10.1145/2786451.2786495 http://arxiv.org/abs/1509.02218 (author-created version) Note The raw data of Wikipedia page views is available in the following page. http://dumps.wikimedia.org/other/pagecounts-raw/ {"references": ["Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Reflects Web Search Trend. Proceedings of the 2015 ACM Web Science Conference (WebSci '15). no.65, pp.1-2, 2015.", "Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Analysis for Search Trend Prediction. Proceedings of the Annual Conference of Japanese Society for Artificial Intelligence (in Japanese). vol.29, no.2I1-1, pp.1-4, 2015."]}
Company Datasets for Business Profiling
datarade.ai
Updated Feb 23, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Oxylabs (2017). Company Datasets for Business Profiling [Dataset]. https://datarade.ai/data-products/company-datasets-for-business-profiling-oxylabs
Explore at:
.json, .xml, .csv, .xlsAvailable download formats
Dataset updated
Feb 23, 2017
Dataset authored and provided by
Oxylabs
Area covered
British Indian Ocean Territory, Northern Mariana Islands, Bangladesh, Moldova (Republic of), Canada, Nepal, Isle of Man, Andorra, Taiwan, Tunisia
Description
Company Datasets for valuable business insights!

Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.

These datasets are sourced from top industry providers, ensuring you have access to high-quality information:

Owler: Gain valuable business insights and competitive intelligence. -AngelList: Receive fresh startup data transformed into actionable insights. -CrunchBase: Access clean, parsed, and ready-to-use business data from private and public companies. -Craft.co: Make data-informed business decisions with Craft.co's company datasets. -Product Hunt: Harness the Product Hunt dataset, a leader in curating the best new products.

We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:

Company name;

Size;

Founding date;

Location;

Industry;

Revenue;

Employee count;

Competitors.

You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.

Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.

With Oxylabs Datasets, you can count on:

Fresh and accurate data collected and parsed by our expert web scraping team.

Time and resource savings, allowing you to focus on data analysis and achieving your business goals.

A customized approach tailored to your specific business needs.

Legal compliance in line with GDPR and CCPA standards, thanks to our membership in the Ethical Web Data Collection Initiative.

Pricing Options:

Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

Experience a seamless journey with Oxylabs:

Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.

Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.

Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.

Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
COVID-19 Community Mobility Report - Dataset - ADH Data Portal
ckan.africadatahub.org
Updated Feb 15, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ckan.africadatahub.org (2021). COVID-19 Community Mobility Report - Dataset - ADH Data Portal [Dataset]. https://ckan.africadatahub.org/dataset/covid-19-community-mobility-report
Explore at:
Dataset updated
Feb 15, 2021
Dataset provided by
CKANhttps://ckan.org/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is a snapshot of the Community Mobility Reports generated by Google. Google developed these sets as a response to public health officials who expressed that the same type of aggregated, anonymized insights used in products such as Google Maps could be helpful as they make critical decisions to combat COVID-19. Each Community Mobility Report is broken down by location and displays the change in visits to places like grocery stores and parks. These Community Mobility Reports aim to provide insights into what has changed in response to policies aimed at combating COVID-19. The reports chart movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential. In order to download or use the data or reports, you must agree to the Google Terms of Service. Learn more about the data here
o
Google GoEmotions Dataset
opendatabay.com
.undefined
Updated Jul 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Datasimple (2025). Google GoEmotions Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/c98ae93a-abde-4a9f-ad5e-97ed418f598f
Explore at:
.undefinedAvailable download formats
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Reviews & Ratings
Description
The Google AI GoEmotions dataset contains comments from Reddit users, each labelled with its emotional colouring. This dataset is primarily designed to train neural networks for performing deep analysis of text tonality. Unlike many existing emotion classification datasets that often cover narrow areas like news headlines or movie subtitles, and typically use a limited scale of six basic emotions (anger, surprise, disgust, joy, fear, and sadness), GoEmotions offers a much broader emotional spectrum. This expansion enables the development of more sensitive chatbots, enhanced models for detecting hazardous online behaviour, and improved customer support services through a deeper understanding of textual emotion. The emotion categories were collaboratively identified by Google and psychologists, encompassing 12 positive, 11 negative, 4 ambiguous, and 1 neutral emotion, making it well-suited for tasks requiring fine-grained emotion differentiation.

Columns

id: A unique identifier for each comment. This column contains approximately 58,011 unique values.

text: The original text content from the Reddit comment. There are around 57,732 unique text entries.

example_very_unclear: A boolean field indicating whether the text content is considered very unclear. Approximately 207,814 entries are marked as false, while 3,411 are marked as true.

admiration: An emotion label.

amusement: An emotion label.

anger: An emotion label.

annoyance: An emotion label.

approval: An emotion label.

caring: An emotion label.

confusion: An emotion label. (Note: The dataset includes additional columns for the remaining 20+ fine-grained emotion labels mentioned in the dataset's description.)

Distribution

This dataset is typically provided in a CSV data file format. It contains a substantial number of records, with the sum of false and true values in the example_very_unclear column suggesting over 210,000 individual comments or records. The structure is organised to facilitate direct use in machine learning and natural language processing tasks.

Usage

This dataset is ideal for several applications, particularly for projects focused on emotion recognition and text analysis. Its primary use is for training neural networks to perform deep analysis of text tonality. This capability can be leveraged to develop more sensitive chatbots, create models for detecting dangerous online behaviour, and significantly improve customer support services by allowing systems to better understand the emotional nuances in user communications.

Coverage

The dataset comprises comments sourced from Reddit users, which implies a global geographic coverage. Specific details regarding the time range of the comments or the precise demographics of the Reddit users are not provided within the available information.

License

CCO

Who Can Use It

This dataset is particularly valuable for: * AI and Machine Learning Researchers: For advancing the field of emotion recognition and fine-grained sentiment analysis. * Natural Language Processing (NLP) Developers: To build applications that require the ability to discern and react to emotional states in text. * Chatbot Developers: To design and implement conversational AI that exhibits higher emotional intelligence and provides more empathetic interactions. * Data Scientists: Interested in exploring and modelling human emotions expressed through social media text.

Dataset Name Suggestions

Google GoEmotions Dataset

Reddit Emotion Comments Dataset

Fine-Grained Emotion Text Dataset

Attributes

Original Data Source: Go Emotions: Google Emotions Dataset
Linked Open Data Management Services: A Comparison
zenodo.org
data.niaid.nih.gov
Updated Sep 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Robert Nasarek; Robert Nasarek; Lozana Rossenova; Lozana Rossenova (2023). Linked Open Data Management Services: A Comparison [Dataset]. http://doi.org/10.5281/zenodo.7738424
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7738424
Dataset updated
Sep 18, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Robert Nasarek; Robert Nasarek; Lozana Rossenova; Lozana Rossenova
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Thanks to a variety of software services, it has never been easier to produce, manage and publish Linked Open Data. But until now, there has been a lack of an accessible overview to help researchers make the right choice for their use case. This dataset release will be regularly updated to reflect the latest data published in a comparison table developed in Google Sheets [1]. The comparison table includes the most commonly used LOD management software tools from NFDI4Culture to illustrate what functionalities and features a service should offer for the long-term management of FAIR research data, including:

ConedaKOR

LinkedDataHub

Metaphacts

Omeka S

ResearchSpace

Vitro

Wikibase

WissKI

The table presents two views based on a comparison system of categories developed iteratively during workshops with expert users and developers from the respective tool communities. First, a short overview with field values coming from controlled vocabularies and multiple-choice options; and a second sheet allowing for more descriptive free text additions. The table and corresponding dataset releases for each view mode are designed to provide a well-founded basis for evaluation when deciding on a LOD management service. The Google Sheet table will remain open to collaboration and community contribution, as well as updates with new data and potentially new tools, whereas the datasets released here are meant to provide stable reference points with version control.

The research for the comparison table was first presented as a paper at DHd2023, Open Humanities – Open Culture, 13-17.03.2023, Trier and Luxembourg [2].

[1] Non-editing access is available here: docs.google.com/spreadsheets/d/1FNU8857JwUNFXmXAW16lgpjLq5TkgBUuafqZF-yo8_I/edit?usp=share_link To get editing access contact the authors.

[2] Full paper will be made available open access in the conference proceedings.
COKI Language Dataset
zenodo.org
application/gzip, csv
Updated Jun 16, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
James P. Diprose; James P. Diprose; Cameron Neylon; Cameron Neylon (2022). COKI Language Dataset [Dataset]. http://doi.org/10.5281/zenodo.6636625
Explore at:
application/gzip, csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6636625
Dataset updated
Jun 16, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
James P. Diprose; James P. Diprose; Cameron Neylon; Cameron Neylon
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The COKI Language Dataset contains predictions for 122 million academic publications. The dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.

Methodology
A subset of the COKI Academic Observatory Dataset, which is produced by the Academic Observatory Workflows codebase [1], was extracted and converted to CSV with Bigquery and downloaded to a virtual machine. The subset consists of all publications with DOIs in our dataset, including each publication’s title and abstract from both Crossref Metadata and Microsoft Academic Graph. The CSV files were then processed with a Python script. The titles and abstracts for each record were pre-processed, concatenated together and analysed with fastText. The titles and abstracts from Crossref Metadata were used first, with the MAG titles and abstracts serving as a fallback when the Crossref Metadata information was empty. Language was predicted for each publication using the fastText lid.176.bin language identification model [2]. fastText was chosen because of its high accuracy and fast runtime speed [3]. The final output dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.

Query or Download
The data is publicly accessible in BigQuery in the following two tables:

coki-data-share.language.doi_language

coki-data-share.language.iso_language

When you make queries on these tables, make sure that you are in your own Google Cloud project, otherwise the queries will fail.

See the COKI Language Detection README for instructions on how to download the data from Zenodo and load it into BigQuery.

Code
The code that generated this dataset, the BigQuery schemas and instructions for loading the data into BigQuery can be found here: https://github.com/The-Academic-Observatory/coki-language

License
COKI Language Dataset © 2022 by Curtin University is licenced under CC BY 4.0.

Attributions
This work contains information from:

Microsoft Academic Graph which is made available under the ODC Attribution Licence.

Crossref Metadata via the Metadata Plus program. Bibliographic metadata is made available without copyright restriction and Crossref generated data under a CC0 licence. See metadata licence information for more details.

References
[1] https://doi.org/10.5281/zenodo.6366695
[2] https://fasttext.cc/docs/en/language-identification.html
[3] https://modelpredict.com/language-identification-survey
Public electrophysiological datasets collected in the Buzsaki Lab
zenodo.org
Updated Jul 22, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Peter Christian Petersen; Peter Christian Petersen; Michelle Hernandez; György Buzsáki; György Buzsáki; Michelle Hernandez (2024). Public electrophysiological datasets collected in the Buzsaki Lab [Dataset]. http://doi.org/10.5281/zenodo.3629881
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.3629881
Dataset updated
Jul 22, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Peter Christian Petersen; Peter Christian Petersen; Michelle Hernandez; György Buzsáki; György Buzsáki; Michelle Hernandez
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Buzsaki Lab is proud to present a large selection of experimental data available for public access: https://buzsakilab.com/wp/database/. We publicly share more than a thousand sessions (about 40TB of raw and spike- and LFP-processed data) via our public data repository. The datasets are from freely moving rodents and include sleep-task-sleep sessions (3 to 24 hrs continuous recording sessions) in various brain structures, including metadata. We are happy to assist you in using the data. Our goal is that by sharing these data, other users can provide new insights, extend, contradict, or clarify our conclusions.

The databank contains electrophysiological recordings performed in freely moving rats and mice collected by investigators in the Buzsaki Lab over several years (a subset from head-fixed mice). Sessions have been collected with extracellular electrodes using high-channel-count silicon probes, with spike sorted single units, and intracellular and juxtacellular combined with extracellular electrodes. Several sessions include physiologically and optogenetically identified units. The sessions have been collected from various brain region pairs: the hippocampus, thalamus, amygdala, post-subiculum, septal region, and the entorhinal cortex, and various neocortical regions. In most behavioral tasks, the animals performed spatial behaviors (linear mazes and open fields), preceded and followed by long sleep sessions. Brain state classification is provided.

Getting started

The top menu “Databank” serves as a navigational menu to the databank. The metadata describing the experiments is stored in a relational database which means that there are many entry points for exploring the data. The databank is organized by projects, animal subjects, and sessions.

Accessing and downloading the datasets

We share the data through two services: our public Globus.org endpoint and our webshare: buzsakilab.nyumc.org. A subset of the datasets is also available at CRCNS.org. If you have an interest in a dataset that is not listed or is lacking information, please contact us. We pledge to make our data available immediately after publication.

Support

For support, please use our Buzsaki Databank google group. If you have an interest in a dataset that is not listed or is lacking information, please send us a request. Feel free to contact us, if you need more details on a given dataset or if a dataset is missing.
Amazon Web Services: year-on-year growth 2014-2025
statista.com
Updated May 13, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Amazon Web Services: year-on-year growth 2014-2025 [Dataset]. https://www.statista.com/statistics/422273/yoy-quarterly-growth-aws-revenues/
Explore at:
Dataset updated
May 13, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
Worldwide
Description
In the first quarter of 2025, revenues of Amazon Web Services (AWS) rose to 17 percent, a decrease from the previous three quarters. AWS is one of Amazon’s strongest revenue segments, generating over 115 billion U.S. dollars in 2024 net sales, up from 105 billion U.S. dollars in 2023. Amazon Web Services Amazon Web Services (AWS) provides on-demand cloud platforms and APIs through a pay-as-you-go-model to customers. AWS launched in 2002 providing general services and tools and produced its first cloud products in 2006. Today, more than 175 different cloud services for a variety of technologies and industries are released already. AWS ranks as one of the most popular public cloud infrastructure and platform services running applications worldwide in 2020, ahead of Microsoft Azure and Google cloud services. Cloud computing Cloud computing is essentially the delivery of online computing services to customers. As enterprises continually migrate their applications and data to the cloud instead of storing it on local machines, it becomes possible to access resources from different locations. Some of the key services of the AWS ecosystem for cloud applications include storage, database, security tools, and management tools. AWS is among the most popular cloud providers Some of the largest globally operating enterprises use AWS for their cloud services, including Netflix, BBC, and Baidu. Accordingly, AWS is one of the leading cloud providers in the global cloud market. Due to its continuously expanding portfolio of services and deepening of expertise, the company continues to be not only an important cloud service provider but also a business partner.
Dataset covidgilance signals
zenodo.org
bin, csv +3
Updated Sep 25, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gaudinat Arnaud; Gaudinat Arnaud (2020). Dataset covidgilance signals [Dataset]. http://doi.org/10.5281/zenodo.4048460
Explore at:
csv, tsv, bin, text/x-python, txtAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.4048460
Dataset updated
Sep 25, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Gaudinat Arnaud; Gaudinat Arnaud
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Research datasets about top signals for covid 19 (coronavirus) for study into Google Trends (GT) and with SEO metrics

Website

The study is currently published on https://covidgilance.org website (in french)

Datasets description

covid signals -> |selection| -> 4 dataset -> |serp.py| -> 4 serp datasets -> |aggregate_serp.pl| -> 4 aggregated dataset of serp -> |prepare datasets| -> 4 ranked top seo dataset

Original lists of signals (mainly covid symptoms) - dataset

Description: contain the original relevant list of signals for covid19 (here list of queries where you can see, in GT, a relevant signal during the covid 19 period of time)
Name: covid_signal_list.tsv

List of content:

- id: unique id for the topic
- topic-fr: name of the topic in French
- topic-en: name of the topic in English
- topic-id: GT topic id
- keyword fr: one or several keywords in French for GT
- keyword en: one or several keywords in English for GT
- fr-topic-url-12M: link to 12-months French query topic in GT in France
- en-topic-url-12M: link to 12-months English query topic in GT in US
- fr-url-12M: link to 12-months French queries in GT in France
- en-url-12M: link to 12-months English queries topic in GT in US
- fr-topic-url-5M: link to 5-months French query topic in GT in France
- en-topic-url-5M: link to 5-months English query topic in GT in US
- fr-url-5M: link to 5-months French queries in GT in France
- en-url-5M: link to 5-months English queries topic in GT in US

Tool to get SERP of covid signals - tool

Description: query google with a list of covid signals and obtain a list of serps in csv (tsv in fact) file format
Name: serper.py

python serper.py

SERP files - datasets

Description Serp results for 4 datesets of queries Names: simple version of covid signals from google.ch in French: serp_signals_20_ch_fr.csv
simple version of covid signals from google.com in English: serp_signals_20_en.csv
amplified version of covid signals from google.ch in French: serp_signals_covid_20_ch_fr.csv
amplified version of covid signals from google.com in English: serp_signals_covid_20_en.csv

amplified version means that for each query we create two queries one with the keywords "covid" and one with "coronavirus"

Tool to aggregate SERP results - tool

Description: load csv serp data and aggregate the data to create a new csv file where each line is a website and each column is a query. Name: aggregate_serp.pl

`perl aggregate_serp.pl> aggregated_signals_20_en.csv

datasets of top website from the SERP results - dataset

Description a aggregated version of the SERP where each line is a website and each column a query
Names:
aggregated_signals_20_ch_fr.csv
aggregated_signals_20_en.csv
aggregated_signals_covid_20_ch_fr.csv
aggregated_signals_covid_20_en.csv

List of content:

- domain: domain name of the website
- signal 1: Position of the query 1 (signal 1) in the SERP where 30 indicates arbitrary that this website is not present in the SERP
- signal ...: Position of the query (signal) in the SERP where 30 indicates arbitrary that this website is not present in the SERP
- signal n: Position of the query n (signal n) in the SERP where 30 indicates arbitrary that this website is not present in the SERP
- total: average position (total of all position /divided by the number of queries)
- missing: Total number of missing results in the SERP for this website

datasets ranked top seo - dataset

Description a ranked (by weighted average position) version of the aggregated version of the SERP where each line is a website and each column a query. TOP 20 have more information about the type and HONcode validity (from the date of collect: September 2020)

Names:
ranked_signals_20_ch_fr.csv
ranked_signals_20_en.csv
ranked_signals_covid_20_ch_fr.csv
ranked_signals_covid_20_en.csv

List of content:

- domain: domain name of the website
- signal 1: Position of the query 1 (signal 1) in the SERP where 30 indicates arbitrary that this website is not present in the SERP
- signal ...: Position of the query (signal) in the SERP where 30 indicates arbitrary that this website is not present in the SERP
- signal n: Position of the query n (signal n) in the SERP where 30 indicates arbitrary that this website is not present in the SERP
- avg position: average position (total of all position /divided by the number of queries)
- nb missing: Total number of missing results in the SERP for this website
- % presence: % of presence
- weighted avg postion: combination of avg position and % of presence for final ranking
- honcode: status of the Honcode certificate for this website (none/valid/expired)
- type: type of the website (health, gov, edu or media)
f
DATS 6401 - Final Project - Yon ho Cheong.zip
figshare.com
zip
Updated Dec 15, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yon ho Cheong (2018). DATS 6401 - Final Project - Yon ho Cheong.zip [Dataset]. http://doi.org/10.6084/m9.figshare.7471007.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.7471007.v1
Dataset updated
Dec 15, 2018
Dataset provided by
figshare
Authors
Yon ho Cheong
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
AbstractThe H1B is an employment-based visa category for temporary foreign workers in the United States. Every year, the US immigration department receives over 200,000 petitions and selects 85,000 applications through a random process and the U.S. employer must submit a petition for an H1B visa to the US immigration department. This is the most common visa status applied to international students once they complete college or higher education and begin working in a full-time position. The project provides essential information on job titles, preferred regions of settlement, foreign applicants and employers' trends for H1B visa application. According to locations, employers, job titles and salary range make up most of the H1B petitions, so different visualization utilizing tools will be used in order to analyze and interpreted in relation to the trends of the H1B visa to provide a recommendation to the applicant. This report is the base of the project for Visualization of Complex Data class at the George Washington University, some examples in this project has an analysis for the different relevant variables (Case Status, Employer Name, SOC name, Job Title, Prevailing Wage, Worksite, and Latitude and Longitude information) from Kaggle and Office of Foreign Labor Certification(OFLC) in order to see the H1B visa changes in the past several decades. Keywords: H1B visa, Data Analysis, Visualization of Complex Data, HTML, JavaScript, CSS, Tableau, D3.jsDatasetThe dataset contains 10 columns and covers a total of 3 million records spanning from 2011-2016. The relevant columns in the dataset include case status, employer name, SOC name, jobe title, full time position, prevailing wage, year, worksite, and latitude and longitude information.Link to dataset: https://www.kaggle.com/nsharan/h-1b-visaLink to dataset(FY2017): https://www.foreignlaborcert.doleta.gov/performancedata.cfmRunning the codeOpen Index.htmlData ProcessingDoing some data preprocessing to transform the raw data into an understandable format.Find and combine any other external datasets to enrich the analysis such as dataset of FY2017.To make appropriated Visualizations, variables should be Developed and compiled into visualization programs.Draw a geo map and scatter plot to compare the fastest growth in fixed value and in percentages.Extract some aspects and analyze the changes in employers’ preference as well as forecasts for the future trends.VisualizationsCombo chart: this chart shows the overall volume of receipts and approvals rate.Scatter plot: scatter plot shows the beneficiary country of birth.Geo map: this map shows All States of H1B petitions filed.Line chart: this chart shows top10 states of H1B petitions filed. Pie chart: this chart shows comparison of Education level and occupations for petitions FY2011 vs FY2017.Tree map: tree map shows overall top employers who submit the greatest number of applications.Side-by-side bar chart: this chart shows overall comparison of Data Scientist and Data Analyst.Highlight table: this table shows mean wage of a Data Scientist and Data Analyst with case status certified.Bubble chart: this chart shows top10 companies for Data Scientist and Data Analyst.Related ResearchThe H-1B Visa Debate, Explained - Harvard Business Reviewhttps://hbr.org/2017/05/the-h-1b-visa-debate-explainedForeign Labor Certification Data Centerhttps://www.foreignlaborcert.doleta.govKey facts about the U.S. H-1B visa programhttp://www.pewresearch.org/fact-tank/2017/04/27/key-facts-about-the-u-s-h-1b-visa-program/H1B visa News and Updates from The Economic Timeshttps://economictimes.indiatimes.com/topic/H1B-visa/newsH-1B visa - Wikipediahttps://en.wikipedia.org/wiki/H-1B_visaKey FindingsFrom the analysis, the government is cutting down the number of approvals for H1B on 2017.In the past decade, due to the nature of demand for high-skilled workers, visa holders have clustered in STEM fields and come mostly from countries in Asia such as China and India.Technical Jobs fill up the majority of Top 10 Jobs among foreign workers such as Computer Systems Analyst and Software Developers.The employers located in the metro areas thrive to find foreign workforce who can fill the technical position that they have in their organization.States like California, New York, Washington, New Jersey, Massachusetts, Illinois, and Texas are the prime location for foreign workers and provide many job opportunities. Top Companies such Infosys, Tata, IBM India that submit most H1B Visa Applications are companies based in India associated with software and IT services.Data Scientist position has experienced an exponential growth in terms of H1B visa applications and jobs are clustered in West region with the highest number.Visualization utilizing programsHTML, JavaScript, CSS, D3.js, Google API, Python, R, and Tableau
The Maestro Dataset v2
kaggle.com
zip
Updated Jun 14, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jack Vial (2020). The Maestro Dataset v2 [Dataset]. https://www.kaggle.com/datasets/jackvial/themaestrodatasetv2
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Jun 14, 2020
Authors
Jack Vial
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Note

I did not have any part in creating this dataset I am only uploading it here to make it easily available to others on Kaggle. More info about the dataset can be found here https://magenta.tensorflow.org/datasets/maestro

Wav -> mp3 Conversion

I had to convert the wav audio files to mp3 so the dataset would fit within Kaggle's 20gb limit, therefore all audio files have the extension .mp3 which is inconsistent with the .wav extensions in the .csv meta files.

Summary

MAESTRO (MIDI and Audio Edited for Synchronous Tracks and Organization) is a dataset composed of over 200 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms.

Dataset (from the Magenta site https://magenta.tensorflow.org/datasets/maestro )

We partnered with organizers of the International Piano-e-Competition for the raw data used in this dataset. During each installment of the competition virtuoso pianists perform on Yamaha Disklaviers which, in addition to being concert-quality acoustic grand pianos, utilize an integrated high-precision MIDI capture and playback system. Recorded MIDI data is of sufficient fidelity to allow the audition stage of the competition to be judged remotely by listening to contestant performances reproduced over the wire on another Disklavier instrument.

The dataset contains over 200 hours of paired audio and MIDI recordings from ten years of International Piano-e-Competition. The MIDI data includes key strike velocities and sustain/sostenuto/una corda pedal positions. Audio and MIDI files are aligned with ∼3 ms accuracy and sliced to individual musical pieces, which are annotated with composer, title, and year of performance. Uncompressed audio is of CD quality or higher (44.1–48 kHz 16-bit PCM stereo).

A train/validation/test split configuration is also proposed, so that the same composition, even if performed by multiple contestants, does not appear in multiple subsets. Repertoire is mostly classical, including composers from the 17th to early 20th century.

For more information about how the dataset was created and several applications of it, please see the paper where it was introduced: Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset.

For an example application of the dataset, see our blog post on Wave2Midi2Wave.

License

The dataset is made available by Google LLC under a Creative Commons Attribution Non-Commercial Share-Alike 4.0 (CC BY-NC-SA 4.0) license.

Acknowledgements

More info on the MAESTRO dataset https://magenta.tensorflow.org/datasets/maestro Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset https://arxiv.org/abs/1810.12247

Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, and Douglas Eck. "Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset." In International Conference on Learning Representations, 2019.

Facebook

Twitter

Click to copy link

Link copied

Cite

(2018). How to make google plus posts private - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/how-to-make-google-plus-posts-private

How to make google plus posts private - Dataset - openAFRICA

Explore at:

Dataset updated

Jan 4, 2018

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

so if you have to have a G+ account (for YouTube, location services, or other reasons) - here's how you can make it totally private! No one will be able to add you, send you spammy links, or otherwise annoy you. You need to visit the "Audience Settings" page - https://plus.google.com/u/0/settings/audience You can then set a "custom audience" - usually you would use this to restrict your account to people from a specific geographic location, or within a specific age range. In this case, we're going to choose a custom audience of "No-one" Check the box and hit save. Now, when people try to visit your Google+ profile - they'll see this "restricted" message. You can visit my G+ Profile if you want to see this working. (https://plus.google.com/114725651137252000986) If you are not able to understand you can follow this website : http://www.livehuntz.com/google-plus/support-phone-number

Clear search

Close search

Google apps

Main menu

How to make google plus posts private - Dataset - openAFRICA

About COVID-19 Public Datasets

Meta Kaggle Code

Explore our public notebook content!

Why we’re releasing this dataset

Sensitive data

Joining with Meta Kaggle

File organization

Questions / Comments

Google Trends - International

Google energy consumption 2011-2023

Google's Audioset: Reformatted

Google Analytics Sample

Context

Content

Acknowledgements

Inspiration

Outscraper Google Maps Scraper

Google Trends

Google Trends And Wikipedia Page Views

Company Datasets for Business Profiling

COVID-19 Community Mobility Report - Dataset - ADH Data Portal

Google GoEmotions Dataset

Columns

Distribution

Usage

Coverage

License

Who Can Use It

Dataset Name Suggestions

Attributes

Linked Open Data Management Services: A Comparison

COKI Language Dataset

Public electrophysiological datasets collected in the Buzsaki Lab

Amazon Web Services: year-on-year growth 2014-2025

Dataset covidgilance signals

DATS 6401 - Final Project - Yon ho Cheong.zip

The Maestro Dataset v2

Note

Wav -> mp3 Conversion

Summary

Dataset (from the Magenta site https://magenta.tensorflow.org/datasets/maestro )

License

Acknowledgements

How to make google plus posts private - Dataset - openAFRICASee More Versions

How to make google plus posts private - Dataset - openAFRICA