Making dataset "Google Analytics Customer Revenue Prediction" easier and quicker to parse.
This is the same information as dataset "Google Analytics Customer Revenue Prediction" with the JSON columns expanded (flattened) into additional csv columns.
Thanks to the original dataset "Google Analytics Customer Revenue Prediction"; without it, this smaller but equally informative dataset could not exist.
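The flattening described above can be sketched roughly as follows. This is only an illustration, not the script used to build this dataset: the column names (`device`, `totals`), the sample rows, and the `column.key` naming scheme are assumptions, and only one level of nesting is expanded.

```python
import csv
import io
import json


def flatten_json_columns(rows, json_columns):
    """Expand each JSON-object column into separate 'column.key' columns."""
    flat_rows = []
    for row in rows:
        flat = {}
        for name, value in row.items():
            if name in json_columns:
                # Parse the JSON text and promote each top-level key to a column.
                for key, inner in json.loads(value).items():
                    flat[f"{name}.{key}"] = inner
            else:
                flat[name] = value
        flat_rows.append(flat)
    return flat_rows


# Hypothetical two-row sample; the real column names may differ.
sample = io.StringIO(
    'visitId,device,totals\n'
    '1,"{""browser"": ""Chrome""}","{""hits"": ""4"", ""pageviews"": ""3""}"\n'
    '2,"{""browser"": ""Safari""}","{""hits"": ""1""}"\n'
)
rows = list(csv.DictReader(sample))
flat = flatten_json_columns(rows, json_columns={"device", "totals"})

# Rows may expose different keys, so the header is the union of all keys.
fieldnames = sorted({key for row in flat for key in row})
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(flat)
print(out.getvalue())
```

Keys missing from a row are written as empty cells (`restval=""`), which keeps every output row the same width.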
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If you have to have a G+ account (for YouTube, location services, or other reasons), here's how you can make it totally private. No one will be able to add you, send you spammy links, or otherwise annoy you. You need to visit the "Audience Settings" page (https://plus.google.com/u/0/settings/audience). You can then set a "custom audience"; usually you would use this to restrict your account to people from a specific geographic location, or within a specific age range. In this case, we're going to choose a custom audience of "No-one". Check the box and hit save. Now, when people try to visit your Google+ profile, they'll see a "restricted" message. You can visit my G+ profile if you want to see this working (https://plus.google.com/114725651137252000986). If anything is unclear, you can follow this website: http://www.livehuntz.com/google-plus/support-phone-number
Google Play Store dataset to explore detailed information about apps, including ratings, descriptions, updates, and developer details. Popular use cases include app performance analysis, market research, and consumer behavior insights.
Use our Google Play Store dataset to explore detailed information about apps available on the platform, including app titles, developers, monetization features, user ratings, reviews, and more. This dataset also includes data on app descriptions, safety measures, download counts, recent updates, and compatibility, providing a complete overview of app performance and features.
Tailored for app developers, marketers, and researchers, this dataset offers valuable insights into user preferences, app trends, and market dynamics. Whether you're optimizing app development, conducting competitive analysis, or tracking app performance, the Google Play Store dataset is an essential resource for making data-driven decisions in the mobile app ecosystem.
This dataset is ideal for a variety of applications:
Custom license. Please review the respective licenses below:
1. Data Provider's License - Bright Data Master Service Agreement
~Up to $0.0025 per record. Min order $250
Approximately 10M new records are added each month. Approximately 13.8M records are updated each month. Get the complete dataset each delivery, including all records. Retrieve only the data you need with the flexibility to set Smart Updates.
New snapshot each month, 12 snapshots/year. Paid monthly.
New snapshot each quarter, 4 snapshots/year. Paid quarterly.
New snapshot every 6 months, 2 snapshots/year. Paid twice a year.
New snapshot, one-time delivery. Paid once.
In an effort to help combat COVID-19, we created a COVID-19 Public Datasets program to make data more accessible to researchers, data scientists and analysts. The program will host a repository of public datasets that relate to the COVID-19 crisis and make them free to access and analyze. These include datasets from the New York Times, European Centre for Disease Prevention and Control, Google, Global Health Data from the World Bank, and OpenStreetMap.
Free hosting and queries of COVID datasets
As with all data in the Google Cloud Public Datasets Program, Google pays for storage of datasets in the program. BigQuery also provides free queries over certain COVID-related datasets to support the response to COVID-19. Queries on COVID datasets will not count against the BigQuery sandbox free tier, where you can query up to 1TB free each month.
Limitations and duration
Queries of COVID data are free. If, during your analysis, you join COVID datasets with non-COVID datasets, the bytes processed in the non-COVID datasets will be counted against the free tier, then charged accordingly, to prevent abuse. Queries of COVID datasets will remain free until Sept 15, 2021.
The contents of these datasets are provided to the public strictly for educational and research purposes only. We are not onboarding or managing PHI or PII data as part of the COVID-19 Public Dataset Program. Google has practices & policies in place to ensure that data is handled in accordance with widely recognized patient privacy and data security policies.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.
By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.
Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.
The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!
While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.
The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
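As a rough illustration of the layout above, the folder for a given KernelVersions id can be computed with integer division. The function name is hypothetical, and the absence of zero-padding in folder names (e.g. `123/4` rather than `123/004`) is an assumption that is not confirmed by the description.

```python
def version_folder(kernel_version_id: int) -> str:
    """Return the 'top/sub' folder expected to hold a given version id.

    Assumption: folder names are plain integers without zero-padding.
    """
    top = kernel_version_id // 1_000_000        # e.g. 123 for 123,456,789
    sub = (kernel_version_id // 1_000) % 1_000  # e.g. 456 for 123,456,789
    return f"{top}/{sub}"


print(version_folder(123_456_789))
```

If the real folder names turn out to be zero-padded, only the final f-string needs to change.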
The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays
We love feedback! Let us know in the Discussion tab.
Happy Kaggling!
The COVID-19 Search Trends symptoms dataset shows aggregated, anonymized trends in Google searches for a broad set of health symptoms, signs, and conditions. The dataset provides a daily or weekly time series for each region showing the relative volume of searches for each symptom. This dataset is intended to help researchers to better understand the impact of COVID-19. It shouldn't be used for medical diagnostic, prognostic, or treatment purposes. It also isn't intended to be used for guidance on personal travel plans. To learn more about the dataset, how we generate it and preserve privacy, read the data documentation. To visualize the data, try exploring these interactive charts and map of symptom search trends. As of Dec. 15, 2020, the dataset was expanded to include trends for Australia, Ireland, New Zealand, Singapore, and the United Kingdom. This expanded data is available in new tables that provide data at country and two subregional levels. We will not be updating existing state/county tables going forward. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test.
The Google Trends dataset will provide critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It will be made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 rising expires after 30 days, and will be accompanied by a rolling five-year window of historical data in 210 distinct locations in the United States. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Company Datasets for valuable business insights!
Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.
These datasets are sourced from top industry providers, ensuring you have access to high-quality information:
We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:
You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.
Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.
With Oxylabs Datasets, you can count on:
Pricing Options:
Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
Are you looking to identify B2B leads to promote your business, product, or service? Outscraper Google Maps Scraper might just be the tool you've been searching for. This powerful software enables you to extract business data directly from Google's extensive database, which spans millions of businesses across countless industries worldwide.
Outscraper Google Maps Scraper is a tool built with advanced technology that lets you scrape a myriad of valuable information about businesses from Google's database. This information includes, but is not limited to, business names, addresses, contact information, website URLs, reviews, ratings, and operational hours.
Whether you are a small business trying to make a mark or a large enterprise exploring new territories, the data obtained from the Outscraper Google Maps Scraper can be a treasure trove. This tool provides a cost-effective, efficient, and accurate method to generate leads and gather market insights.
By using Outscraper, you'll gain a significant competitive edge as it allows you to analyze your market and find potential B2B leads with precision. You can use this data to understand your competitors' landscape, discover new markets, or enhance your customer database. The tool offers the flexibility to extract data based on specific parameters like business category or geographic location, helping you to target the most relevant leads for your business.
In a world that's growing increasingly data-driven, utilizing a tool like Outscraper Google Maps Scraper could be instrumental to your business' success. If you're looking to get ahead in your market and find B2B leads in a more efficient and precise manner, Outscraper is worth considering. It streamlines the data collection process, allowing you to focus on what truly matters – using the data to grow your business.
https://outscraper.com/google-maps-scraper/
As a result of the Google Maps scraping, your data file will contain the following details:
Query, Name, Site, Type, Subtypes, Category, Phone, Full Address, Borough, Street, City, Postal Code, State, Us State, Country, Country Code, Latitude, Longitude, Time Zone, Plus Code, Rating, Reviews, Reviews Link, Reviews Per Scores, Photos Count, Photo, Street View, Working Hours, Working Hours Old Format, Popular Times, Business Status, About, Range, Posts, Verified, Owner ID, Owner Title, Owner Link, Reservation Links, Booking Appointment Link, Menu Link, Order Links, Location Link, Place ID, Google ID, Reviews ID
If you want to enrich your datasets with social media accounts and many more details you could combine Google Maps Scraper with Domain Contact Scraper.
Domain Contact Scraper can scrape these details:
Email, Facebook, Github, Instagram, Linkedin, Phone, Twitter, Youtube
The International Google Trends dataset will provide critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It will be made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 rising expires after 30 days, and will be accompanied by a rolling five-year window of historical data for each country and region across the globe, where data is available. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Abstract (our paper)
The frequency of a web search keyword generally reflects the degree of public interest in a particular subject matter. Search logs are therefore useful resources for trend analysis. However, access to search logs is typically restricted to search engine providers. In this paper, we investigate whether search frequency can be estimated from a different resource such as Wikipedia page views of open data. We found frequently searched keywords to have remarkably high correlations with Wikipedia page views. This suggests that Wikipedia page views can be an effective tool for determining popular global web search trends.
Data
personal-name.txt.gz: The first column is the Wikipedia article id, the second column is the search keyword, the third column is the Wikipedia article title, and the fourth column is the total of page views from 2008 to 2014.
personal-name_data_google-trends.txt.gz, personal-name_data_wikipedia.txt.gz: The first column is the period to be collected, the second column is the source (Google or Wikipedia), the third column is the Wikipedia article id, the fourth column is the search keyword, the fifth column is the date, and the sixth column is the value of search trend or page view.
Publication
This data set was created for our study. If you make use of this data set, please cite: Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Reflects Web Search Trend. Proceedings of the 2015 ACM Web Science Conference (WebSci '15). no.65, pp.1-2, 2015. http://dx.doi.org/10.1145/2786451.2786495 http://arxiv.org/abs/1509.02218 (author-created version)
Note
The raw data of Wikipedia page views is available in the following page. http://dumps.wikimedia.org/other/pagecounts-raw/
References
Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Reflects Web Search Trend. Proceedings of the 2015 ACM Web Science Conference (WebSci '15). no.65, pp.1-2, 2015.
Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Analysis for Search Trend Prediction. Proceedings of the Annual Conference of Japanese Society for Artificial Intelligence (in Japanese). vol.29, no.2I1-1, pp.1-4, 2015.
Google’s energy consumption has increased over the last few years, reaching 25.9 terawatt hours in 2023, up from 12.8 terawatt hours in 2019. The company has made efforts to make its data centers more efficient through customized high-performance servers, smart temperature and lighting controls, advanced cooling techniques, and machine learning.
Datacenters and energy
Through its operations, Google pursues a more sustainable impact on the environment by creating efficient data centers that use less energy than the average, transitioning towards renewable energy, creating sustainable workplaces, and providing its users with the technological means towards a cleaner future for future generations. Through its efficient data centers, Google has also managed to divert waste from its operations away from landfills.
Reducing Google’s carbon footprint
Google’s clean energy efforts are also related to its efforts to reduce its carbon footprint. Since committing to using 100 percent renewable energy, the company has met its targets largely through solar and wind energy power purchase agreements and buying renewable power from utilities. Google is one of the largest corporate purchasers of renewable energy in the world.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Google's AudioSet consistently reformatted
During my work with Google's AudioSet (https://research.google.com/audioset/index.html) I encountered some problems because the Weak (https://research.google.com/audioset/download.html) and Strong (https://research.google.com/audioset/download_strong.html) versions of the dataset use different csv formatting for the data, and because the labels used in the two datasets are different (https://github.com/audioset/ontology/issues/9) and are also presented in files with different formatting.
This dataset reformatting aims to unify the formats of the datasets so that it is possible to analyse them in the same pipelines, and also to make the dataset files compatible with the psds_eval, dcase_util and sed_eval Python packages used in Audio Processing.
For better formatted documentation and the source code of the reformatting, refer to https://github.com/bakhtos/GoogleAudioSetReformatted
-Changes in dataset
All files are converted to tab-separated `*.tsv` files (i.e. `csv` files with `\t` as a separator). All files have a header as the first line.
-New fields and filenames
Fields are renamed according to the following table, to be compatible with psds_eval:
Old field -> New field
YTID -> filename
segment_id -> filename
start_seconds -> onset
start_time_seconds -> onset
end_seconds -> offset
end_time_seconds -> offset
positive_labels -> event_label
label -> event_label
present -> present
For class label files, `id` is now the name for the `mid` label (e.g. `/m/09xor`) and `label` for the human-readable label (e.g. `Speech`). The index of the label given for Weak dataset labels (`index` field in `class_labels_indices.csv`) is not used.
Files are renamed according to the following table to ensure consistent naming of the form `audioset_[weak|strong]_[train|eval]_[balanced|unbalanced|posneg]*.tsv`:
Old name -> New name
balanced_train_segments.csv -> audioset_weak_train_balanced.tsv
unbalanced_train_segments.csv -> audioset_weak_train_unbalanced.tsv
eval_segments.csv -> audioset_weak_eval.tsv
audioset_train_strong.tsv -> audioset_strong_train.tsv
audioset_eval_strong.tsv -> audioset_strong_eval.tsv
audioset_eval_strong_framed_posneg.tsv -> audioset_strong_eval_posneg.tsv
class_labels_indices.csv -> class_labels.tsv (merged with mid_to_display_name.tsv)
mid_to_display_name.tsv -> class_labels.tsv (merged with class_labels_indices.csv)
-Strong dataset changes
The only changes to the Strong dataset are the renaming of fields and the reordering of columns, so that both the Weak and Strong versions have `filename` and `event_label` as the first two columns.
-Weak dataset changes
-- Labels are given one per line, instead of as a comma-separated and quoted list.
-- To make sure that the `filename` format is the same as in the Strong version, the following format change is made: the value of the `start_seconds` field is converted to milliseconds and appended to the `filename` with an underscore. Since all files in the dataset are assumed to be 10 seconds long, this unifies the format of `filename` with the Strong version and also makes `end_seconds` redundant.
-Class labels changes
Class labels from both datasets are merged into one file and given in alphabetical order of `id`s. Since the same `id`s are present in both datasets, but sometimes with different human-readable labels, labels from the Strong dataset overwrite those from the Weak one. It is possible to regenerate `class_labels.tsv` while giving priority to the Weak version of labels by calling `convert_labels(False)` from convert.py in the GitHub repository.
-License
Google's AudioSet was published in two stages: first the weakly labelled data (Gemmeke, Jort F., et al. "Audio set: An ontology and human-labeled dataset for audio events." 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2017.), then the strongly labelled data (Hershey, Shawn, et al. "The benefit of temporally-strong labels in audio event classification." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.). Both the original dataset and this reworked version are licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
Class labels come from the AudioSet Ontology, which is licensed under CC BY-SA 4.0.
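The Weak dataset changes described above (one label per line, `start_seconds` appended to the `filename` in milliseconds) might look roughly like the sketch below. It is not the code from convert.py in the GitHub repository; the function name, the sample YouTube id, and the sample label ids are made up for illustration.

```python
def weak_row_to_events(ytid, start_seconds, positive_labels):
    """Turn one Weak-format row into (filename, event_label) pairs.

    ytid:            YouTube id from the original YTID field
    start_seconds:   segment start, as text or a number (e.g. "30.000")
    positive_labels: comma-separated label ids from the original quoted list
    """
    # Convert the start time to whole milliseconds and append it after an
    # underscore, which mirrors the segment id style of the Strong version.
    start_ms = int(round(float(start_seconds) * 1000))
    filename = f"{ytid}_{start_ms}"
    # Emit one row per label instead of one comma-separated list.
    return [(filename, label.strip()) for label in positive_labels.split(",")]


# Hypothetical input row; the ids are placeholders, not real AudioSet entries.
for filename, event_label in weak_row_to_events("abc123", "30.000", "/m/aaa,/m/bbb"):
    print(f"{filename}\t{event_label}")
```

Rounding before the integer conversion avoids off-by-one millisecond values caused by floating-point representation of the start time.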
Welcome to Apiscrapy, your ultimate destination for comprehensive location-based intelligence. As an AI-driven web scraping and automation platform, Apiscrapy excels in converting raw web data into polished, ready-to-use data APIs. With a unique capability to collect Google Address Data, Google Address API, Google Location API, Google Map, and Google Location Data with 100% accuracy, we redefine possibilities in location intelligence.
Key Features:
Unparalleled Data Variety: Apiscrapy offers a diverse range of address-related datasets, including Google Address Data and Google Location Data. Whether you seek B2B address data or detailed insights for various industries, we cover it all.
Integration with Google Address API: Seamlessly integrate our datasets with the powerful Google Address API. This collaboration ensures not just accessibility but a robust combination that amplifies the precision of your location-based insights.
Business Location Precision: Experience a new level of precision in business decision-making with our address data. Apiscrapy delivers accurate and up-to-date business locations, enhancing your strategic planning and expansion efforts.
Tailored B2B Marketing: Customize your B2B marketing strategies with precision using our detailed B2B address data. Target specific geographic areas, refine your approach, and maximize the impact of your marketing efforts.
Use Cases:
Location-Based Services: Companies use Google Address Data to provide location-based services such as navigation, local search, and location-aware advertisements.
Logistics and Transportation: Logistics companies utilize Google Address Data for route optimization, fleet management, and delivery tracking.
E-commerce: Online retailers integrate address autocomplete features powered by Google Address Data to simplify the checkout process and ensure accurate delivery addresses.
Real Estate: Real estate agents and property websites leverage Google Address Data to provide accurate property listings, neighborhood information, and proximity to amenities.
Urban Planning and Development: City planners and developers utilize Google Address Data to analyze population density, traffic patterns, and infrastructure needs for urban planning and development projects.
Market Analysis: Businesses use Google Address Data for market analysis, including identifying target demographics, analyzing competitor locations, and selecting optimal locations for new stores or offices.
Geographic Information Systems (GIS): GIS professionals use Google Address Data as a foundational layer for mapping and spatial analysis in fields such as environmental science, public health, and natural resource management.
Government Services: Government agencies utilize Google Address Data for census enumeration, voter registration, tax assessment, and planning public infrastructure projects.
Tourism and Hospitality: Travel agencies, hotels, and tourism websites incorporate Google Address Data to provide location-based recommendations, itinerary planning, and booking services for travelers.
Discover the difference with Apiscrapy – where accuracy meets diversity in address-related datasets, including Google Address Data, Google Address API, Google Location API, and more. Redefine your approach to location intelligence and make data-driven decisions with confidence. Revolutionize your business strategies today!
https://creativecommons.org/publicdomain/zero/1.0/
The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website. It includes the following kinds of information:
Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.
Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.
Transactional data: information about the transactions that occur on the Google Merchandise Store website.
Fork this kernel to get started.
Banner Photo by Edho Pratama from Unsplash.
What is the total number of transactions generated per device browser in July 2017?
The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?
What was the average number of product pageviews for users who made a purchase in July 2017?
What was the average number of product pageviews for users who did not make a purchase in July 2017?
What was the average total transactions per user that made a purchase in July 2017?
What is the average amount of money spent per session in July 2017?
What is the sequence of pages viewed?
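As a small illustration of the "real bounce rate" definition used in the questions above (the percentage of visits with a single pageview), here is a sketch over plain Python data. The `(traffic_source, pageviews)` input shape and the sample values are assumptions made for the example; they are not the dataset's actual schema.

```python
from collections import defaultdict


def bounce_rate_by_source(sessions):
    """Return {traffic_source: percentage of visits with exactly one pageview}."""
    visits = defaultdict(int)
    bounces = defaultdict(int)
    for source, pageviews in sessions:
        visits[source] += 1
        if pageviews == 1:
            # A visit with a single pageview counts as a bounce.
            bounces[source] += 1
    return {source: 100.0 * bounces[source] / visits[source] for source in visits}


# Hypothetical sessions: (traffic source, number of pageviews in the visit).
sessions = [("google", 1), ("google", 3), ("direct", 1), ("direct", 1)]
rates = bounce_rate_by_source(sessions)
print(rates)
```

The same grouping logic carries over to a SQL query: count the visits per source, count those with one pageview, and divide.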
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
These datasets are important to genomics researchers because they characterize several aspects of what the scientific community has learned to date about human sequence variants. Making this human annotation data freely available in GCP will enable researchers to focus less on data movement and management tasks associated with procuring this data and instead make immediate use of the data to better understand the clinical relevance of particular variants, such as disease-causing or protective variants (ClinVar), search a catalog of SNPs that have been identified in the human genome (dbSNP), and discover how frequently a particular variant occurs across the human population (1000Genomes, ESP, ExAC, gnomAD). This human annotation dataset contains a mirror of the original Variant Call Format (VCF) files from NCBI, the NHLBI Exome Sequencing Project (ESP) and Ensembl as Google Cloud Storage (GCS) objects. In addition, these human sequence variants have also been translated into a particular variant table format and made available in Google BigQuery, giving researchers the ability to use cloud technology and code repositories such as the Verily Life Sciences Annotation Toolkit to perform analyses in parallel. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. This public dataset is hosted in Google Cloud Storage and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
If you use this dataset anywhere in your work, please cite it as follows: L. Gupta, "Google Play Store Apps," Feb 2019. [Online]. Available: https://www.kaggle.com/lava18/google-play-store-apps
While many public datasets (on Kaggle and the like) provide Apple App Store data, there are not many counterpart datasets available for Google Play Store apps anywhere on the web. On digging deeper, I found out that the iTunes App Store page deploys a nicely indexed appendix-like structure to allow for simple and easy web scraping. On the other hand, Google Play Store uses sophisticated modern-day techniques (like dynamic page load) using jQuery, making scraping more challenging.
Each app (row) has values for category, rating, size, and more.
This information is scraped from the Google Play Store. This app information would not be available without it.
The Play Store apps data has enormous potential to drive app-making businesses to success. Actionable insights can be drawn for developers to work on and capture the Android market!
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Covid19Kerala.info-Data is a consolidated multi-source open dataset of metadata from the COVID-19 outbreak in the Indian state of Kerala. It is created and maintained by volunteers of ‘Collective for Open Data Distribution-Keralam’ (CODD-K), a nonprofit consortium of individuals formed for the distribution and longevity of open datasets. Covid19Kerala.info-Data covers a set of correlated temporal and spatial metadata of SARS-CoV-2 infections and prevention measures in Kerala. Static snapshot releases of this dataset are manually produced from a live database maintained as a set of publicly accessible Google Sheets. This dataset is made available under the Open Data Commons Attribution License v1.0 (ODC-BY 1.0).
Schema and data package
A data package with its schema definition is accessible at https://codd-k.github.io/covid19kerala.info-data/datapackage.json. The data package and schema follow the Frictionless Data "Data Package" specification.
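As a sketch of how such a descriptor can be read, the example below walks a Data Package descriptor and lists each resource with its field names and types. It relies only on the `resources`, `schema`, and `fields` keys from the Data Package and Table Schema specifications. The embedded descriptor, its resource and field names, and the helper `summarize_resources` are invented for illustration and do not mirror the real datapackage.json.

```python
import json

# Hypothetical descriptor in the Data Package layout (illustrative only).
DESCRIPTOR_JSON = """
{
  "name": "example-package",
  "resources": [
    {
      "name": "daily_cases",
      "path": "data/daily_cases.csv",
      "schema": {
        "fields": [
          {"name": "date", "type": "date"},
          {"name": "district", "type": "string"},
          {"name": "confirmed", "type": "integer"}
        ]
      }
    }
  ]
}
"""


def summarize_resources(descriptor):
    """Return {resource name: [(field name, field type), ...]}."""
    summary = {}
    for resource in descriptor.get("resources", []):
        schema = resource.get("schema", {})
        # A schema may also be given as a reference string; skip those here.
        fields = schema.get("fields", []) if isinstance(schema, dict) else []
        summary[resource["name"]] = [(f["name"], f.get("type")) for f in fields]
    return summary


descriptor = json.loads(DESCRIPTOR_JSON)
summary = summarize_resources(descriptor)
for name, fields in summary.items():
    print(name, fields)
```

To inspect the published descriptor instead, the JSON text could be downloaded from the URL above and passed through the same function.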
Temporal and Spatial Coverage
This dataset covers the COVID-19 outbreak and related data from the state of Kerala, India, from January 31, 2020, until the date of publication of this snapshot. The dataset shall be maintained throughout the entirety of the COVID-19 outbreak.
The spatial coverage of the data lies within the geographical boundaries of Kerala state, which includes its 14 administrative subdivisions. The state is further divided into Local Self Governing (LSG) Bodies. Reference to this spatial information is included in the appropriate data facets. Available spatial information on regions outside Kerala is included, but only as a reference to the possible origins of infection clusters or the movement of individuals.
Longevity and Provenance
The dataset snapshot releases are published in a designated GitHub repository maintained by the CODD-K team. Snapshots from the live database will be released at regular intervals. The GitHub commit logs for the repository will be kept as a record of provenance, and an archived repository will be maintained at the end of the project lifecycle for the longevity of the dataset.
Data Stewardship
CODD-K expects all administrators, managers, and users of its datasets to manage, access, and utilize them in a manner that is consistent with the consortium’s need for security and confidentiality and with relevant legal frameworks within all geographies, especially Kerala and India. While acting as a responsible steward that maintains this dataset and makes it accessible, CODD-K absolves itself of all liability for damages, if any, caused by inaccuracies in the dataset.
License
This dataset is made available by the CODD-K consortium under the ODC-BY 1.0 license. The Open Data Commons Attribution License (ODC-By) v1.0 ensures that users of this dataset are free to copy, distribute, and use the dataset to produce works, and even to modify, transform, and build upon the database, as long as they attribute any public use of the database or of works produced from it, as mentioned in the citation below.
Disclaimer
Covid19Kerala.info-Data is provided as-is under the ODC-BY 1.0 license. Though every attempt is made to ensure that the data is error-free and up to date, the CODD-K consortium does not bear any responsibility for inaccuracies in the dataset or for any losses, monetary or otherwise, that users of this dataset may incur.
https://creativecommons.org/publicdomain/zero/1.0/
NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/
Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:
Over 8 million 311 service requests from 2012-2016
More than 1 million motor vehicle collisions 2012-present
Citi Bike stations and 30 million Citi Bike trips 2013-present
Over 1 billion Yellow and Green Taxi rides from 2009-present
Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015
This dataset is deprecated and not being updated.
Fork this kernel to get started with this dataset.
https://opendata.cityofnewyork.us/
This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.
The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.
Banner Photo by @bicadmedia from Unsplash.
On which New York City streets are you most likely to find a loud party?
Can you find the Virginia Pines in New York City?
Where was the only collision caused by an animal that injured a cyclist?
What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?
https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png
Making dataset "Google Analytics Customer Revenue Prediction" easier and quicker to parse.
This is the same information as dataset "Google Analytics Customer Revenue Prediction" with the JSON columns expanded (flattened) into additional csv columns.
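The flattening described above can be sketched as follows: each JSON-encoded cell is parsed and its keys become new columns named `column.key`. This is a minimal illustration under stated assumptions, not necessarily how this dataset was produced; the column names `visitId`, `device`, and `totals`, the sample row, and the helpers `flatten` and `flatten_row` are hypothetical.

```python
import csv
import io
import json


def flatten(value, prefix):
    """Recursively expand nested dicts into {'prefix.key': leaf} pairs."""
    if not isinstance(value, dict):
        return {prefix: value}
    flat = {}
    for key, child in value.items():
        flat.update(flatten(child, f"{prefix}.{key}"))
    return flat


def flatten_row(row, json_columns):
    """Replace each JSON-encoded column in a row with its flattened keys."""
    out = {}
    for column, cell in row.items():
        if column in json_columns:
            out.update(flatten(json.loads(cell), column))
        else:
            out[column] = cell
    return out


# Hypothetical input rows with two JSON-encoded columns.
raw_rows = [
    {
        "visitId": "1",
        "device": '{"browser": "Chrome", "isMobile": false}',
        "totals": '{"hits": "4", "pageviews": "3"}',
    },
]
flat_rows = [flatten_row(row, {"device", "totals"}) for row in raw_rows]

# Write the flattened rows as CSV; keys missing from a row are left empty.
fieldnames = sorted({key for row in flat_rows for key in row})
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(flat_rows)
print(buffer.getvalue())
```

Collecting the field names across all rows before writing matters because different rows may carry different JSON keys.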
Thanks to the original dataset "Google Analytics Customer Revenue Prediction"; without it, this smaller but equally informative dataset could not exist.