29 datasets found
  1. Google Trends

    • console.cloud.google.com
    Updated May 15, 2022
    + more versions
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=it (2022). Google Trends [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-search-trends?hl=it
    Explore at:
    Dataset updated
    May 15, 2022
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google Search (http://google.com/)
    Google (http://google.com/)
    Description

    The Google Trends dataset provides critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. It includes the Top 25 stories and Top 25 Rising queries from Google Trends, made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 Rising terms expires after 30 days and is accompanied by a rolling five-year window of historical data for 210 distinct locations in the United States. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
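    As a minimal sketch of how querying this dataset from Python might look, assuming the BigQuery client library is installed and authenticated; the table name `bigquery-public-data.google_trends.top_terms` and its columns follow the public documentation and should be verified in the BigQuery UI:

```python
# Minimal sketch: query the Google Trends public dataset from Python.
# Assumes google-cloud-bigquery is installed and application-default
# credentials are configured; table and column names are assumptions
# based on the public documentation.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT term, rank, week, dma_name
FROM `bigquery-public-data.google_trends.top_terms`
WHERE refresh_date = (
  SELECT MAX(refresh_date) FROM `bigquery-public-data.google_trends.top_terms`
)
ORDER BY rank
LIMIT 25
"""

for row in client.query(sql).result():
    print(row.rank, row.term, row.dma_name)
```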

  2. United States Google Search Trends: Government Measures: Government Subsidy

    • ceicdata.com
    Updated Mar 6, 2025
    + more versions
    Cite
    CEICdata.com (2025). United States Google Search Trends: Government Measures: Government Subsidy [Dataset]. https://www.ceicdata.com/en/united-states/google-search-trends-by-categories/google-search-trends-government-measures-government-subsidy
    Explore at:
    Dataset updated
    Mar 6, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Feb 23, 2025 - Mar 6, 2025
    Area covered
    United States
    Description

    United States Google Search Trends: Government Measures: Government Subsidy data was reported at 0.000 Score in 14 May 2025. This stayed constant from the previous number of 0.000 Score for 13 May 2025. United States Google Search Trends: Government Measures: Government Subsidy data is updated daily, averaging 0.000 Score from Dec 2021 (Median) to 14 May 2025, with 1261 observations. The data reached an all-time high of 0.000 Score in 14 May 2025 and a record low of 0.000 Score in 14 May 2025. United States Google Search Trends: Government Measures: Government Subsidy data remains active status in CEIC and is reported by Google Trends. The data is categorized under Global Database’s United States – Table US.Google.GT: Google Search Trends: by Categories.

  3. Table1_Reliability of Google Trends: Analysis of the Limits and Potential of...

    • frontiersin.figshare.com
    docx
    Updated May 30, 2023
    Cite
    Alessandro Rovetta (2023). Table1_Reliability of Google Trends: Analysis of the Limits and Potential of Web Infoveillance During COVID-19 Pandemic and for Future Research.DOCX [Dataset]. http://doi.org/10.3389/frma.2021.670226.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers
    Authors
    Alessandro Rovetta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Alongside the COVID-19 pandemic, government authorities around the world have had to face a growing infodemic capable of causing serious damage to public health and the economy. In this context, the use of infoveillance tools has become a primary necessity. Objective: The aim of this study is to test the reliability of Google Trends, a widely used infoveillance tool. In particular, the paper focuses on the analysis of relative search volumes (RSVs), quantifying their dependence on the day they are collected. Methods: RSVs of the query coronavirus + covid during February 1 to December 4, 2020 (period 1), and February 20 to May 18, 2020 (period 2), were collected daily from Google Trends between December 8 and 27, 2020. The survey covered Italian regions and cities, and countries and cities worldwide. The search category was set to all categories. Each dataset was analyzed to observe any dependence of RSVs on the day they were gathered. To do this, calling i the country, region, or city under investigation and j the day its RSV was collected, a Gaussian distribution X_i = X(σ_i, x̄_i) was used to represent the trend of daily variations of x_ij = RSV_ij. When a missing value was revealed (an anomaly), the affected country, region, or city was excluded from the analysis; when the anomalies exceeded 20% of the sample size, the whole sample was excluded from the statistical analysis. Pearson and Spearman correlations between RSVs and the number of COVID-19 cases were calculated day by day to highlight any variations related to the day RSVs were collected. Welch's t-test was used to assess the statistical significance of the differences between the average RSVs of the various countries, regions, or cities of a given dataset. Two RSVs were considered statistically confident when t
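    A minimal sketch of the day-by-day statistics described above (Pearson and Spearman correlations between RSVs and case counts, plus Welch's t-test between RSVs gathered on two collection days); the arrays here are synthetic and only illustrate the calls:

```python
# Illustrative sketch of the statistics named in the abstract; the data are
# synthetic stand-ins for RSVs and COVID-19 case counts.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rsv_day1 = rng.uniform(0, 100, size=50)             # RSVs gathered on collection day 1
rsv_day2 = rsv_day1 + rng.normal(0, 5, size=50)     # same regions, gathered a day later
cases = 10 * rsv_day1 + rng.normal(0, 50, size=50)  # hypothetical case counts

pearson_r, _ = stats.pearsonr(rsv_day1, cases)
spearman_rho, _ = stats.spearmanr(rsv_day1, cases)
t_stat, p_value = stats.ttest_ind(rsv_day1, rsv_day2, equal_var=False)  # Welch's t-test

print(f"Pearson r={pearson_r:.2f}, Spearman rho={spearman_rho:.2f}, "
      f"Welch t={t_stat:.2f} (p={p_value:.3f})")
```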

  4. Web robot detection - Server logs

    • zenodo.org
    • data.niaid.nih.gov
    csv, json
    Updated Jan 4, 2021
    Cite
    Athanasios Lagopoulos; Athanasios Lagopoulos; Grigorios Tsoumakas; Grigorios Tsoumakas (2021). Web robot detection - Server logs [Dataset]. http://doi.org/10.5281/zenodo.3477932
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Jan 4, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Athanasios Lagopoulos; Athanasios Lagopoulos; Grigorios Tsoumakas; Grigorios Tsoumakas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains server logs from the search engine of the library and information center of the Aristotle University of Thessaloniki in Greece (http://search.lib.auth.gr/). The search engine enables users to check the availability of books and other written works, and search for digitized material and scientific publications. The server logs obtained span an entire month, from March 1st to March 31 2018 and consist of 4,091,155 requests with an average of 131,973 requests per day and a standard deviation of 36,996.7 requests. In total, there are requests from 27,061 unique IP addresses and 3,441 unique user-agent strings. The server logs are in JSON format and they are anonymized by masking the last 6 digits of the IP address and by hashing the last part of the URLs requested (after last /). The dataset also contains the processed form of the server logs as a labelled dataset of log entries grouped into sessions along with their extracted features (simple semantic features). We make this dataset publicly available, the first one in this domain, in order to provide a common ground for testing web robot detection methods, as well as other methods that analyze server logs.
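    A minimal sketch of the anonymisation scheme described above (masking the tail of the IP address and hashing the last URL path segment); the exact field layout of the JSON log entries is defined by the dataset itself, so the values below are just examples:

```python
# Illustrative sketch of the anonymisation described above: mask the last
# octets of an IPv4 address and hash the final URL path segment.
import hashlib

def anonymise(ip: str, url: str) -> tuple[str, str]:
    octets = ip.split(".")
    masked_ip = ".".join(octets[:2] + ["XXX"] * (len(octets) - 2))
    head, _, tail = url.rpartition("/")
    hashed_url = f"{head}/{hashlib.sha1(tail.encode('utf-8')).hexdigest()[:12]}"
    return masked_ip, hashed_url

print(anonymise("155.207.112.45", "/search/books/introduction-to-information-retrieval"))
```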

  5. Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event...

    • datarade.ai
    .csv
    Cite
    Factori, Factori Machine Learning (ML) Data | 247 Countries Coverage | 5.2 B Event per Day [Dataset]. https://datarade.ai/data-products/factori-ai-ml-training-data-web-data-machine-learning-d-factori
    Explore at:
    Available download formats: .csv
    Dataset authored and provided by
    Factori
    Area covered
    Taiwan, Turks and Caicos Islands, Cameroon, Japan, Palestine, Uzbekistan, Egypt, Faroe Islands, Austria, Sweden
    Description

    Factori's AI & ML training data is thoroughly tested and reviewed to ensure that what you receive on your end is of the best quality.

    Integrate the comprehensive AI & ML training data provided by Grepsr and develop a superior AI & ML model.

    Whether you're training algorithms for natural language processing, sentiment analysis, or any other AI application, we can deliver comprehensive datasets tailored to fuel your machine learning initiatives.

    Enhanced Data Quality: We have rigorous data validation processes and also conduct quality assurance checks to guarantee the integrity and reliability of the training data for you to develop the AI & ML models.

    Gain a competitive edge, drive innovation, and unlock new opportunities by leveraging the power of tailored Artificial Intelligence and Machine Learning training data with Factori.

    We offer web activity data of users that are browsing popular websites around the world. This data can be used to analyze web behavior across the web and build highly accurate audience segments based on web activity for targeting ads based on interest categories and search/browsing intent.

    Web Data Reach: Our reach data represents the total number of data counts available within various categories and comprises attributes such as Country, Anonymous ID, IP addresses, Search Query, and so on.

    Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method at a suitable interval (daily/weekly/monthly).

    Data Attributes: Anonymous_id, IDType, Timestamp, Estid, Ip, userAgent, browserFamily, deviceType, Os, Url_metadata_canonical_url, Url_metadata_raw_query_params, refDomain, mappedEvent, Channel, searchQuery, Ttd_id, Adnxs_id, Keywords, Categories, Entities, Concepts

  6. Datasys | Clickstream Data | Categorized Search Behavior (500M+ daily events...

    • datarade.ai
    .json
    Updated May 12, 2022
    Cite
    Datasys (2022). Datasys | Clickstream Data | Categorized Search Behavior (500M+ daily events | organized by vertical) [Dataset]. https://datarade.ai/data-products/datasys-clickstream-data-categorized-search-behavior-500-datasys
    Explore at:
    Available download formats: .json
    Dataset updated
    May 12, 2022
    Dataset authored and provided by
    Datasys
    Area covered
    Pakistan, Aruba, Japan, Canada, Bahrain, Greenland, Paraguay, Chile, Saint Lucia, Dominica
    Description

    Datasys Categorized Search Behavior organizes millions of daily searches into industry-based categories like retail, finance, travel, and technology. By grouping raw search queries into verticals, this dataset makes it easy to monitor demand shifts, compare interest across sectors, and build targeted audience profiles for digital campaigns.

  7. China Google Search Trends: Online Shopping: Tmall

    • ceicdata.com
    Updated Mar 18, 2025
    Cite
    CEICdata.com (2025). China Google Search Trends: Online Shopping: Tmall [Dataset]. https://www.ceicdata.com/en/china/google-search-trends-by-categories/google-search-trends-online-shopping-tmall
    Explore at:
    Dataset updated
    Mar 18, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Mar 7, 2025 - Mar 18, 2025
    Area covered
    China
    Description

    China Google Search Trends: Online Shopping: Tmall data was reported at 8.000 Score in 14 May 2025. This stayed constant from the previous number of 8.000 Score for 13 May 2025. China Google Search Trends: Online Shopping: Tmall data is updated daily, averaging 0.000 Score from Dec 2021 (Median) to 14 May 2025, with 1261 observations. The data reached an all-time high of 70.000 Score in 22 Jan 2023 and a record low of 0.000 Score in 02 May 2025. China Google Search Trends: Online Shopping: Tmall data remains active status in CEIC and is reported by Google Trends. The data is categorized under Global Database’s China – Table CN.Google.GT: Google Search Trends: by Categories.

  8. f

    Web Search Queries Can Predict Stock Market Volumes

    • figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Ilaria Bordino; Stefano Battiston; Guido Caldarelli; Matthieu Cristelli; Antti Ukkonen; Ingmar Weber (2023). Web Search Queries Can Predict Stock Market Volumes [Dataset]. http://doi.org/10.1371/journal.pone.0040014
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ilaria Bordino; Stefano Battiston; Guido Caldarelli; Matthieu Cristelli; Antti Ukkonen; Ingmar Weber
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. A few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to also investigate the user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.
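    As a rough illustration of the lead-lag relationship the abstract describes (today's query volume versus tomorrow's trading volume), here is a sketch on synthetic series; the real analysis would use per-ticker query counts and NASDAQ-100 trading volumes:

```python
# Illustrative sketch: correlate query volume with same-day and next-day
# trading volume. Both series are synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 250  # roughly one trading year
queries = pd.Series(rng.poisson(100, n), dtype=float).rolling(5, min_periods=1).mean()
volume = 1_000 * queries.shift(1).bfill() + pd.Series(rng.normal(0, 5_000, n))

same_day = queries.corr(volume)
lead_one = queries.corr(volume.shift(-1))  # today's queries vs tomorrow's volume
print(f"same-day corr: {same_day:.2f}; queries leading volume by one day: {lead_one:.2f}")
```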

  9. COVID-19 Search Trends symptoms dataset

    • console.cloud.google.com
    Updated Jan 5, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=es (2023). COVID-19 Search Trends symptoms dataset [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19-search-trends?hl=es
    Explore at:
    Dataset updated
    Jan 5, 2023
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Description

    The COVID-19 Search Trends symptoms dataset shows aggregated, anonymized trends in Google searches for a broad set of health symptoms, signs, and conditions. The dataset provides a daily or weekly time series for each region showing the relative volume of searches for each symptom. This dataset is intended to help researchers better understand the impact of COVID-19. It shouldn't be used for medical diagnostic, prognostic, or treatment purposes. It also isn't intended to be used for guidance on personal travel plans. To learn more about the dataset, how we generate it and preserve privacy, read the data documentation. To visualize the data, try exploring these interactive charts and map of symptom search trends. As of Dec. 15, 2020, the dataset was expanded to include trends for Australia, Ireland, New Zealand, Singapore, and the United Kingdom. This expanded data is available in new tables that provide data at country and two subregional levels. We will not be updating existing state/county tables going forward. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
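    A minimal sketch of pulling a slice of this dataset from Python; the table name below is an assumption based on the dataset's BigQuery listing and should be checked against the actual schema:

```python
# Minimal sketch: fetch recent rows of the symptom search trends. Assumes
# google-cloud-bigquery (plus its pandas extras) is installed and that the
# country-level daily table is named as shown, which should be verified.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT *
FROM `bigquery-public-data.covid19_symptom_search.symptom_search_country_daily`
ORDER BY date DESC
LIMIT 10
"""
df = client.query(sql).to_dataframe()
print(df.head())
```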

  10. Datasys | Clickstream Data (500M+ daily events | global coverage | updated...

    • datarade.ai
    .json
    Cite
    Datasys, Datasys | Clickstream Data (500M+ daily events | global coverage | updated daily) [Dataset]. https://datarade.ai/data-products/datastream-clickstream-browser-data-feed-datasys
    Explore at:
    Available download formats: .json
    Dataset authored and provided by
    Datasys
    Area covered
    Argentina, United States of America, Kyrgyzstan, Cuba, Aruba, Guadeloupe, Vietnam, Mongolia, Malaysia, Cambodia
    Description

    Our clickstream data offers unparalleled access to a vast array of global datasets, capturing user interactions across websites, apps, and digital platforms worldwide. With coverage spanning multiple industries and geographies, our data provides detailed insights into consumer behavior, online trends, and digital engagement patterns.

    Whether you're analyzing traffic flows, identifying audience interests, or tracking competitive performance, our clickstream datasets deliver the scale and granularity needed to inform strategic decisions. Updated regularly to ensure accuracy and relevance, this robust resource empowers businesses to uncover actionable insights and stay ahead in a dynamic digital landscape.

  11. Alexa, International Top 100 Websites, Global, 10.12.2007

    • geocommons.com
    Updated Apr 29, 2008
    Cite
    Alexa (2008). Alexa, International Top 100 Websites, Global, 10.12.2007 [Dataset]. http://geocommons.com/search.html
    Explore at:
    Dataset updated
    Apr 29, 2008
    Dataset provided by
    data
    Alexa
    Description

    This dataset shows the Alexa Top 100 International Websites and provides metrics on the volume of traffic that these sites were able to handle. The Alexa Top 100 lists the 100 most visited websites in the world and measures various statistical information. I looked up each headquarters, either through Alexa or a Whois lookup, to get a street address which I was then able to geocode. I was only able to successfully geocode 85 of the top 100 sites throughout the world. Source of data was Alexa.com, source URL: http://www.alexa.com/site/ds/top_sites?ts_mode=global&lang=none. Data is from October 12, 2007. Alexa is updated daily, so to get more up-to-date information visit their site directly; they don't have maps, though.
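    A minimal sketch of the geocoding step mentioned above (turning a headquarters street address into coordinates), assuming the geopy package and the public Nominatim service; the address is only an example:

```python
# Illustrative sketch: geocode a street address to latitude/longitude with
# geopy's Nominatim wrapper. Use a descriptive user_agent and rate-limit
# real batch jobs per the service's usage policy.
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="top100-websites-geocoding-example")
location = geolocator.geocode("1600 Amphitheatre Parkway, Mountain View, CA")
if location is not None:
    print(location.latitude, location.longitude)
else:
    print("address could not be geocoded")
```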

  12. Datasys | Clickstream Data (500M+ daily events | global coverage | updated...

    • data.datasys.com
    Updated Sep 11, 2025
    Cite
    Datasys (2025). Datasys | Clickstream Data (500M+ daily events | global coverage | updated daily) [Dataset]. https://data.datasys.com/products/datastream-clickstream-browser-data-feed-datasys
    Explore at:
    Dataset updated
    Sep 11, 2025
    Dataset authored and provided by
    Datasys
    Area covered
    Vietnam, French Guiana, Haiti, Malaysia, Saudi Arabia, Saint Pierre and Miquelon, Sint Eustatius and Saba, Virgin Islands, Macao, Cambodia
    Description

    Datasys has developed one of the most extensive clickstream data sets available. By merging numerous global sources into a single master feed, it covers diverse categories such as search, shopping, and website visits.

  13. Dataplex: Reddit Data | Global Social Media Data | 2.1M+ subreddits: trends,...

    • dataplex.mydatastorefront.com
    Updated Aug 19, 2024
    Cite
    Dataplex (2024). Dataplex: Reddit Data | Global Social Media Data | 2.1M+ subreddits: trends, audience insights + more | Ideal for Interest-Based Segmentation [Dataset]. https://dataplex.mydatastorefront.com/products/dataplex-reddit-data-global-social-media-data-1-1m-mill-dataplex
    Explore at:
    Dataset updated
    Aug 19, 2024
    Dataset authored and provided by
    Dataplex
    Area covered
    Norway, Brazil, Palau, Turks and Caicos Islands, South Korea, The Democratic Republic of the, Vietnam, Lithuania, Mali, Tuvalu
    Description

    The Reddit data dataset offers social media data tracking 2.1+ million subreddits with daily subscriber counts since January 2023. We also leverage AI to append subreddit attributes, allowing you to categorize and find subreddits by subject.

  14. MODIS/Terra Vegetation Indices Daily Rolling-8-Day L3 Global 500m SIN Grid...

    • earthdata.nasa.gov
    • datasets.ai
    • +4more
    Updated Feb 7, 2021
    + more versions
    Cite
    LANCEMODIS (2021). MODIS/Terra Vegetation Indices Daily Rolling-8-Day L3 Global 500m SIN Grid NRT [Dataset]. http://doi.org/10.5067/MODIS/MOD13A4N.NRT.061
    Explore at:
    Dataset updated
    Feb 7, 2021
    Dataset authored and provided by
    LANCEMODIS
    Description

    The MODIS level-3 Vegetation Indices Daily Rolling-8-Day Near Real Time (NRT), MOD13A4N data are provided every day at 500-meter spatial resolution as a gridded level-3 product in the Sinusoidal projection. Vegetation indices are used for global monitoring of vegetation conditions and in products displaying land cover and land cover changes. These data may be used as input for modeling global biogeochemical and hydrologic processes and global and regional climate. They may also be used for characterizing land surface biophysical properties and processes, including primary production and land cover conversion.

    Note: This is a near real-time product only. Standard historical data and imagery for MOD13Q4N (250m) and MOD13A4N (500m) are not available. Users can either use the NDVI standard products from LAADS web (https://ladsweb.modaps.eosdis.nasa.gov/search/) or access the science quality MxD09[A1/Q1] data and create the NDVI product of their own.
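    For users taking the second route (deriving NDVI from the MxD09 surface-reflectance bands themselves), the index is a simple band ratio, NDVI = (NIR - Red) / (NIR + Red); a minimal sketch on synthetic arrays, since reading the actual granules depends on local tooling:

```python
# Illustrative sketch: compute NDVI from red and near-infrared reflectance
# arrays. The arrays are synthetic; with real MxD09 data you would first
# read the red and NIR bands from the granule.
import numpy as np

red = np.array([[0.05, 0.10], [0.20, 0.30]])
nir = np.array([[0.40, 0.45], [0.35, 0.32]])

ndvi = np.where((nir + red) > 0, (nir - red) / (nir + red), np.nan)
print(ndvi)
```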

  15. Lead Scoring Dataset

    • kaggle.com
    zip
    Updated Aug 17, 2020
    Cite
    Amrita Chatterjee (2020). Lead Scoring Dataset [Dataset]. https://www.kaggle.com/amritachatterjee09/lead-scoring-dataset
    Explore at:
    Available download formats: zip (411028 bytes)
    Dataset updated
    Aug 17, 2020
    Authors
    Amrita Chatterjee
    Description

    Context

    An education company named X Education sells online courses to industry professionals. On any given day, many professionals who are interested in the courses land on their website and browse for courses.

    The company markets its courses on several websites and search engines like Google. Once these people land on the website, they might browse the courses or fill up a form for the course or watch some videos. When these people fill up a form providing their email address or phone number, they are classified to be a lead. Moreover, the company also gets leads through past referrals. Once these leads are acquired, employees from the sales team start making calls, writing emails, etc. Through this process, some of the leads get converted while most do not. The typical lead conversion rate at X education is around 30%.

    Now, although X Education gets a lot of leads, its lead conversion rate is very poor. For example, if, say, they acquire 100 leads in a day, only about 30 of them are converted. To make this process more efficient, the company wishes to identify the most potential leads, also known as ‘Hot Leads’. If they successfully identify this set of leads, the lead conversion rate should go up as the sales team will now be focusing more on communicating with the potential leads rather than making calls to everyone.

    There are a lot of leads generated in the initial stage (top) but only a few of them come out as paying customers from the bottom. In the middle stage, you need to nurture the potential leads well (i.e. educating the leads about the product, constantly communicating, etc. ) in order to get a higher lead conversion.

    X Education wants to select the most promising leads, i.e. the leads that are most likely to convert into paying customers. The company requires you to build a model wherein you need to assign a lead score to each of the leads such that the customers with a higher lead score have a higher conversion chance and the customers with a lower lead score have a lower conversion chance. The CEO, in particular, has given a ballpark of the target lead conversion rate to be around 80%.

    Content

    Variables Description
    * Prospect ID - A unique ID with which the customer is identified.
    * Lead Number - A lead number assigned to each lead procured.
    * Lead Origin - The origin identifier with which the customer was identified to be a lead. Includes API, Landing Page Submission, etc.
    * Lead Source - The source of the lead. Includes Google, Organic Search, Olark Chat, etc.
    * Do Not Email - An indicator variable selected by the customer wherein they select whether or not they want to be emailed about the course.
    * Do Not Call - An indicator variable selected by the customer wherein they select whether or not they want to be called about the course.
    * Converted - The target variable. Indicates whether a lead has been successfully converted or not.
    * TotalVisits - The total number of visits made by the customer on the website.
    * Total Time Spent on Website - The total time spent by the customer on the website.
    * Page Views Per Visit - Average number of pages on the website viewed during the visits.
    * Last Activity - Last activity performed by the customer. Includes Email Opened, Olark Chat Conversation, etc.
    * Country - The country of the customer.
    * Specialization - The industry domain in which the customer worked before. Includes the level 'Select Specialization', which means the customer had not selected this option while filling the form.
    * How did you hear about X Education - The source from which the customer heard about X Education.
    * What is your current occupation - Indicates whether the customer is a student, unemployed or employed.
    * What matters most to you in choosing this course - An option selected by the customer indicating their main motive for taking the course.
    * Search - Indicates whether the customer had seen the ad in any of the listed items.
    * Magazine
    * Newspaper Article
    * X Education Forums
    * Newspaper
    * Digital Advertisement
    * Through Recommendations - Indicates whether the customer came in through recommendations.
    * Receive More Updates About Our Courses - Indicates whether the customer chose to receive more updates about the courses.
    * Tags - Tags assigned to customers indicating the current status of the lead.
    * Lead Quality - Indicates the quality of the lead based on the data and on the intuition of the employee who has been assigned to the lead.
    * Update me on Supply Chain Content - Indicates whether the customer wants updates on the Supply Chain Content.
    * Get updates on DM Content - Indicates whether the customer wants updates on the DM Content.
    * Lead Profile - A lead level assigned to each customer based on their profile.
    * City - The city of the customer.
    * Asymmetric Activity Index - An index and score assigned to each customer based on their activity and their profile.
    * Asymmetric Profile Index
    * Asymmetric Activity Score
    * Asymmetric Profile Score
    * I agree to pay the amount through cheque - Indicates whether the customer has agreed to pay the amount through cheque or not.
    * A free copy of Mastering The Interview - Indicates whether the customer wants a free copy of 'Mastering the Interview' or not.
    * Last Notable Activity - The last notable activity performed by the student.
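    A minimal sketch of one common way to produce such a lead score (a logistic-regression conversion probability rescaled to 0-100); the column names follow the variable list above, the local filename Leads.csv is an assumption, and the preprocessing is deliberately simplified rather than the intended case-study solution:

```python
# Illustrative sketch: train a simple conversion model and turn its predicted
# probability into a 0-100 lead score. Filename and feature choice are
# assumptions; real work would need fuller cleaning and feature selection.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("Leads.csv")
numeric = ["TotalVisits", "Total Time Spent on Website", "Page Views Per Visit"]
categorical = ["Lead Origin", "Lead Source", "Last Activity"]

X = df[numeric + categorical].copy()
X[numeric] = X[numeric].fillna(0)
X[categorical] = X[categorical].fillna("Unknown")
y = df["Converted"]

model = Pipeline([
    ("pre", ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model.fit(X_train, y_train)

lead_score = (model.predict_proba(X_test)[:, 1] * 100).round().astype(int)
print(lead_score[:10])
```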

    Acknowledgements

    UpGrad Case Study

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  16. An Archive of #DH2016 Tweets Published on Thursday 14 July 2016 GMT

    • city.figshare.com
    html
    Updated May 31, 2023
    + more versions
    Cite
    Ernesto Priego (2023). An Archive of #DH2016 Tweets Published on Thursday 14 July 2016 GMT [Dataset]. http://doi.org/10.6084/m9.figshare.3487103.v1
    Explore at:
    Available download formats: html
    Dataset updated
    May 31, 2023
    Dataset provided by
    City, University of London
    Authors
    Ernesto Priego
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The Digital Humanities 2016 conference took place in Kraków, Poland, between Sunday 11 July and Saturday 16 July 2016. #DH2016 was the conference's official hashtag.

    What This Output Is: This is a CSV file containing a total of 3717 Tweets publicly published with the hashtag #DH2016 on Thursday 14 July 2016 GMT. The archive starts with a Tweet published on Thursday 14 July 2016 at 00:01:04 +0000 and ends with a Tweet published on Thursday 14 July 2016 at 23:49:14 +0000 (GMT). Previous days have been shared as separate outputs. A breakdown of Tweets per day so far: Sunday 10 July 2016: 179 Tweets; Monday 11 July 2016: 981 Tweets; Tuesday 12 July 2016: 2318 Tweets; Wednesday 13 July 2016: 4175 Tweets; Thursday 14 July 2016: 3717 Tweets.

    Methodology and Limitations: The Tweets contained in this file were collected by Ernesto Priego using Martin Hawksey's TAGS 6.0. Only users with at least 1 follower were included in the archive. Retweets have been included (Retweets count as Tweets). The collection spreadsheet was customised to reflect the time zone and geographical location of the conference. The profile_image_url and entities_str metadata were removed before public sharing in this archive. Please bear in mind that the conference hashtag has been spammed, so some Tweets collected may be from spam accounts. Some automated refining has been performed to remove Tweets not related to the conference, but the data is likely to require further refining and deduplication. Both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might "over-represent the more central users", not offering "an accurate picture of peripheral activity" (Gonzalez-Bailon, Sandra, et al. 2012). Apart from the filters and limitations already declared, it cannot be guaranteed that this file contains each and every Tweet tagged with #dh2016 during the indicated period, and the dataset is shared for archival, comparative and indicative educational research purposes only. Only content from public accounts is included and was obtained from the Twitter Search API. The shared data is also publicly available to all Twitter users via the Twitter Search API and available to anyone with an Internet connection via the Twitter and Twitter Search web client and mobile apps without the need of a Twitter account. Each Tweet and its contents were published openly on the Web with the queried hashtag and are the responsibility of the original authors. Original Tweets are likely to be copyright of their individual authors, but please check individually. No private personal information is shared in this dataset. The collection and sharing of this dataset is enabled and allowed by Twitter's Privacy Policy. The sharing of this dataset complies with Twitter's Developer Rules of the Road. This dataset is shared to archive, document and encourage open educational research into scholarly activity on Twitter.

    Other Considerations: Tweets published publicly by scholars during academic conferences are often tagged (labeled) with a hashtag dedicated to the conference in question. The purpose and function of hashtags is to organise and describe information/outputs under the relevant label in order to enhance the discoverability of the labeled information/outputs (Tweets in this case). A hashtag is metadata users choose freely to use so their content is associated with, directly linked to and categorised under the chosen hashtag. Though every reason for Tweeters' use of hashtags cannot be generalised nor predicted, it can be argued that scholarly Twitter users form specialised, self-selecting public professional networks that tend to observe scholarly practices and accepted modes of social and professional behaviour. In general terms it can be argued that scholarly Twitter users willingly and consciously tag their public Tweets with a conference hashtag as a means to network and to promote, report from, reflect on, comment on and generally contribute publicly to the scholarly conversation around conferences. As Twitter users, conference Twitter hashtag contributors have agreed to Twitter's Privacy and data sharing policies. Professional associations like the Modern Language Association recognise Tweets as citeable scholarly outputs. Archiving scholarly Tweets is a means to preserve this form of rapid online scholarship that otherwise can very likely become unretrievable as time passes; Twitter's search API has well-known temporal limitations for retrospective historical search and collection. Beyond individual tweets as scholarly outputs, the collective scholarly activity on Twitter around a conference or academic project or event can provide interesting insights for the contemporary history of scholarly communications. To date, collecting in real time is the only relatively accurate method to archive tweets at a small scale. Though these datasets have limitations and are not thoroughly systematic, it is hoped they can contribute to developing new insights into the discipline's presence on Twitter over time. The CC-BY license has been applied to the output in the repository as a curated dataset. Authorial/curatorial/collection work has been performed on the file in order to make it available as part of the scholarly record. The data contained in the deposited file is otherwise freely available elsewhere through different methods, and anyone not wishing to attribute the data to the creator of this output is, needless to say, free to do their own collection and clean their own data.
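    Since the description notes that the archive may still need refining and deduplication, a minimal sketch of that step; the filename and the id_str column are assumptions about the TAGS export:

```python
# Illustrative sketch: drop exact duplicate rows and keep one row per tweet id
# in a TAGS-style CSV export. Filename and the id_str column are assumptions.
import pandas as pd

tweets = pd.read_csv("dh2016_2016-07-14.csv", dtype=str)
before = len(tweets)
tweets = tweets.drop_duplicates().drop_duplicates(subset=["id_str"])
print(f"{before} rows read, {len(tweets)} kept after deduplication")
```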

  17. Crimes - One year prior to present

    • chicago.gov
    • data.cityofchicago.org
    • +2more
    csv, xlsx, xml
    Updated Sep 15, 2025
    + more versions
    Cite
    Chicago Police Department (2025). Crimes - One year prior to present [Dataset]. https://www.chicago.gov/city/en/dataset/crime.html
    Explore at:
    Available download formats: xlsx, xml, csv
    Dataset updated
    Sep 15, 2025
    Dataset authored and provided by
    Chicago Police Department (http://www.chicagopolice.org/)
    Description

    This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that have occurred in the City of Chicago over the past year, minus the most recent seven days of data. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org. Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited.

    The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use. Data is updated daily Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Wordpad, to view and search. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://bit.ly/rk5Tpc.
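    Because the one-year extract is too large to open in Excel, a minimal sketch of loading the exported CSV programmatically instead; the filename is whatever you chose when exporting, and the exact column names should be read from the file header:

```python
# Illustrative sketch: load the exported crime CSV with pandas rather than a
# spreadsheet, then inspect its size and columns. The filename is hypothetical.
import pandas as pd

df = pd.read_csv("Crimes_-_One_year_prior_to_present.csv")
print(f"{len(df):,} rows, {df.shape[1]} columns")
print(df.columns.tolist())
print(df.head())
```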

  18. MLP-based Learnable Window Size Dataset for Bitcoin Market Price

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    + more versions
    Cite
    Rajabi, Shahab (2023). MLP-based Learnable Window Size Dataset for Bitcoin Market Price [Dataset]. http://doi.org/10.7910/DVN/5YBLKV
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Rajabi, Shahab
    Description

    The dataset for this paper was collected from Google, Blockchain, and the Bitcoin market. There are 26 features in total; however, any feature whose correlation between its variations and the variations of the price is lower than 0.3 has been eliminated. Hence, a total of 21 practical features, including Market capitalization, Trade-volume, Transaction-fees USD, Average confirmation time, Difficulty, High price, Low price, Total hash rate, Block-size, Miners-revenue, N-transactions-total, Google searches, Open price, N-payments-per Block, Total circulating Bitcoin, Cost-per-transaction percent, Fees-USD-per transaction, N-unique-addresses, N-transactions-per block, and Output-volume, have been selected. In addition to the values of these features, for each feature a supportive feature is created that holds the difference between the previous day's value and the value of the day before that. In terms of the size and history of the dataset, a total of 1275 training samples were used in the proposed model to extract patterns of the Bitcoin price; they were collected from 12 Nov 2018 to 4 Jun 2021.
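    A minimal sketch of the two preparation steps described above (keeping only features whose variations correlate with price variations at 0.3 or more, and adding a previous-day-difference column per feature); the DataFrame is synthetic and the column names are illustrative:

```python
# Illustrative sketch of the feature preparation described above, on synthetic
# data: a correlation filter on day-to-day variations plus lagged-difference
# supportive features.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 200
price = pd.Series(np.cumsum(rng.normal(0, 1, n)) + 100, name="price")
features = pd.DataFrame({
    "trade_volume": price + rng.normal(0, 0.5, n),  # tracks price, so it is kept
    "google_searches": rng.normal(0, 1, n),         # unrelated noise, so it is dropped
})

price_var = price.diff()
keep = [c for c in features.columns
        if abs(features[c].diff().corr(price_var)) >= 0.3]
selected = features[keep].copy()

# Supportive feature: difference between the previous day and the day before it.
for c in keep:
    selected[f"{c}_lag_diff"] = selected[c].shift(1) - selected[c].shift(2)

print(keep)
print(selected.tail())
```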

  19. Bitcoin Dataset without Missing Values

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 23, 2021
    + more versions
    Cite
    Rakshitha Godahewa; Rakshitha Godahewa; Christoph Bergmeir; Christoph Bergmeir; Geoff Webb; Geoff Webb; Rob Hyndman; Rob Hyndman; Pablo Montero-Manso; Pablo Montero-Manso (2021). Bitcoin Dataset without Missing Values [Dataset]. http://doi.org/10.5281/zenodo.5122101
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 23, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rakshitha Godahewa; Rakshitha Godahewa; Christoph Bergmeir; Christoph Bergmeir; Geoff Webb; Geoff Webb; Rob Hyndman; Rob Hyndman; Pablo Montero-Manso; Pablo Montero-Manso
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the potential influencers of the bitcoin price. There are a total of 18 daily time series including hash rate, block size, mining difficulty etc. It also encompasses public opinion in the form of tweets and google searches mentioning the keyword bitcoin. The data is scraped from the interactive web-graphs available at https://bitinfocharts.com.

    The original dataset contains missing values and they have been replaced by carrying forward the corresponding last seen observations (LOCF method).
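    A minimal sketch of the LOCF step mentioned above, on a small synthetic series:

```python
# Illustrative sketch of LOCF (last observation carried forward): each missing
# value is replaced with the most recent non-missing value.
import numpy as np
import pandas as pd

series = pd.Series([10.0, np.nan, np.nan, 12.5, np.nan, 13.0],
                   index=pd.date_range("2021-07-01", periods=6, freq="D"))
print(series.ffill())
```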

  20. Data from: SGP97 Surface: NCDC Summary of the Day COOP Precipitation Data

    • agdatacommons.nal.usda.gov
    • geodata.nal.usda.gov
    • +1more
    bin
    Updated Nov 30, 2023
    + more versions
    Cite
    Thomas Jackson (2023). SGP97 Surface: NCDC Summary of the Day COOP Precipitation Data [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/SGP97_Surface_NCDC_Summary_of_the_Day_COOP_Precipitation_Data/24665067
    Explore at:
    Available download formats: bin
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    National Center for Atmospheric Research / Earth Observing Laboratory
    Authors
    Thomas Jackson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Southern Great Plains 1997 (SGP97) Hydrology Experiment originated from an interdisciplinary investigation, "Soil Moisture Mapping at Satellite Temporal and Spatial Scales" (PI: Thomas J. Jackson, USDA Agricultural Research Service, Beltsville, MD) selected under the NASA Research Announcement 95-MTPE-03. The region selected for investigation is the best instrumented site for surface soil moisture, hydrology and meteorology in the world. This includes the USDA/ARS Little Washita Watershed, the USDA/ARS facility at El Reno, Oklahoma, the ARM/CART central facility, as well as the Oklahoma Mesonet. The National Climatic Data Center (NCDC) Summary of the Day Co-operative Precipitation Dataset is one of several surface precipitation datasets provided in the Global Energy and Water Cycle Experiment (GEWEX) Continental-Scale International Project (GCIP) by UCAR/JOSS. The primary thrust of the cooperative observing program is the recording of 24-hour precipitation amounts. The observations are for the 24-hour period ending at the time of observation. Observer convenience or special program needs mean that observing times vary from station to station. However, the vast majority of observations are taken near either 7:00 AM or 7:00 PM local time. The National Weather Service (NWS) Cooperative Observer Daily Precipitation dataset was formed by extracting the daily incremental precipitation values provided in the National Climatic Data Center (NCDC) TD 3200 dataset. The Daily Precipitation data set contains six metadata parameters and four data parameters. The metadata parameters describe the station location and time at which the data were collected. The four data parameters repeat once for each day in the monthly record. Every record has 31 days reported, regardless of the actual number of days in the month. For months with less than 31 days, the extra days are reported as missing (i.e., '-999.99 7 M'). Each 24 hour precipitation value has an associated observation hour. The observation hour is the ending UTC hour for the 24 hour period for which the precipitation value is valid. Resources in this dataset:Resource Title: GeoData catalog record. File Name: Web Page, url: https://geodata.nal.usda.gov/geonetwork/srv/eng/catalog.search#/metadata/SGP97COOPprecipitation_jjm_2015-05-04_0933
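    A minimal sketch of handling the per-day values described above (31 entries per month, with missing days flagged by the -999.99 sentinel); the whitespace-separated input used here is an illustration, not the actual TD 3200 record layout:

```python
# Illustrative sketch: turn 31 daily precipitation values into floats, treating
# the -999.99 sentinel as missing. The input format is invented for the example.
MISSING = -999.99

def parse_month(values):
    """Return a list of floats with None for missing days."""
    out = []
    for v in values:
        v = float(v)
        out.append(None if abs(v - MISSING) < 1e-6 else v)
    return out

raw = ["0.00", "0.25", "-999.99", "1.10"] + ["-999.99"] * 27  # 31 entries
daily = parse_month(raw)
observed = [v for v in daily if v is not None]
print(f"{len(observed)} observed days, monthly total {sum(observed):.2f}")
```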
