100+ datasets found
  1. Trust level French people have in Google and Facebook to ensure data...

    • statista.com
    Updated Mar 21, 2022
    Cite
    Trust level French people have in Google and Facebook to ensure data protection 2019 [Dataset]. https://www.statista.com/statistics/1010095/trust-google-facebook-developing-better-data-protection-france/
    Dataset updated
    Mar 21, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 15, 2019 - May 16, 2019
    Area covered
    France
    Description

    This pie chart displays the level of trust people in France have in Google and Facebook to develop better tools for protecting personal data on the Internet, based on a 2019 survey. It shows that 37 percent of respondents tended not to trust those companies to ensure data protection, while 35 percent said they tended to trust them.

  2. Frequency of using Google Docs in the U.S. 2018

    • statista.com
    Updated Jan 25, 2022
    Cite
    Statista (2022). Frequency of using Google Docs in the U.S. 2018 [Dataset]. https://www.statista.com/forecasts/1011649/frequency-of-using-google-docs-in-the-us
    Dataset updated
    Jan 25, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 26, 2018 - Nov 5, 2018
    Area covered
    United States
    Description

    The displayed data on the frequency of using Google Docs shows results of an exclusive Statista survey conducted in the United States in 2018. Some 29 percent of respondents answered the question "How often do you use Google Docs?" with "Daily". The Survey Data Table for the Statista survey Tech Giants and Digital Services in the United States 2019 contains the complete tables for the survey, including various column headings.

  3. DataForSEO Google Full (Keywords+SERP) database, historical data available

    • datarade.ai
    .json, .csv
    Updated Aug 17, 2023
    Cite
    DataForSEO (2023). DataForSEO Google Full (Keywords+SERP) database, historical data available [Dataset]. https://datarade.ai/data-products/dataforseo-google-full-keywords-serp-database-historical-d-dataforseo
    Available download formats: .json, .csv
    Dataset updated
    Aug 17, 2023
    Dataset provided by
    Authors
    DataForSEO
    Area covered
    Bolivia (Plurinational State of), Portugal, Burkina Faso, Sweden, South Africa, United Kingdom, Côte d'Ivoire, Cyprus, Costa Rica, Paraguay
    Description

    Field descriptions are available in the documentation. Current Full database: https://docs.dataforseo.com/v3/databases/google/full/?bash; historical Full database: https://docs.dataforseo.com/v3/databases/google/history/full/?bash.

    Full Google Database is a combination of the Advanced Google SERP Database and Google Keyword Database.

    Google SERP Database offers millions of SERPs collected in 67 regions with most of Google’s advanced SERP features, including featured snippets, knowledge graphs, people also ask sections, top stories, and more.

    Google Keyword Database encompasses billions of search terms enriched with related Google Ads data: search volume trends, CPC, competition, and more.

    This database is available in JSON format only.

    You don’t have to download fresh data dumps in JSON: we can deliver data straight to your storage or database. We send terabytes of data to dozens of customers every month using Amazon S3, Google Cloud Storage, Microsoft Azure Blob, Elasticsearch, and Google BigQuery. Let us know if you’d like to get your data to any other storage or database.

  4. NYC Open Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    NYC Open Data (2019). NYC Open Data [Dataset]. https://www.kaggle.com/datasets/nycopendata/new-york
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    NYC Open Data
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/

    Content

    Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:

    • Over 8 million 311 service requests from 2012-2016

    • More than 1 million motor vehicle collisions 2012-present

    • Citi Bike stations and 30 million Citi Bike trips 2013-present

    • Over 1 billion Yellow and Green Taxi rides from 2009-present

    • Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015

    This dataset is deprecated and not being updated.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://opendata.cityofnewyork.us/

    https://cloud.google.com/blog/big-data/2017/01/new-york-city-public-datasets-now-available-on-google-bigquery

    This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.

    The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.

    Banner Photo by @bicadmedia from Unsplash.

    Inspiration

    On which New York City streets are you most likely to find a loud party?

    Can you find the Virginia Pines in New York City?

    Where was the only collision caused by an animal that injured a cyclist?

    What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?
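    One way to approach the Citi Bike question above is to compute each trip's great-circle distance from its start and end station coordinates and divide it by the trip duration. The sketch below is a minimal illustration, assuming trips are available as plain tuples with hypothetical fields (start station, end station, start/end latitude and longitude, duration in seconds). It uses the haversine formula, which gives the straight-line distance between stations, not the route actually ridden, and it reads "longest distance in the shortest time" as "highest average speed".

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against a tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def fastest_trip(trips, min_rides=100):
    """Return ((start, end), speed_kmh) for the fastest trip on any route with at
    least `min_rides` rides, or None if no route qualifies.

    Each trip is a hypothetical tuple:
    (start_station, end_station, start_lat, start_lon, end_lat, end_lon, duration_s)
    """
    # Count rides per (start, end) route so short-sample routes can be excluded.
    rides_per_route = defaultdict(int)
    for trip in trips:
        rides_per_route[(trip[0], trip[1])] += 1

    best = None
    for start, end, lat1, lon1, lat2, lon2, duration_s in trips:
        if rides_per_route[(start, end)] < min_rides or duration_s <= 0:
            continue
        speed_kmh = haversine_km(lat1, lon1, lat2, lon2) / (duration_s / 3600)
        if best is None or speed_kmh > best[1]:
            best = ((start, end), speed_kmh)
    return best
```

    Ranking by average speed is only one interpretation; ranking by distance and breaking ties by shortest duration would be another, and would need only a different comparison in the final loop.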


  5. World Bank: Education Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    World Bank (2019). World Bank: Education Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-education
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    World Bank (http://worldbank.org/)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

    Content

    This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.

    For more information, see the World Bank website.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population

    http://data.worldbank.org/data-catalog/ed-stats

    https://cloud.google.com/bigquery/public-data/world-bank-education

    Citation: The World Bank: Education Statistics

    Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @till_indeman from Unsplash.

    Inspiration

    Of total government spending, what percentage is spent on education?

  6. Cyclistic Dataset

    • kaggle.com
    zip
    Updated Feb 2, 2022
    + more versions
    Cite
    Ethan Tyler Rundquist (2022). Cyclistic Dataset [Dataset]. https://www.kaggle.com/datasets/ethantylerrundquist/cyclistic-dataset
    Available download formats: zip (204750591 bytes)
    Dataset updated
    Feb 2, 2022
    Authors
    Ethan Tyler Rundquist
    Description

    Dataset

    This dataset was created by Ethan Tyler Rundquist

    Contents

  7. U.S. trust in tech companies with personal data 2021

    • statista.com
    Updated Jul 7, 2022
    Cite
    Statista (2022). U.S. trust in tech companies with personal data 2021 [Dataset]. https://www.statista.com/statistics/800764/trust-tech-companies-keep-personal-data-secure-private/
    Dataset updated
    Jul 7, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Nov 2021
    Area covered
    United States
    Description

    As of November 2021 in the United States, 53 percent of surveyed participants said that they trusted Amazon to handle their personal data, whereas 40 percent said they distrusted the service with their information. Overall, 72 percent of respondents said that they did not trust Facebook with their private data, and 63 percent said they did not trust TikTok with such information. Just under half of all respondents stated that they trusted Google and 43 percent trusted Microsoft.

  8. INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

    • zenodo.org
    • data.niaid.nih.gov
    pdf
    Updated Jul 19, 2024
    Cite
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. http://doi.org/10.5281/zenodo.4047648
    Available download formats: pdf
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh, United States
    Description

    1 Introduction

    Several works have applied Natural Language Processing (NLP) to newspaper reports. In mining opinions from headlines [1] with Stanford NLP and SVM, Rameshbhai et al. compared several algorithms on a small and a large dataset. Rubin et al. [2] created a mechanism to distinguish fake news from real news by building a set of characteristics of news according to their types; the purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [3] implemented LDA, a topic modeling approach, to study bias in online news media.

    However, little NLP research has been devoted to COVID-19. Most applications involve classifying chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based publications include sentiment classification of online tweets by Samuel et al. [8] to understand the fear people feel because of the virus. Jelodar et al. [9] did similar work, using an LSTM network to classify sentiment in online discussion forums. To the best of our knowledge, the NNK dataset is the first study on a comparatively large dataset of newspaper reports on COVID-19, reports which contributed to public awareness of the virus.

    2 Data-set Introduction

    2.1 Data Collection

    We accumulated 1000 online newspaper reports on COVID-19 from the United States of America (USA), drawn from The Washington Post (USA) and StarTribune (USA), and named this collection “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports on the same issue from Bangladesh, drawn from The Daily Star (BD) and Prothom Alo (BD), and named this collection “Covid-News-BD-NNK”. All of these newspapers are among the top providers and the most-read newspapers in their respective countries. The collection was done manually by 10 human data collectors of age group 23- with university degrees. Manual collection was preferred over automation to ensure the reports were highly relevant to the subject: the newspapers' websites had dynamic content with advertisements in no particular order, so online scrapers had a high chance of collecting inaccurate news reports. One challenge while collecting the data was the subscription requirement; each newspaper required a $1 subscription. The guidelines given to the human data collectors included the following criteria:

    • The headline must have one or more words directly or indirectly related to COVID-19.
    • The content of each news report must have 5 or more keywords directly or indirectly related to COVID-19.
    • The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.
    • Avoid taking duplicate reports.
    • Maintain a time frame for the above-mentioned newspapers.
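    The first two criteria are mechanical enough to check in code, which could help reviewers audit manually collected entries. The sketch below is a minimal illustration under stated assumptions: the keyword list is hypothetical (the source does not publish the one the collectors used), and "5 or more keywords" is read as 5 or more keyword occurrences rather than 5 distinct keywords. The remaining criteria (genre priority, duplicates, time frame) were applied by the human collectors and are not modeled here.

```python
import re

# Hypothetical keyword list; the actual list used by the collectors is not given.
COVID_KEYWORDS = {"covid", "covid-19", "coronavirus", "pandemic", "quarantine",
                  "lockdown", "virus", "infected", "outbreak"}


def tokenize(text):
    """Lower-case word tokens; internal hyphens are kept so 'covid-19' stays one token."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def keyword_hits(text):
    """Number of tokens in `text` that appear in COVID_KEYWORDS (occurrences, not distinct words)."""
    return sum(1 for tok in tokenize(text) if tok in COVID_KEYWORDS)


def meets_criteria(headline, body, min_headline_hits=1, min_body_hits=5):
    """Check the headline (>= 1 keyword) and body (>= 5 keywords) collection criteria."""
    return keyword_hits(headline) >= min_headline_hits and keyword_hits(body) >= min_body_hits
```

    Counting occurrences is the more lenient reading; switching to distinct keywords would only require counting `set(tokenize(text)) & COVID_KEYWORDS` instead.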

    To collect these data, we used a Google Form for both the USA and BD collections. Two human editors went through each entry to check for spam or troll entries.

    2.2 Data Pre-processing and Statistics

    Some pre-processing steps performed on the newspaper report dataset are as follows:

    • Remove hyperlinks.
    • Remove non-English alphanumeric characters.
    • Remove stop words.
    • Lemmatize text.

    While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. Although these steps were performed with a script, we also assigned the same human collectors to cross-check the output against the criteria above.
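    The first three pre-processing steps listed above can be sketched with the Python standard library alone. This is not the authors' script: it assumes a small illustrative stop-word set (a real pipeline would use a full list, such as the one shipped with NLTK or spaCy) and interprets "remove non-English alphanumeric characters" as keeping only ASCII letters, digits, and whitespace. Lemmatization needs a dictionary-based tool such as a WordNet lemmatizer and is left out.

```python
import re

# Illustrative subset only; not the stop-word list used by the authors.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to",
              "is", "are", "was", "were", "for", "with"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")


def preprocess(text):
    """Apply hyperlink removal, character filtering, and stop-word removal; return tokens."""
    text = URL_RE.sub(" ", text)          # 1. remove hyperlinks
    text = NON_ALNUM_RE.sub(" ", text)    # 2. keep only ASCII letters, digits, whitespace
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]  # 3. remove stop words
```

    Replacing removed characters with a space rather than deleting them keeps neighbouring words from being glued together, at the cost of splitting tokens such as "COVID-19" into "covid" and "19".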

    The primary statistics of the two datasets are shown in Tables 1 and 2.

    Table 1: Covid-News-USA-NNK data statistics
    No. of words per headline: 7 to 20
    No. of words per body content: 150 to 2100

    Table 2: Covid-News-BD-NNK data statistics
    No. of words per headline: 10 to 20
    No. of words per body content: 100 to 1500

    2.3 Dataset Repository

    We used GitHub as our primary data repository under the account name NKK^1, where we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON files using a Python script. We also provide a Python script for essential operations. We welcome outside collaboration to enrich the dataset.
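    The authors' CSV-to-JSON regeneration script is not reproduced in this description. A minimal stand-in using only the standard library might look like the following, assuming each CSV file has a header row; the JSON keys are simply whatever column names that header contains, and the file paths are hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array with one object per row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)


def convert_file(csv_path, json_path):
    """Regenerate a JSON file from a CSV file (paths are caller-supplied, hypothetical)."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src))
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(rows, dst, ensure_ascii=False, indent=2)
```

    Treating the CSV as the single source of truth and regenerating JSON from it, as the authors describe, avoids the two formats drifting apart when entries are edited.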

    3 Literature Review

    Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, using numerous diverse methods, such as one-hot encoding and word embeddings, that transform text into numerical representations which can be fed to machine learning and deep learning algorithms.

    Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12], and machine translation as used by Google Translate [13]. While these are all downstream tasks, several exciting developments have been made in algorithms designed specifically for NLP tasks. The two most prominent are BERT [14], a bidirectional transformer encoder that achieves strong results on classification and other language-understanding tasks, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. These models are typically used in pre-trained form because training them carries a huge computational cost. Information extraction is the general concept of retrieving information from a dataset. Information extraction from an image could mean retrieving vital feature spaces or targeted portions of the image; information extraction from speech could mean retrieving names, places, etc. [16]. Information extraction from text could mean identifying named entities, locations, or other essential data. Topic modeling is a sub-task of NLP and also a form of information extraction: it clusters words and phrases that share a context into groups. Topic modeling is an unsupervised learning method that gives a brief overview of a set of texts. One commonly used topic model is Latent Dirichlet Allocation (LDA) [17].

    Keyword extraction is a form of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [18] is an efficient keyword extraction technique that builds a graph of words, calculates the weight of each word, and picks the words with the highest weights.
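    The graph-based scoring behind TextRank can be sketched in a few lines. In this minimal version, words are nodes, an edge links two words that co-occur within a small window, and scores are iterated with the PageRank update S(v) = (1 - d) + d * sum over neighbours u of S(u) / deg(u), with damping factor d = 0.85. It assumes the input is already tokenized and stop-word filtered, uses an unweighted graph and a fixed iteration count instead of a convergence test, and skips the original method's part-of-speech filter and multi-word phrase merging.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Return up to `top_k` words ranked by a TextRank-style score."""
    # Build an undirected co-occurrence graph: words within `window` positions are linked.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank iteration; each pass is computed from the previous pass's scores.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }

    # Highest score first; ties broken alphabetically for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_k]]
```

    A word linked to many distinct neighbours accumulates score from all of them, which is why a term that recurs across different contexts (such as "virus" in COVID-19 coverage) tends to rank first.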

    Word clouds are a useful visualization technique for understanding the overall 'talk of the topic'. The clustered words give a quick understanding of the content.

    4 Our experiments and Result analysis

    We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month, from February to May. From Figures 1, 2, and 3, we can draw a few observations:

    • In February, both newspapers discussed China and the source of the outbreak.
    • StarTribune emphasized Minnesota as the most concerned state, and its concern appeared to grow in April.
    • Both newspapers discussed the virus's impact on the economy and institutions, e.g., banks, elections, administrations, and markets.
    • The Washington Post discussed global issues more than StarTribune did.
    • In February, StarTribune mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.
    • While both newspapers mentioned the outbreak in China in February, the spread within the United States is highlighted more from March through May, reflecting the critical impact of the virus.

    We used a script to extract all numbers associated with keywords such as 'Deaths', 'Infected', 'Died', 'Infections', 'Quarantined', 'Lock-down', and 'Diagnosed' from the news reports, and built a series of case counts for both newspapers. Figure 4 shows the statistics of this series. From this extraction, we observe that April was the peak month for COVID cases, which rose gradually from February. Both newspapers show that the rise in cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows a positive response against the virus.

    We used VADER sentiment analysis to extract the sentiment of the headlines and the bodies. On average, the sentiment scores ranged from -0.5 to -0.9, on a VADER scale that runs from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the headline was negative but the body was slightly positive. Overall, sentiment analysis can help us separate the most concerning (most negative) news from the positive news, from which we can learn more about indicators related to COVID-19 and its serious impact. Moreover, sentiment analysis can also show how a state or country is reacting to the pandemic. We used the PageRank algorithm to extract
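    The keyword-based number extraction described above is not published in this excerpt. A minimal sketch of the idea follows, assuming a number is attributed to a keyword when it appears shortly before it (e.g. "1,200 deaths" or "30 people died", allowing at most two words in between). This heuristic is an illustrative assumption: real reports phrase counts in many other ways, and numbers such as years ("in 2020 the infected ...") would be misattributed.

```python
import re
from collections import defaultdict

# Keyword stems taken from the text; matching is prefix-based, so "death" also matches "deaths".
KEYWORDS = ["death", "died", "infected", "infection", "quarantined",
            "lockdown", "lock-down", "diagnosed"]

# A number (optionally with thousands separators), up to two intervening words, then a keyword.
PATTERN = re.compile(
    r"(\d{1,3}(?:,\d{3})+|\d+)\s+(?:\w+\s+){0,2}?(" + "|".join(map(re.escape, KEYWORDS)) + r")",
    re.IGNORECASE,
)


def extract_counts(text):
    """Sum the numbers that precede each keyword in `text`, keyed by lower-case keyword."""
    totals = defaultdict(int)
    for number, keyword in PATTERN.findall(text):
        totals[keyword.lower()] += int(number.replace(",", ""))
    return dict(totals)
```

    Summing these per-report totals by publication month would yield a monthly series of the kind the authors plot, subject to the heuristic's misses.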

  9. stock data of google

    • kaggle.com
    zip
    Updated May 19, 2024
    Cite
    Mohammed Al-huraibi (2024). stock data of google [Dataset]. https://www.kaggle.com/datasets/mohammedalhuraibi/stock-data-of-google/code
    Available download formats: zip (92756 bytes)
    Dataset updated
    May 19, 2024
    Authors
    Mohammed Al-huraibi
    Description

    Dataset

    This dataset was created by Mohammed Al-huraibi

    Contents

  10. 202203_cyclistics

    • kaggle.com
    zip
    Updated Apr 12, 2023
    + more versions
    Cite
    Elliott Maglio (2023). 202203_cyclistics [Dataset]. https://www.kaggle.com/datasets/elliottmaglio/202203-cyclistics
    Available download formats: zip (11382826 bytes)
    Dataset updated
    Apr 12, 2023
    Authors
    Elliott Maglio
    Description

    Dataset

    This dataset was created by Elliott Maglio

    Contents

  11. Frequency of using Google Assistant in the U.S. 2018

    • statista.com
    Updated Jan 25, 2022
    Cite
    Statista (2022). Frequency of using Google Assistant in the U.S. 2018 [Dataset]. https://www.statista.com/forecasts/1011646/frequency-of-using-google-assistant-in-the-us
    Dataset updated
    Jan 25, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 26, 2018 - Nov 5, 2018
    Area covered
    United States
    Description

    The displayed data on the frequency of using Google Assistant shows results of an exclusive Statista survey conducted in the United States in 2018. Some 38 percent of respondents answered the question "How often do you use Google Assistant?" with "Daily". The Survey Data Table for the Statista survey Tech Giants and Digital Services in the United States 2019 contains the complete tables for the survey, including various column headings.

  12. State of Iowa Google My Business Profile Analytics by Month

    • s.cnmilf.com
    • data.iowa.gov
    • +2more
    Updated Jul 12, 2024
    Cite
    data.iowa.gov (2024). State of Iowa Google My Business Profile Analytics by Month [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/state-of-iowa-google-my-business-profile-analytics-by-month
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    data.iowa.gov
    Area covered
    Iowa
    Description

    This dataset provides insights by month on how people find State of Iowa agency listings on the web via Google Search and Maps, and what they do once they find it to include providing reviews (ratings), accessing agency websites, requesting directions, and making calls.

  13. Provenance of social media: survey data, 2016

    • datacatalogue.cessda.eu
    Updated Mar 25, 2025
    Cite
    Edwards, P; Corsar, D; Markovic, M (2025). Provenance of social media: survey data, 2016 [Dataset]. http://doi.org/10.5255/UKDA-SN-852507
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    University of Aberdeen
    Authors
    Edwards, P; Corsar, D; Markovic, M
    Time period covered
    Aug 1, 2016 - Aug 31, 2016
    Area covered
    United Kingdom
    Variables measured
    Individual
    Measurement technique
    This dataset was created via an online survey. Respondents were self-selecting from emails to social science researchers and Tweets requesting participants to complete the form. The sample population were social science researchers that have used or plan to use data from social media services in their research.
    Description

    Survey instrument and anonymised responses collected as part of Sub-Project B4 “Provenance of Social Media” of the larger Social Media - Developing Understanding, Infrastructure & Engagement (Social Media Enhancement) award (ES/M001628/1). The survey aimed to further our understanding of the current practices and attitudes towards the provenance of data collected from social media platforms and its analysis by researchers in the social sciences. This includes all forms of social media, such as Twitter, Facebook, Wikipedia, Quora, blogs, discussion forums, etc. The survey was conducted as an online survey using Google Forms. Findings from this survey influenced the work of the sub-project, and the development of tools to support researchers who wish to increase the transparency of their research using social media data.

    Dataset of collected survey responses, and pdf versions of the Google Forms online survey instrument. Each PDF file denotes one possible survey path that depended on the response of a participant to the question “What level of experience do you have using data from a social media platforms as part of your research?” The three paths are:

    (1) SurveyInstrument-Path-1.pdf - is used if the participant selected the option "I have used/am currently using social media data as part of my research."

    (2) SurveyInstrument-Path-2.pdf - is used if the participant selected the option "I am aware of others using social media data as part of their research and may consider using it within mine."

    (3) SurveyInstrument-Path-3.pdf - is used if the participant selected the option "Neither of the above."

    There is now a broad consensus that new forms of social data emerging from people’s day-to-day activities on the web have the potential to transform the social sciences. However, there is also agreement that current analytical techniques fall short of the methodological standards required for academic research and policymaking and that conclusions drawn from social media data have much greater utility when combined with results drawn from other datasets (including various public sector resources made available through open data initiatives). In this proposal we outline the case for further investigations into the challenges surrounding social media data and the social sciences. Aspects of the work will involve analysis of social media data in a number of contexts, including: -transport disruption around the 2014 Commonwealth Games (Glasgow) - news stories about Scottish independence and UK-EU relations - island communities in the Western Isles. Guided by insights from these case studies we will: - develop a suite of software tools to support various aspects of data analysis and curation; - provide guidance on ethical considerations surrounding analysis of social media data; - deliver training workshops for social science researchers; - engage with the public on this important topic through a series of festivals (food, music, science).

  14. Data underlying the paper: "What is Sensitive about (Sensitive) Data?...

    • 4tu.edu.hpc.n-helix.com
    • data.4tu.nl
    zip
    Updated Feb 28, 2024
    Cite
    Alejandra Gómez Ortega (2024). Data underlying the paper: "What is Sensitive about (Sensitive) Data? Characterizing Sensitivity and Intimacy with Google Assistant Users" Included in Chapter 5 of the PhD thesis: Sensitive Data Donation [Dataset]. http://doi.org/10.4121/0ebb1579-88ab-4bec-a866-1e9aff19581f.v1
    Available download formats: zip
    Dataset updated
    Feb 28, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Alejandra Gómez Ortega
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This project investigates the perceived characteristics of the data collected by the Google Voice Assistant (i.e. speech records). Speech records were collected through data donation, analysed, represented visually, and used during semi-structured interviews to interrogate people's perceptions of sensitivity and intimacy. The dataset includes the analysis and interview protocol, the visual representation of the data, and the thematic structure of the results.

  15. Website statistics—People with disability

    • data.qld.gov.au
    csv
    Updated Apr 24, 2021
    + more versions
    Cite
    Communities, Housing and Digital Economy (2021). Website statistics—People with disability [Dataset]. https://www.data.qld.gov.au/dataset/website-statistics-people-with-disability
    Available download formats: csv (10 files)
    Dataset updated
    Apr 24, 2021
    Dataset authored and provided by
    Communities, Housing and Digital Economy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Monthly statistics for pages viewed by visitors to the Queensland Government website—People with disability franchise. Source: Google Analytics

  16. Data for: Ok Google, What Am I Doing? Acoustic Activity Recognition Bounded...

    • dataverse.tdl.org
    audio/vnd.wave, pdf
    Updated Oct 20, 2021
    Cite
    Rebecca Adaimi; Howard Yong (2021). Data for: Ok Google, What Am I Doing? Acoustic Activity Recognition Bounded by Conversational Assistant Interactions [Dataset]. http://doi.org/10.18738/T8/OCWAZW
    Available download formats: audio/vnd.wave (many files), pdf
audio/vnd.wave(244204), audio/vnd.wave(64190), audio/vnd.wave(1174972), audio/vnd.wave(75132), audio/vnd.wave(303132), audio/vnd.wave(95060), audio/vnd.wave(185268), audio/vnd.wave(86620), audio/vnd.wave(43612), audio/vnd.wave(362068), audio/vnd.wave(100786), audio/vnd.wave(9716592), audio/vnd.wave(106872), audio/vnd.wave(65818), audio/vnd.wave(55282), audio/vnd.wave(71608), audio/vnd.wave(191604), audio/vnd.wave(892476), audio/vnd.wave(83374), audio/vnd.wave(68142), audio/vnd.wave(73656), audio/vnd.wave(9491472), audio/vnd.wave(92930), audio/vnd.wave(90172), audio/vnd.wave(50738), audio/vnd.wave(51110), audio/vnd.wave(151592), audio/vnd.wave(9168064), audio/vnd.wave(41226), audio/vnd.wave(172636), audio/vnd.wave(980880), audio/vnd.wave(72188), audio/vnd.wave(66672), audio/vnd.wave(56360), audio/vnd.wave(55190), audio/vnd.wave(130638), audio/vnd.wave(128776), audio/vnd.wave(57714), audio/vnd.wave(117446), audio/vnd.wave(102278), audio/vnd.wave(256828), audio/vnd.wave(425212), audio/vnd.wave(54820), audio/vnd.wave(96868), audio/vnd.wave(101072), audio/vnd.wave(412584), audio/vnd.wave(101706), audio/vnd.wave(293264), audio/vnd.wave(67222), audio/vnd.wave(112254), audio/vnd.wave(124486), audio/vnd.wave(92908), audio/vnd.wave(10195984), audio/vnd.wave(191244), audio/vnd.wave(88382), audio/vnd.wave(319972), audio/vnd.wave(134966), audio/vnd.wave(76926), audio/vnd.wave(311552), audio/vnd.wave(140588), audio/vnd.wave(72720), audio/vnd.wave(51436), audio/vnd.wave(99200), audio/vnd.wave(618856), audio/vnd.wave(2547770), audio/vnd.wave(125228), audio/vnd.wave(49158), audio/vnd.wave(303136), audio/vnd.wave(150564), audio/vnd.wave(54656), audio/vnd.wave(442052)Available download formats
    Dataset updated
    Oct 20, 2021
    Dataset provided by
    Texas Data Repository
    Authors
    Rebecca Adaimi; Howard Yong
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    An annotated dataset of audio interactions with a conversational assistant with background sounds of 19 activities. Abstract: Conversational assistants in the form of stand-alone devices such as Amazon Echo and Google Home have become popular and embraced by millions of people. By serving as a natural interface to services ranging from home automation to media players, conversational assistants help people perform many tasks with ease, such as setting timers, playing music and managing to-do lists. While these systems offer useful capabilities, they are largely passive and unaware of the human behavioral context in which they are used. In this work, we explore how off-the-shelf conversational assistants can be enhanced with acoustic-based human activity recognition by leveraging the short interval after a voice command is given to the device. Since always-on audio recording can pose privacy concerns, our method is unique in that it does not require capturing and analyzing any audio other than the speech-based interactions between people and their conversational assistants. In particular, we leverage background environmental sounds present in these short duration voice-based interactions to recognize activities of daily living. We conducted a study with 14 participants in 3 different locations in their own homes. We showed that our method can recognize 19 different activities of daily living with average precision of 84.85% and average recall of 85.67% in a leave-one-participant-out performance evaluation with 30-second audio clips bound by the voice interactions. IRB approved under ID: 2016020035-MODCR01
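    The leave-one-participant-out evaluation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names (`leave_one_participant_out`, `macro_precision_recall`, `evaluate_lopo`) and the `fit_predict` callback are hypothetical, and the sketch pools the predictions from all folds before averaging per-class precision and recall, whereas the study may instead have averaged the scores per fold.

```python
from collections import defaultdict


def leave_one_participant_out(participant_ids):
    """Yield (held_out, train_idx, test_idx), holding out one participant per fold."""
    for held_out in sorted(set(participant_ids)):
        train_idx = [i for i, p in enumerate(participant_ids) if p != held_out]
        test_idx = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train_idx, test_idx


def macro_precision_recall(y_true, y_pred):
    """Unweighted mean of per-class precision and recall."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[t] += 1  # true class t was missed
    classes = set(y_true) | set(y_pred)
    # A class that is never predicted gets precision 0.0; one that never occurs gets recall 0.0.
    precision = [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in classes]
    recall = [tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0 for c in classes]
    return sum(precision) / len(classes), sum(recall) / len(classes)


def evaluate_lopo(features, labels, participant_ids, fit_predict):
    """Train on all other participants, test on the held-out one, then score pooled predictions."""
    y_true, y_pred = [], []
    for _, train_idx, test_idx in leave_one_participant_out(participant_ids):
        preds = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        y_true.extend(labels[i] for i in test_idx)
        y_pred.extend(preds)
    return macro_precision_recall(y_true, y_pred)
```

    Any classifier can be plugged in through `fit_predict`; in the study, its inputs would be derived from the 30-second audio clips, and each of the 14 participants would be held out once.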

  17. Demographics

    • health.google.com
    Updated Oct 7, 2021
    (2021). Demographics [Dataset]. https://health.google.com/covid-19/open-data/raw-data
    Dataset updated
    Oct 7, 2021
    Variables measured
    key, population, population_male, rural_population, urban_population, population_female, population_density, clustered_population, population_age_00_09, population_age_10_19, and 11 more
    Description

    Various population statistics, including structured demographics data.
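    A minimal sketch of deriving per-region shares from a table with the variables listed above. It assumes a CSV whose header uses those column names (`key`, `population`, `urban_population`, `population_male`); the function name and the sample rows are hypothetical, and empty cells are treated as unknown rather than zero.

```python
import csv
import io


def derived_shares(csv_text):
    """Map each region key to its urban and male population shares (None if unknown)."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        population = float(row["population"]) if row.get("population") else None
        shares = {}
        for name, column in (("urban_share", "urban_population"), ("male_share", "population_male")):
            value = row.get(column)
            # Empty cells mean the statistic is unavailable, not zero.
            shares[name] = float(value) / population if value and population else None
        result[row["key"]] = shares
    return result


# Hypothetical rows; real key values and figures will differ.
sample = "key,population,population_male,urban_population\nAA,1000,480,750\nBB,200,,\n"
print(derived_shares(sample))
```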

  18. American Community Survey (ACS)

    • console.cloud.google.com
    Updated Aug 3, 2022
    United States Census Bureau (2022). American Community Survey (ACS) [Dataset]. https://console.cloud.google.com/marketplace/product/united-states-census-bureau/acs?hl=tr
    Dataset updated
    Aug 3, 2022
    Dataset provided by
    Google (http://google.com/)
    Description

    The American Community Survey (ACS) is an ongoing survey that provides vital information on a yearly basis about our nation and its people by contacting over 3.5 million households across the country. The resulting data provides highly detailed demographic information across the US, aggregated at various geographic levels, which helps determine how more than $675 billion in federal and state funding is distributed each year. Businesses use ACS data to inform strategic decision-making. ACS data can be used as a component of market research and can provide information about concentrations of potential employees with a specific education or occupation, as well as about which communities could be good places to build offices or facilities. For example, someone scouting a new location for an assisted-living center might look for an area with a large proportion of seniors and a large proportion of people employed in nursing occupations. Through the ACS, we know more about jobs and occupations, educational attainment, veterans, whether people own or rent their homes, and other topics. Public officials, planners, and entrepreneurs use this information to assess the past and plan the future. For more information, see the Census Bureau's ACS Information Guide. This public dataset is hosted in Google BigQuery as part of the Google Cloud Public Datasets Program, with Carto providing cleaning and onboarding support. It is included in BigQuery's free tier of 1 TB of query processing per month: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.

  19. Collective Attention and Stock Prices: Evidence from Google Trends Data on Standard and Poor's 100

    • plos.figshare.com
    zip
    Updated May 30, 2023
    Raphael H. Heiberger (2023). Collective Attention and Stock Prices: Evidence from Google Trends Data on Standard and Poor's 100 [Dataset]. http://doi.org/10.1371/journal.pone.0135311
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Raphael H. Heiberger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Today's connected world allows people to gather information in shorter intervals than ever before, and this behavior is widely monitored by massive online data sources. As a dramatic economic event, the recent financial crisis considerably increased public interest in large companies. In this paper, we exploit this change in information-gathering behavior by using Google query volumes as a "bad news" indicator for each corporation listed in the Standard and Poor's 100 index. Our results not only provide an investment strategy that gains particularly in times of financial turmoil and extensive losses by other market participants, but also reveal new sectoral patterns between mass online behavior and (bearish) stock market movements. Because they are based on collective attention shifts in search queries for individual companies, these findings can help to identify early warning signs of financial systemic risk. However, our disaggregated data also illustrate the need for further efforts to understand the influence of collective attention shifts on financial behavior in times of regular market activity, when changes in search volume are less pronounced.
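    The idea of reading search volume as a "bad news" signal can be illustrated with a simple moving-average trading rule. This is an assumed, simplified rule, not necessarily the specification used in the paper: if last week's search volume for a company exceeds its average over the last `window` weeks (including that week), the sketch shorts the stock for the coming week; otherwise it holds it long. The function name, the weekly alignment of `volumes` and `prices`, and the use of log returns are all assumptions.

```python
import math


def search_volume_strategy(volumes, prices, window=3):
    """Cumulative log return of a hypothetical 'bad news' trading rule.

    volumes[t] is the search volume in week t and prices[t] the price at the
    start of week t (both lists are assumed to be aligned week by week).
    """
    total = 0.0
    for t in range(window, len(prices) - 1):
        # Compare last week's volume with its mean over the `window` weeks ending at t - 1.
        baseline = sum(volumes[t - window:t]) / window
        delta = volumes[t - 1] - baseline
        weekly = math.log(prices[t + 1]) - math.log(prices[t])
        # Rising attention is read as bad news: go short, otherwise stay long.
        total += -weekly if delta > 0 else weekly
    return total
```

    Summing log returns keeps the weekly positions additive; the cumulative simple return is `math.exp(total) - 1`.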

  20. COVID-19 Community Mobility Reports

    • google.com
    • google.com.tr
    csv, pdf
    Updated Oct 17, 2022
    Google (2022). COVID-19 Community Mobility Reports [Dataset]. https://www.google.com/covid19/mobility/
    Available download formats: csv, pdf
    Dataset updated
    Oct 17, 2022
    Dataset authored and provided by
    Google (http://google.com/)
    Description

    As global communities responded to COVID-19, we heard from public health officials that the same type of aggregated, anonymized insights we use in products such as Google Maps would be helpful as they made critical decisions to combat COVID-19. These Community Mobility Reports were intended to provide insights into what changed in response to policies aimed at combating COVID-19. The reports charted movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential.
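    To follow a movement trend over time for one category of places, a trailing average smooths day-to-day noise. This sketch assumes a CSV export with an ISO-formatted `date` column and per-category columns named like `parks_percent_change_from_baseline`; those column names, the function name, and the 7-day default window are assumptions rather than details stated in the description above.

```python
import csv
import io
from statistics import mean

# Assumed category prefixes, matching the place categories listed in the description.
CATEGORIES = ["retail_and_recreation", "grocery_and_pharmacy", "parks",
              "transit_stations", "workplaces", "residential"]


def rolling_trend(csv_text, category, window=7):
    """Return (date, trailing mean) pairs for one category, in date order."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    column = f"{category}_percent_change_from_baseline"
    # ISO dates (YYYY-MM-DD) sort correctly as strings.
    rows = sorted(csv.DictReader(io.StringIO(csv_text)), key=lambda r: r["date"])
    values = [float(r[column]) if r[column] else None for r in rows]
    trend = []
    for i in range(len(values)):
        # Skip missing days instead of treating them as zero.
        recent = [v for v in values[max(0, i - window + 1): i + 1] if v is not None]
        trend.append((rows[i]["date"], mean(recent) if recent else None))
    return trend
```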
