45 datasets found

A web tracking data set of online browsing behavior of 2,148 users
zenodo.org
explore.openaire.eu
application/gzip, txt +1
Updated May 14, 2021
Cite
Juhi Kulshrestha; Marcos Oliveira; Orkut Karacalik; Denis Bonnay; Claudia Wagner (2021). A web tracking data set of online browsing behavior of 2,148 users [Dataset]. http://doi.org/10.5281/zenodo.4757574
Explore at:
Available download formats: zip, txt, application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.4757574
Dataset updated
May 14, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Juhi Kulshrestha; Marcos Oliveira; Orkut Karacalik; Denis Bonnay; Claudia Wagner
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This anonymized data set consists of one month (October 2018) of web tracking data from 2,148 German users. For each user, the data contains the anonymized URL of each webpage the user visited, the domain of the webpage, and the category of the domain (one of 41 distinct categories). In total, these 2,148 users made 9,151,243 URL visits, spanning 49,918 unique domains. For each user in our data set, we have self-reported information (collected via a survey) about their gender and age.
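
The summary figures quoted above (total URL visits, unique domains, visits per user) are simple aggregations over visit records. A minimal sketch follows, assuming a hypothetical row layout of (user_id, domain, category); the column names, order, and sample values are illustrative only and are not taken from the published files.

```python
from collections import Counter

# Hypothetical rows: (user_id, domain, category).
# The real file layout may differ; these values are made up.
visits = [
    ("u1", "example.org", "news"),
    ("u1", "example.org", "news"),
    ("u2", "shop.example", "shopping"),
    ("u2", "example.org", "news"),
]


def summarize(rows):
    """Return total visits, number of unique domains, and visits per user."""
    per_user = Counter(user for user, _, _ in rows)
    domains = {domain for _, domain, _ in rows}
    return len(rows), len(domains), dict(per_user)


total, n_domains, per_user = summarize(visits)
print(total, n_domains, per_user)
```

The same function could be applied to the full data set once its rows are parsed into this shape.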

We acknowledge the support of Respondi AG, which provided the web tracking and survey data free of charge for research purposes, with special thanks to François Erner and Luc Kalaora at Respondi for their insights and help with data extraction.

The data set is analyzed in the following paper:

Kulshrestha, J., Oliveira, M., Karacalik, O., Bonnay, D., Wagner, C. "Web Routineness and Limits of Predictability: Investigating Demographic and Behavioral Differences Using Web Tracking Data." Proceedings of the International AAAI Conference on Web and Social Media. 2021. https://arxiv.org/abs/2012.15112.

The code used to analyze the data is also available at https://github.com/gesiscss/web_tracking.

If you use data or code from this repository, please cite the paper above and the Zenodo link.
Google Analytics & Twitter dataset from a movies, TV series and videogames...
portalcientificovalencia.univeuropea.com
figshare.com
Updated 2024
Cite
Yeste, Víctor (2024). Google Analytics & Twitter dataset from a movies, TV series and videogames website [Dataset]. https://portalcientificovalencia.univeuropea.com/documentos/67321ed3aea56d4af0485dc8
Explore at:
Dataset updated
2024
Authors
Yeste, Víctor
Description
Author: Víctor Yeste, Universitat Politècnica de València.

The object of this study is the design of a cybermetric methodology to measure the success of content published in online media and to explore whether the selected success variables can be predicted.

Because the study needed to integrate data from two separate areas (web publishing, and the analysis of shares and related topics on Twitter), the data were collected programmatically through both the Google Analytics v4 reporting API and the Twitter Standard API, always respecting their limits.

The website analyzed is hellofriki.com, an online media outlet whose primary aim is to meet the demand for information on topics that generate a large volume of daily news, and that also publishes analysis, reports, interviews, and many other information formats. Its content falls under the sections of cinema, series, video games, literature, and comics.

This dataset contributed to the following PhD thesis:

Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009

Data were obtained for each breaking news article published online, according to the indicators described in the doctoral thesis. All related data are stored in a database, divided into the following tables:

tesis_followers: user ID list of the media account's followers.

tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.
  status_id: tweet ID
  created_at: date of publication
  text: content of the tweet
  path: URL extracted after processing the shortened URL in text
  post_shared: ID of the WordPress article being shared
  retweet_count: number of retweets
  favorite_count: number of favorites

tesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web (other typologies, automatic Facebook shares, custom tweets without a link to an article, etc.). It has the same fields as tesis_hometimeline.

tesis_posts: data of articles published by the web and processed for some analysis.
  stats_id: analysis ID
  post_id: article ID in WordPress
  post_date: article publication date in WordPress
  post_title: title of the article
  path: URL of the article on the media website
  tags: IDs of the WordPress tags related to the article
  uniquepageviews: unique page views
  entrancerate: entrance rate
  avgtimeonpage: average time on page
  exitrate: exit rate
  pageviewspersession: page views per session
  adsense_adunitsviewed: number of ads viewed by users
  adsense_viewableimpressionpercent: ad display ratio
  adsense_ctr: ad click ratio
  adsense_ecpm: estimated ad revenue per 1000 page views

tesis_stats: data from a particular analysis, performed for each published breaking news item. Fields with statistical values can be computed from the data in the other tables, but total and average calculations are saved for faster and easier further processing.
  id: ID of the analysis
  phase: phase of the thesis in which the analysis was carried out (currently all are 1)
  time: "0" if at the time of publication, "1" if 14 days later
  start_date: date and time of measurement on the day of publication
  end_date: date and time of the measurement made 14 days later
  main_post_id: ID of the published article to be analyzed
  main_post_theme: main section of the published article to be analyzed
  superheroes_theme: "1" if about superheroes, "0" if not
  trailer_theme: "1" if trailer, "0" if not
  name: empty field, available for adding a custom name manually
  notes: empty field, available for adding notes manually, for example when a tag set by the editor was removed manually for being too generic
  num_articles: number of articles analyzed
  num_articles_with_traffic: number of articles analyzed with traffic (which will be taken into account for traffic analysis)
  num_articles_with_tw_data: number of articles with data from when they were shared on the media's Twitter account
  num_terms: number of terms analyzed
  uniquepageviews_total: total page views
  uniquepageviews_mean: average page views
  entrancerate_mean: average entrance rate
  avgtimeonpage_mean: average duration of visits
  exitrate_mean: average exit rate
  pageviewspersession_mean: average page views per session
  total: total of ads viewed
  adsense_adunitsviewed_mean: average of ads viewed
  adsense_viewableimpressionpercent_mean: average ad display ratio
  adsense_ctr_mean: average ad click ratio
  adsense_ecpm_mean: estimated ad revenue per 1000 page views
  Total: total income
  retweet_count_mean: average income
  favorite_count_total: total of favorites
  favorite_count_mean: average of favorites
  terms_ini_num_tweets: total tweets on the terms on the day of publication
  terms_ini_retweet_count_total: total retweets on the terms on the day of publication
  terms_ini_retweet_count_mean: average retweets on the terms on the day of publication
  terms_ini_favorite_count_total: total of favorites on the terms on the day of publication
  terms_ini_favorite_count_mean: average of favorites on the terms on the day of publication
  terms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publication
  terms_ini_user_num_followers_mean: average followers of users who have spoken of the terms on the day of publication
  terms_ini_user_num_tweets_mean: average number of tweets published by users who spoke about the terms on the day of publication
  terms_ini_user_age_mean: average age in days of users who have spoken of the terms on the day of publication
  terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms on the day of publication
  terms_end_num_tweets: total tweets on terms 14 days after publication
  terms_ini_retweet_count_total: total retweets on terms 14 days after publication
  terms_ini_retweet_count_mean: average retweets on terms 14 days after publication
  terms_ini_favorite_count_total: total of favorites on terms 14 days after publication
  terms_ini_favorite_count_mean: average of favorites on terms 14 days after publication
  terms_ini_followers_talking_rate: ratio of media Twitter account followers who have recently posted a tweet talking about the terms 14 days after publication
  terms_ini_user_num_followers_mean: average followers of users who have spoken of the terms 14 days after publication
  terms_ini_user_num_tweets_mean: average number of tweets published by users who have spoken about the terms 14 days after publication
  terms_ini_user_age_mean: average age in days of users who have spoken of the terms 14 days after publication
  terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms 14 days after publication

tesis_terms: data of the terms (tags) related to the processed articles.
  stats_id: analysis ID
  time: "0" if at the time of publication, "1" if 14 days later
  term_id: term ID (tag) in WordPress
  name: name of the term
  slug: URL of the term
  num_tweets: number of tweets
  retweet_count_total: total retweets
  retweet_count_mean: average retweets
  favorite_count_total: total of favorites
  favorite_count_mean: average of favorites
  followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the term
  user_num_followers_mean: average followers of users who were talking about the term
  user_num_tweets_mean: average number of tweets published by users who were talking about the term
  user_age_mean: average age in days of users who were talking about the term
  url_inclusion_rate: URL inclusion ratio
Statistics Interface Province-Level Data Collection - Datasets - This...
store.smartdatahub.io
Updated Nov 11, 2024
+ more versions
Cite
(2024). Statistics Interface Province-Level Data Collection - Datasets - This service has been deprecated - please visit https://www.smartdatahub.io/ to access data. See the About page for details. // [Dataset]. https://store.smartdatahub.io/dataset/fi_tilastokeskus_tilastointialueet_maakunta1000k
Explore at:
Dataset updated
Nov 11, 2024
Description
The dataset collection is a compilation of related data tables sourced from the website of Tilastokeskus (Statistics Finland). The data is organized in a tabular format of rows and columns. The collection includes several tables, each representing a different year, providing a temporal view of the data. The description provided by the data source, Tilastokeskuksen palvelurajapinta (Statistics Finland's service interface), suggests that the data is likely to be statistical in nature and could be related to regional statistics, given the nature of the source. This dataset is licensed under CC BY 4.0 (Creative Commons Attribution 4.0, https://creativecommons.org/licenses/by/4.0/deed.fi).
OGD Portal: Daily usage by record (since January 2024)
data.europa.eu
csv, excel xls, json +5
Updated Apr 6, 2025
+ more versions
Cite
kanton-basel-landschaft (2025). OGD Portal: Daily usage by record (since January 2024) [Dataset]. https://data.europa.eu/data/datasets/12610-kanton-basel-landschaft?locale=en
Explore at:
Available download formats: n3, rdf xml, csv, json-ld, json, rdf turtle, parquet, excel xls
Dataset updated
Apr 6, 2025
Dataset authored and provided by
kanton-basel-landschaft
License
http://dcat-ap.ch/vocabulary/licenses/terms_by
Description
The data on the use of the datasets on the OGD portal BL (data.bl.ch) are collected and published by the specialist and coordination office OGD BL. Each entry records the day the usage was measured, along with the following fields:

dataset_title: The title of the dataset.
dataset_id: The technical ID of the dataset.
visitors: Specifies the number of daily visitors to the dataset. Visitors are recorded by counting the unique IP addresses that recorded access on the day of the survey. The IP address represents the network address of the device from which the portal was accessed.
interactions: Includes all interactions with any dataset on data.bl.ch. A visitor can trigger multiple interactions. Interactions include clicks on the website (searching datasets, filters, etc.) as well as API calls (downloading a dataset as a JSON file, etc.).

Remarks

Only calls to publicly available datasets are shown.
IP addresses and interactions of users with a login of the Canton of Basel-Landschaft, in particular of employees of the specialist and coordination office OGD, are removed from the dataset before publication and therefore not shown.
Calls from actors that are clearly identifiable as bots by the user agent header are also not shown.
Combinations of dataset and date for which no use occurred (Visitors == 0 & Interactions == 0) are not shown.
Due to synchronization problems, data may be missing for individual days.
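
The counting rules described above (visitors as unique IP addresses per day, interactions as individual events, bot traffic excluded) can be sketched as follows. This is only an illustration under stated assumptions: the event tuple layout, the sample values, and the `is_bot` heuristic are hypothetical and are not the portal's actual pipeline.

```python
from collections import defaultdict

# Hypothetical access-log events: (date, dataset_id, ip, user_agent).
# All values are made up for illustration.
events = [
    ("2024-01-02", "12610", "10.0.0.1", "Mozilla/5.0"),
    ("2024-01-02", "12610", "10.0.0.1", "Mozilla/5.0"),
    ("2024-01-02", "12610", "10.0.0.2", "curl/8.0"),
    ("2024-01-02", "12610", "10.0.0.3", "ExampleBot/1.0"),
    ("2024-01-03", "12610", "10.0.0.1", "Mozilla/5.0"),
]


def is_bot(user_agent):
    # Simplistic assumption: a user agent containing "bot" marks a bot.
    return "bot" in user_agent.lower()


def daily_usage(rows):
    """Count unique-IP visitors and interactions per (date, dataset_id)."""
    ips = defaultdict(set)
    interactions = defaultdict(int)
    for date, dataset_id, ip, agent in rows:
        if is_bot(agent):
            continue  # bot calls are not shown
        key = (date, dataset_id)
        ips[key].add(ip)
        interactions[key] += 1
    # Only keys with at least one event exist, so zero-use
    # combinations are omitted automatically.
    return {
        key: {"visitors": len(ips[key]), "interactions": interactions[key]}
        for key in interactions
    }


usage = daily_usage(events)
print(usage)
```

One visitor can account for several interactions, which is why the two counts are tracked separately.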
‘K-Pop Hits Through The Years’ analyzed by Analyst-2
analyst-2.ai
Updated Oct 14, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘K-Pop Hits Through The Years’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-k-pop-hits-through-the-years-48a1/latest
Explore at:
Dataset updated
Oct 14, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘K-Pop Hits Through The Years’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sberj127/kpop-hits-through-the-years on 28 January 2022.

--- Dataset description provided by original source is as follows ---

What is the data?

The datasets contain the top songs from the era or year given in the name of each dataset. Note that only the KPopHits90s dataset represents an era (1989-2001). Although there is a lack of easily available and reliable sources to show the actual K-Pop hits per year during the 90s, this era was still included because it was when the first generation of K-Pop stars appeared. Each of the other datasets represents a specific year after the 90s.

How was it obtained?

A song is considered to be a K-Pop hit during that era or year if it is included in the annual series of K-Pop Hits playlists, which is created officially by Apple Music. Note that for the dataset that represents the 90s, the playlist 90s K-Pop Essentials was used as the reference.

These playlists were transferred into Spotify through the Tune My Music site. After transferring, the site also presented all the missing songs from each Spotify playlist when compared to the original Apple Music playlists.

Any data besides the names and artists of the hit songs were not directly obtained from Apple Music since these other details of songs in this music service are only available for those enrolled as members of the Apple Developer Program.

The missing songs reported for each playlist were manually searched for and, if found, added to the respective Spotify playlist.

For the songs that were found, there are three types: (1) the song by the original artist, (2) the instrumental of the original song and (3) a cover of the song. When the first type is not found, the two other types are searched and are compared to each other. The one that sounded the most like the original song (from the Apple Music playlist) is chosen as the substitute in the Spotify playlist.

Presented is a link containing all the missing data per playlist (when the initial Spotify playlists were compared to the original Apple Music playlists) and the action done to each one.

The necessary identification details and specific audio features of each track were obtained through the use of the Spotipy library and Spotify Web API documentation.

Why did you make this?

As someone who has a particular curiosity about the field of data science and a genuine love for the musicality of the K-Pop scene, I created this data set to make something out of the strong interest I have in these separate subjects.

Acknowledgements

I would like to express my sincere gratitude to Apple Music for creating the annual K-Pop playlists, Spotify for making their API very accessible, Spotipy for making it easier to get the desired data from the Spotify Web API, Tune My Music for automating the process of transferring one's library into another service's library and, of course, all those involved in the making of these songs and artists included in these datasets for creating such high quality music and concepts digestible even for the general public.

--- Original source retains full ownership of the source dataset ---
Statistics Interface Large Area Dataset 2015 - Datasets - This service has...
store.smartdatahub.io
Updated Nov 11, 2024
Cite
(2024). Statistics Interface Large Area Dataset 2015 - Datasets - This service has been deprecated - please visit https://www.smartdatahub.io/ to access data. See the About page for details. // [Dataset]. https://store.smartdatahub.io/dataset/fi_tilastokeskus_tilastointialueet_suuralue1000k_2015
Explore at:
Dataset updated
Nov 11, 2024
Description
The dataset collection in question is a compilation of statistical area data. It includes one or more tables of interconnected data, structured in the form of rows and columns. The data in the collection is sourced from the 'Statistics Centre' (Tilastokeskus), a recognized institution in Finland. The description provided by the data source, translated to English, is 'Statistical Centre's Service Interface (WFS)'. This suggests that the dataset collection is likely a representation of statistical data provided through a web feature service by the Statistics Centre. The dataset collection might include various statistical area details, possibly related to the greater area of 1000 square kilometers, as suggested by the year 2015, which may indicate the time period the data covers. This dataset is licensed under CC BY 4.0 (Creative Commons Attribution 4.0, https://creativecommons.org/licenses/by/4.0/deed.fi).
Inpatient, Emergency Department, and Outpatient Visits for Respiratory...
catalog.data.gov
data.cityofchicago.org
+1more
Updated Jun 21, 2025
+ more versions
Cite
data.cityofchicago.org (2025). Inpatient, Emergency Department, and Outpatient Visits for Respiratory Illnesses [Dataset]. https://catalog.data.gov/dataset/inpatient-emergency-department-and-outpatient-visits-for-respiratory-illnesses
Explore at:
Dataset updated
Jun 21, 2025
Dataset provided by
data.cityofchicago.org
Description
This dataset includes aggregated weekly data on the percent of emergency department visits and the percent of hospital inpatient admissions due to influenza-like illness (ILI), COVID-19, influenza, RSV, and acute respiratory illness. The Illinois Department of Public Health (IDPH) collects data for Emergency Department visits to all 185 acute care hospitals in Illinois. The data are submitted from IDPH to the CDC’s BioSense Platform for access and analysis by health departments via the ESSENCE system. The CDC National Syndromic Surveillance Program (NSSP) utilizes diagnostic codes and clinical terms to create definitions for diagnosed COVID-19, influenza, RSV, and acute respiratory illness. For more information on diagnostic codes and clinical terms used, visit: https://www.cdc.gov/nssp/php/onboarding-resources/companion-guide-ed-data-respiratory-illness.html The data is characterized by selected demographic groups including age group and race/ethnicity. The dataset also includes percent of weekly outpatient visits due to ILI as reported by several outpatient clinics throughout Chicago that participate in CDC’s Influenza-like Illness Surveillance Network (ILINet). For more information on ESSENCE, see https://www.dph.illinois.gov/data-statistics/syndromic-surveillance For more information on ILINet, see https://www.cdc.gov/fluview/overview/index.html#cdc_generic_section_3-outpatient-illness-surveillance All data are provisional and subject to change. Information is updated as additional details are received. At any given time, this dataset reflects data currently known to CDPH. Numbers in this dataset may differ from other public sources.
Michigan Public Policy Survey Public Use Datasets
openicpsr.org
delimited, spss +1
Updated Aug 19, 2016
+ more versions
Cite
Center for Local, State, and Urban Policy (2016). Michigan Public Policy Survey Public Use Datasets [Dataset]. http://doi.org/10.3886/E100132V30
Explore at:
Available download formats: delimited, spss, stata
Unique identifier
https://doi.org/10.3886/E100132V30
Dataset updated
Aug 19, 2016
Dataset authored and provided by
Center for Local, State, and Urban Policy
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Michigan
Description
The Michigan Public Policy Survey (MPPS) is a program of state-wide surveys of local government leaders in Michigan. The MPPS is designed to fill an important information gap in the policymaking process. While there are ongoing surveys of the business community and of the citizens of Michigan, before the MPPS there were no ongoing surveys of local government officials that were representative of all general purpose local governments in the state. Therefore, while we knew the policy priorities and views of the state's businesses and citizens, we knew very little about the views of the local officials who are so important to the economies and community life throughout Michigan. The MPPS was launched in 2009 by the Center for Local, State, and Urban Policy (CLOSUP) at the University of Michigan and is conducted in partnership with the Michigan Association of Counties, Michigan Municipal League, and Michigan Townships Association. The associations provide CLOSUP with contact information for the survey's respondents, and consult on survey topics. CLOSUP makes all decisions on survey design, data analysis, and reporting, and receives no funding support from the associations. The surveys investigate local officials' opinions and perspectives on a variety of important public policy issues and solicit factual information about their localities relevant to policymaking. Over time, the program has covered issues such as fiscal, budgetary and operational policy, fiscal health, public sector compensation, workforce development, local-state governmental relations, intergovernmental collaboration, economic development strategies and initiatives such as placemaking and economic gardening, the role of local government in environmental sustainability, energy topics such as hydraulic fracturing ("fracking") and wind power, trust in government, views on state policymaker performance, opinions on the impacts of the Federal Stimulus Program (ARRA), and more. 
The program will investigate many other issues relevant to local and state policy in the future. A searchable database of every question the MPPS has asked is available on CLOSUP's website. Results of MPPS surveys are currently available as reports, and via online data tables. Out of a commitment to promoting public knowledge of Michigan local governance, the Center for Local, State, and Urban Policy is releasing public use datasets. In order to protect respondent confidentiality, CLOSUP has divided the data collected in each wave of the survey into separate datasets focused on different topics that were covered in the survey. Each dataset contains only variables relevant to that subject, and the datasets cannot be linked together. Variables have also been omitted or recoded to further protect respondent confidentiality. For researchers looking for a more extensive release of the MPPS data, restricted datasets are available through openICPSR's Virtual Data Enclave. Please note: additional waves of MPPS public use datasets are being prepared, and will be available as part of this project as soon as they are completed. For information on accessing MPPS public use and restricted datasets, please visit the MPPS data access page: http://closup.umich.edu/mpps-download-datasets
Museum Visitors
catalog.data.gov
data.lacity.org
Updated May 24, 2025
Cite
data.lacity.org (2025). Museum Visitors [Dataset]. https://catalog.data.gov/dataset/museum-visitors
Explore at:
Dataset updated
May 24, 2025
Dataset provided by
data.lacity.org
Description
Individual visits to El Pueblo museums, per month. *The Museum of Social Justice is an independently operated museum, and reopened to the public May 2021. All El Pueblo-operated museums partially reopened June 10, 2021.
DEEPEN Global Standardized Categorical Exploration Datasets for Magmatic...
gdr.openei.org
data.openei.org
+1more
website
Updated Jun 30, 2023
+ more versions
Cite
Nicole Taverna; Nils Caliandro; Rachel King (2023). DEEPEN Global Standardized Categorical Exploration Datasets for Magmatic Plays [Dataset]. http://doi.org/10.15121/1995526
Explore at:
Available download formats: website
Unique identifier
https://doi.org/10.15121/1995526
Dataset updated
Jun 30, 2023
Dataset provided by
National Renewable Energy Laboratory
USDOE Office of Energy Efficiency and Renewable Energy (EERE), Renewable Power Office. Geothermal Technologies Program (EE-4G)
Geothermal Data Repository
Authors
Nicole Taverna; Nils Caliandro; Rachel King
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
DEEPEN stands for DE-risking Exploration of geothermal Plays in magmatic ENvironments.

As part of the development of the DEEPEN 3D play fairway analysis (PFA) methodology for magmatic plays (conventional hydrothermal, superhot EGS, and supercritical), weights needed to be developed for use in the weighted sum of the different favorability index models produced from geoscientific exploration datasets. This was done using two different approaches: one based on expert opinions, and one based on statistical learning. This GDR submission includes the datasets used to produce the statistical learning-based weights.

While expert opinions allow us to include more nuanced information in the weights, expert opinions are subject to human bias. Data-centric or statistical approaches help to overcome these potential human biases by focusing on and drawing conclusions from the data alone. The drawback is that, to apply these types of approaches, a dataset is needed. Therefore, we attempted to build comprehensive standardized datasets mapping anomalies in each exploration dataset to each component of each play. This data was gathered through a literature review focused on magmatic hydrothermal plays along with well-characterized areas where superhot or supercritical conditions are thought to exist. Datasets were assembled for all three play types, but the hydrothermal dataset is the least complete due to its relatively low priority.

For each known or assumed resource, the dataset states what anomaly in each exploration dataset is associated with each component of the system. The data is only semi-quantitative: values are either high, medium, or low relative to background levels. In addition, the dataset has significant gaps, as not every possible exploration dataset has been collected and analyzed at every known or suspected geothermal resource area, in the context of all possible play types. The following training sites were used to assemble this dataset:
 - Conventional magmatic hydrothermal: Akutan (from AK PFA), Oregon Cascades PFA, Glass Buttes OR, Mauna Kea (from HI PFA), Lanai (from HI PFA), Mt St Helens Shear Zone (from WA PFA), Wind River Valley (from WA PFA), Mount Baker (from WA PFA).
 - Superhot EGS: Newberry (EGS demonstration project), Coso (EGS demonstration project), Geysers (EGS demonstration project), Eastern Snake River Plain (EGS demonstration project), Utah FORGE, Larderello, Kakkonda, Taupo Volcanic Zone, Acoculco, Krafla.
 - Supercritical: Coso, Geysers, Salton Sea, Larderello, Los Humeros, Taupo Volcanic Zone, Krafla, Reykjanes, Hengill.

Disclaimer: Treat the supercritical fluid anomalies with skepticism. They are based on assumptions due to the general lack of confirmed supercritical fluid encounters and samples at the sites included in this dataset, at the time of assembling the dataset. The main assumption was that the supercritical fluid in a given geothermal system has shared properties with the hydrothermal fluid, which may not be the case in reality.

Once the datasets were assembled, principal component analysis (PCA) was applied to each. PCA is an unsupervised statistical learning technique (labels are not required on the data) that summarizes the directions of variance in the data. This approach was chosen because our labels are not certain, i.e., we do not know with 100% confidence that superhot resources exist at all the assumed positive areas. We also do not have data for any known non-geothermal areas, meaning that it would be challenging to apply a supervised learning technique. To generate weights from the PCA, an analysis of the PCA loading values was conducted. PCA loading values represent how much a feature contributes to each principal component, and therefore to the overall variance in the data.
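
The description does not say exactly how the loading values were turned into weights. The sketch below shows one plausible scheme, given purely as an assumption: compute PCA loadings from the covariance matrix, then weight each feature by its absolute loadings scaled by the explained variance of each component. The numeric encoding (low=1, medium=2, high=3), the sample matrix, and the `pca_weights` function are all hypothetical.

```python
import numpy as np

# Hypothetical encoding of the semi-quantitative values: low=1, medium=2, high=3.
# Rows are training sites, columns are exploration datasets (illustrative values).
X = np.array([
    [3.0, 1.0, 2.0],
    [2.0, 1.0, 3.0],
    [3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0],
    [2.0, 3.0, 3.0],
])


def pca_weights(data):
    """Return PCA loadings and one normalized weight per feature (assumed scheme)."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 0.0, None)       # guard against tiny negative values
    loadings = eigvecs * np.sqrt(eigvals)       # column j is the loading vector of PC j
    explained = eigvals / eigvals.sum()         # explained variance ratio per PC
    raw = np.abs(loadings) @ explained          # variance-weighted absolute loadings
    return loadings, raw / raw.sum()


loadings, weights = pca_weights(X)
print(weights)
```

Features with larger weights contribute more to the directions of greatest variance; a real analysis would also need to handle the gaps in the dataset, which this sketch ignores.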
Study of Women's Health Across the Nation (SWAN), 2005-2007: Visit 09...
icpsr.umich.edu
ascii, delimited, r +3
Updated Nov 20, 2018
+ more versions
Cite
Sutton-Tyrell, Kim; Selzer, Faith; Sowers, MaryFran R. (Mary Francis Roy); Finkelstein, Joel; Powell, Lynda; Gold, Ellen; Greendale, Gail; Weiss, Gerson; Matthews, Karen; Brooks, Maria Mori (2018). Study of Women's Health Across the Nation (SWAN), 2005-2007: Visit 09 Dataset [Dataset]. http://doi.org/10.3886/ICPSR32721.v2
Explore at:
Available download formats: sas, r, stata, delimited, ascii, spss
Unique identifier
https://doi.org/10.3886/ICPSR32721.v2
Dataset updated
Nov 20, 2018
Dataset provided by
Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
Authors
Sutton-Tyrell, Kim; Selzer, Faith; Sowers, MaryFran R. (Mary Francis Roy); Finkelstein, Joel; Powell, Lynda; Gold, Ellen; Greendale, Gail; Weiss, Gerson; Matthews, Karen; Brooks, Maria Mori
License
https://www.icpsr.umich.edu/web/ICPSR/studies/32721/terms
Time period covered
Feb 15, 2005 - Jan 31, 2007
Area covered
Contra Costa County, Michigan, Pittsburgh, Pennsylvania, Massachusetts, Inkster, Los Angeles, Alameda County, Illinois
Description
The Study of Women's Health Across the Nation (SWAN) is a multi-site, longitudinal, epidemiologic study designed to examine the health of women during their middle years. The study examines the physical, biological, psychological and social changes during this transitional period. The goal of SWAN's research is to help scientists, health care providers and women learn how mid-life experiences affect health and quality of life during aging. Data were collected about doctor visits, medical conditions, medications, treatments, medical procedures, relationships, smoking, and menopause-related information such as age at pre-, peri- and post-menopause, self-attitudes, feelings, and common physical problems associated with menopause. The study began in 1994. Between 2005 and 2007, 2,255 of the 3,302 women who joined SWAN were seen for their ninth follow-up visit. The research centers are located in the following communities: Ypsilanti and Inkster, MI (University of Michigan); Boston, MA (Massachusetts General Hospital); Chicago, IL (Rush Presbyterian-St. Luke's Medical Center); Alameda and Contra Costa County, CA (University of California-Davis and Kaiser Permanente); Los Angeles, CA (University of California-Los Angeles); Hackensack, NJ (Hackensack University Medical Center); and Pittsburgh, PA (University of Pittsburgh). SWAN participants represent five racial/ethnic groups and a variety of backgrounds and cultures. Although the New Jersey site was still part of the study, no data were collected from it for the ninth visit. Demographic and background information includes age, language of interview, marital status, household composition, and employment.
NYSERDA New York Offshore Wind Supply Chain Dataset
catalog.data.gov
gimi9.com
Updated Dec 13, 2024
Cite
data.ny.gov (2024). NYSERDA New York Offshore Wind Supply Chain Dataset [Dataset]. https://catalog.data.gov/dataset/nyserda-new-york-offshore-wind-supply-chain-dataset
Dataset updated
Dec 13, 2024
Dataset provided by
data.ny.gov
Description
The dataset contains contact and description information for local supply chain organizations, offshore wind developers, and original equipment manufacturers that provide goods and services to support New York State’s offshore wind industry. To request placement in this database, or to update your company’s information, please visit NYSERDA’s Supply Chain Database webpage at https://www.nyserda.ny.gov/All-Programs/Offshore-Wind/Focus-Areas/Supply-Chain-Economic-Development/Supply-Chain-Database to submit a request form. How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov. The New York State Energy Research and Development Authority (NYSERDA) offers objective information and analysis, innovative programs, technical expertise, and support to help New Yorkers increase energy efficiency, save money, use renewable energy, and reduce reliance on fossil fuels. To learn more about NYSERDA’s programs, visit https://nyserda.ny.gov or follow us on Twitter, Facebook, YouTube, or Instagram.
TDA Meals Served Dashboard - Visits
data.texas.gov
application/rdfxml +5
Updated Jun 16, 2025
Cite
(2025). TDA Meals Served Dashboard - Visits [Dataset]. https://data.texas.gov/dataset/TDA-Meals-Served-Dashboard-Visits/2eqh-gjgb
Available download formats: csv, xml, json, tsv, application/rssxml, application/rdfxml
Dataset updated
Jun 16, 2025
Description
This dataset is a complete inventory of all assets on this site and any assets sourced from other sites, if applicable. Use this dataset to track the performance of data publishing, conduct metadata maintenance, or present an overview of what kinds of data exists on the site.
Water Supply Node - Dataset - data.govt.nz - discover and use data
portal.zero.govt.nz
Cite
Water Supply Node - Dataset - data.govt.nz - discover and use data [Dataset]. https://portal.zero.govt.nz/77d6ef04507c10508fcfc67a7c24be32/dataset/water-supply-node9
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Asset inventory data for a variety of structures and infrastructure relating to water systems or drainage in urban areas. The features in this dataset are measured by length and represent linear features such as pipe networks or open drains. The information is extracted from the asset inventory database on a daily basis. Items identified have been geolocated over a long period of time and through various methods, including information provided by 3rd parties. In general, asset locations are obtained from as-built diagrams and as such may not be validated in all circumstances. The asset inventory is frequently updated, and modifications can be made to the asset data structure (asset hierarchy) without prior notification. Due to the wide range of source information, all asset locations should be verified through the Asset Information Officers and/or site visits. This is an incomplete dataset; other information is held and maintained independently. The primary purpose of this inventory is for asset valuations. The inventory is utilised in forward works and capital work planning. Information on Water Supply assets for service requests is displayed on 3 Waters map. The Water Supply network is an integral part of the land use and consents process; however, site visits should be done to validate the status, position and condition of assets. Waikato OneView does not make any representation or give any warranty as to the accuracy or exhaustiveness of the data released for public download. Locations and dimensions of assets depicted in the data may not be accurate due to circumstances not notified to Council. While you are free to crop, export and re-purpose the data, we ask that you attribute Waikato OneView and clearly state that your work is a derivative and not the authoritative data source.
Asthma Emergency Department Visit Rates
data.chhs.ca.gov
data.ca.gov
csv, pdf, zip
Updated Aug 28, 2024
Cite
California Department of Public Health (2024). Asthma Emergency Department Visit Rates [Dataset]. https://data.chhs.ca.gov/dataset/asthma-emergency-department-visit-rates
Available download formats: pdf, zip, csv (449238), csv (696499), csv (423817)
Dataset updated
Aug 28, 2024
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
This dataset contains counts and rates (per 10,000 residents) of asthma emergency department (ED) visits among Californians. The table “Asthma Emergency Department Visit Rates by County” contains statewide and county-level data stratified by age group (all ages, 0-17, 18+, 0-4, 5-17, 18-64, 65+) and race/ethnicity (white, black, Hispanic, Asian/Pacific Islander, American Indian/Alaskan Native). The table “Asthma Emergency Department Visit Rates by ZIP Code” contains zip-code level data stratified by age group (all ages, 0-17, 18+). The data are derived from the Department of Health Care Access and Information emergency department database. These data include emergency department visits from all licensed hospitals in California. These data are based only on primary discharge diagnosis codes. On October 1, 2015, diagnostic coding for asthma transitioned from ICD9-CM (493) to ICD10-CM (J45). Because of this change, CDPH and CDC do not recommend comparing data from 2015 (or earlier) to 2016 (or later). NOTE: Rates are calculated from the total number of asthma emergency department visits (not the unique number of individuals).
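
As a small illustration of the two rules stated above (rates per 10,000 residents, and no comparisons across the ICD-9-CM to ICD-10-CM coding change), here is a minimal sketch. The function names and the example numbers are hypothetical and do not come from the dataset.

```python
def rate_per_10k(visits, population):
    """Asthma ED visits per 10,000 residents.

    `visits` counts visits, not unique individuals, as noted in the description.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return visits * 10_000 / population


def comparable(year_a, year_b):
    """False when two years fall on opposite sides of the coding transition.

    Per the description, data from 2015 or earlier should not be compared
    with data from 2016 or later.
    """
    return (year_a <= 2015) == (year_b <= 2015)


example_rate = rate_per_10k(visits=450, population=1_000_000)
```

A check such as `comparable(2014, 2017)` returning `False` can be used to guard trend calculations that would otherwise span the coding change.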
‘Coursera Course Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Coursera Course Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-coursera-course-dataset-839a/86aaffe7/?iid=003-724&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Coursera Course Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/siddharthm1698/coursera-course-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

This is a dataset I generated during a hackathon for a project. I scraped the data from the official Coursera website. Our project aims to help any new learner find the right course by answering just a few questions. It is an intelligent course recommendation system, so we had to scrape data from a few educational websites. This is the data scraped from the Coursera website. For the project, visit: https://github.com/Siddharth1698/Coursu . Please show your support by following us. I have just started learning data science and hope this dataset will be helpful to someone for his/her personal purposes. The scraping code is here: https://github.com/Siddharth1698/Coursera-Course-Dataset Article about the dataset generation: https://medium.com/analytics-vidhya/web-scraping-and-coursera-8db6af45d83f

Content

This dataset contains 6 main columns and 890 courses. The columns are:
1. course_title : The course title.
2. course_organization : The organization that conducts the course.
3. course_Certificate_type : The types of certification available for the course.
4. course_rating : The rating associated with the course.
5. course_difficulty : The difficulty level of the course.
6. course_students_enrolled : The number of students enrolled in the course.

Inspiration

This is just one of my first scraped datasets. Follow my GitHub for more: https://github.com/Siddharth1698

--- Original source retains full ownership of the source dataset ---
Storm Water Pipe - Dataset - data.govt.nz - discover and use data
portal.zero.govt.nz
Updated Jun 17, 2019
Cite
(2019). Storm Water Pipe - Dataset - data.govt.nz - discover and use data [Dataset]. https://portal.zero.govt.nz/77d6ef04507c10508fcfc67a7c24be32/dataset/storm-water-pipe9
Dataset updated
Jun 17, 2019
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Asset inventory data for a variety of structures and infrastructure relating to storm water systems or drainage in urban areas. The features in this dataset are measured by length and represent linear features such as pipe networks or open drains. The information is extracted from the asset inventory database (Asset-Finda) on a daily basis. Items identified have been geolocated over a long period of time and through various methods, including information provided by 3rd parties. In general, asset locations are obtained from as-built diagrams and as such may not be validated in all circumstances. The asset inventory is frequently updated, and modifications can be made to the asset data structure (asset hierarchy) without prior notification. Due to the wide range of source information, all asset locations should be verified through the Asset Information Officers and/or site visits. This is an incomplete dataset; other information is held and maintained independently. Waikato District Alliance holds storm water asset information for all assets under the road pavement. Waikato Regional Council holds further asset information in all the rural areas. The primary purpose of this inventory is for asset valuations. The inventory is utilised in forward works and capital work planning. Information on Storm water assets for service requests is displayed on 3 Waters map. The storm water network is an integral part of the land use and consents process; however, site visits should be done to validate the status, position and condition of assets.
VDEQ Springs FIELD MEASUREMENTS
data.virginia.gov
opendata.winchesterva.gov
Updated Aug 31, 2023
Cite
Virginia Department of Environmental Quality (2023). VDEQ Springs FIELD MEASUREMENTS [Dataset]. https://data.virginia.gov/dataset/vdeq-springs-field-measurements
Available download formats: gdb, zip, xlsx, gpkg, csv, kml, arcgis geoservices rest api, html, txt, geojson
Dataset updated
Aug 31, 2023
Dataset authored and provided by
Virginia Department of Environmental Quality (https://deq.virginia.gov/)
Description
VDEQ Spring SITES
The VDEQ Spring SITES database contains data describing the geographic locations and site attributes of natural springs throughout the commonwealth. This data coverage continues to evolve and contains only spring locations known to exist with a reasonable degree of certainty on the date of publication. The dataset does not replace site-specific inventorying or receptor surveys but can be used as a starting point. VDEQ's initial geospatial dataset of approximately 325 springs was formed in 2008 by digitizing historical spring information sheets created by State Water Control Board geologists in the 1970s through early 1990s. Additional data has been consolidated from the EPA STORET database, the U.S. Geological Survey's Ground Water Site Inventory (GWSI) and Geographic Names Inventory System (GNIS), the Virginia Department of Health SDWIS database, the Virginia DEQ Virginia Water Use Data Set (VWUDS), the Commonwealth of Virginia Division of Water Resources and Power Bulletin No. 1: "Springs of Virginia" by Collins et al., 1930, as well as several VDWR&P Surface Water Supply bulletins from the 1940s-1950s. A 1992 Virginia Department of Game and Inland Fisheries / Virginia Tech sponsored study by Helfrich et al. titled "Evaluation of the Natural Springs of Virginia: Fisheries Management Implications", a 2004 Rockbridge County groundwater resources report written by Frits van der Leeden, and several smaller datasets from consultants and citizens were evaluated and added to the database when confidence in locational accuracy was high or could be verified with aerial or LIDAR imagery. Significant contributions have been made throughout the years by VDEQ Groundwater Characterization staff site visits as well as other geologists working in the region, including: Matt Heller at Virginia Division of Geology and Mineral Resources (VDMME), Wil Orndorff at the Virginia Department of Conservation and Recreation Karst Program (VDCR), and David Nelms and Dan Doctor of the U.S. Geological Survey (USGS).
Substantial effort has been made to improve locational accuracy and remove duplication present between data sources. Hundreds of spring locations that were originally obtained using topographic maps or unknown methods were updated to sub-meter locational accuracy using post-processed differential GPS (PPGPS) and through the use of several generations of aerial imagery (2002-2017) obtained from Virginia's Geographic Information Network (VGIN) and 1-meter LIDAR, where available. Scores of new spring locations were also obtained by systematic quadrangle-by-quadrangle analysis in areas of the Shenandoah Valley where 1-meter LIDAR datasets were obtained from the U.S. Geological Survey. Future improvements to the dataset will result when statewide 1-meter LIDAR datasets become available and through continued field work by DEQ staff and other contributors working in the region. Please do not hesitate to contact the author to correct mistakes or to contribute to the database.

VDEQ_Springs_FIELD_MEASUREMENTS
The VDEQ Spring FIELD MEASUREMENTS database contains data describing field derived physio-chemical properties of spring discharges measured throughout the Commonwealth of Virginia. Field visits compiled in this dataset were performed from 1928 to 2019 by geologists with the State Water Control Board, the Virginia Division of Water and Power, the Virginia Department of Environmental Quality, and the U.S. Geological Survey with contributions from other sources as noted. Values of -9999 indicate that measurements were not performed for the referenced parameter. Please do not hesitate to contact the author to add data to the database or correct errors.
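
Because -9999 marks a measurement that was not performed, it should be masked before any statistics are computed. The following is a minimal sketch of that step; the helper names and the sample readings are hypothetical and are not taken from the database.

```python
NO_DATA = -9999  # sentinel stated in the description: measurement not performed


def mask_missing(values):
    """Replace the -9999 sentinel with None so it cannot leak into statistics."""
    return [None if v == NO_DATA else v for v in values]


def mean_of_measured(values):
    """Mean of the values that were actually measured, or None if there are none."""
    measured = [v for v in mask_missing(values) if v is not None]
    return sum(measured) / len(measured) if measured else None


# Hypothetical field readings for one parameter, with two unmeasured visits.
readings = [12.5, -9999, 13.5, -9999, 14.0]
cleaned = mask_missing(readings)
average = mean_of_measured(readings)
```

Without the mask, the two sentinel values would drag the average far below any physically plausible reading.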

VDEQ_Springs_WQ
The VDEQ_Spring_WQ database is a geodatabase containing groundwater sample information collected from springs throughout Virginia. Sample specific information include: location and site information, measured field parameters, and lab verified quantifications of major ionic concentrations, trace element concentrations, nutrient concentrations, and radiological data. The VDEQ_Spring_WQ database is a subset of the VDEQ GWCHEM database which is a flat-file geodatabase containing groundwater sample information from groundwater wells and springs throughout Virginia. Sample information has been correlated via DEQ Well # and projected using coordinates in VDEQ_Spring_SITES database. The GWCHEM database is comprised of historic groundwater sample data originally archived in the United States Geological Survey (USGS) National Water Information System (NWIS) and the Environmental Protection Agency (EPA) Storage and Retrieval (STORET) data warehouse. Archived STORET data originated as groundwater sample data collected and uploaded by Virginia State Water Control Board Personnel. While groundwater sample data in the STORET data warehouse are static, new groundwater sample data are periodically uploaded to NWIS and spring laboratory WQ data reflect NWIS downloaded on 9/30/2019. Recent groundwater sample data collected by Virginia Department of Environmental Quality (DEQ) personnel as part of the Ambient Groundwater Sampling Program are entered into the database as lab results are made available by the Division of Consolidated Laboratory Services (DCLS). When possible, charge balances were calculated for samples with reported values for major ions including (at a minimum) calcium, magnesium, potassium, sodium, bicarbonate, chloride, and sulfate. Reported values for Nitrate as N, carbonate, and fluoride were included in the charge balance calculation when available. Field determined values for bicarbonate and carbonate were used in the charge balance calculation when available. 
For much of the legacy DEQ groundwater sample data, bicarbonate values were derived from lab-reported values of alkalinity (as mg/L CaCO3) under the assumption that there was no contribution by carbonate to the reported alkalinity value. Charge balance values are reported in the "Charge Balance" column of the GWCHEM geodatabase. The closer the charge balance value is to unity (1), the lower the assumed charge balance error.
In order to preserve the numerical capabilities of the database, non-numeric lab qualifiers were given the following numeric identifiers:
- (minus sign) = less than the concentration specified to the right of the sign
-11110 = estimated
-22220 = presence verified but not quantified
-33330 = radchem non-detect, below sslc
-4440 = analyzed for but not detected
-55550 = greater than the concentration to the right of the zero
-66660 = sample held beyond normal holding time
-77770 = quality control failure. Data not valid.
-88880 = sample held beyond normal holding time. Sample analyzed for but not detected. Value stored is limit of detection for process in use.
-11120 = Value reported is less than the criteria of detection.
-9999 = no data (parameter not quantified)
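
The description does not give the charge balance formula. Since it says values closer to unity indicate a lower error, the sketch below assumes a ratio of total cation to total anion charge in milliequivalents per liter; that choice, the function names, and the rounded equivalent weights are assumptions for illustration. Only the minimum set of major ions listed above is used, and the optional nitrate, carbonate, and fluoride terms are left out.

```python
# Approximate equivalent weights in mg per milliequivalent (molar mass / charge).
CATIONS = {"calcium": 20.04, "magnesium": 12.15, "sodium": 22.99, "potassium": 39.10}
ANIONS = {"bicarbonate": 61.02, "chloride": 35.45, "sulfate": 48.03}


def bicarbonate_from_alkalinity(alkalinity_mg_l_caco3):
    """Convert alkalinity (mg/L as CaCO3) to bicarbonate (mg/L).

    Assumes, as the description does, that carbonate contributes nothing.
    """
    return alkalinity_mg_l_caco3 * 61.02 / 50.04


def charge_balance_ratio(sample_mg_l):
    """Sum of cation meq/L divided by sum of anion meq/L for one sample."""
    missing = [ion for ion in (*CATIONS, *ANIONS) if ion not in sample_mg_l]
    if missing:
        raise ValueError(f"missing major ions: {missing}")
    cation_meq = sum(sample_mg_l[ion] / w for ion, w in CATIONS.items())
    anion_meq = sum(sample_mg_l[ion] / w for ion, w in ANIONS.items())
    return cation_meq / anion_meq


# Hypothetical sample in mg/L; bicarbonate is derived from a lab alkalinity value.
sample = {
    "calcium": 40.08, "magnesium": 12.15, "sodium": 22.99, "potassium": 3.91,
    "bicarbonate": bicarbonate_from_alkalinity(150.12),
    "chloride": 35.45, "sulfate": 48.03,
}
ratio = charge_balance_ratio(sample)
```

Qualifier codes such as -9999 would need to be filtered out before calling a function like this, since they are not concentrations.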

A more in-depth description and hydrogeologic analysis of the database can be found here
An in-depth data fact sheet can be found here
YouTube Trending Video Dataset (updated daily)
kaggle.com
zip
Updated Apr 15, 2024
Cite
Rishav Sharma (2024). YouTube Trending Video Dataset (updated daily) [Dataset]. https://www.kaggle.com/rsrishav/YouTube-Trending-Video-Dataset
Available download formats: zip (0 bytes)
Dataset updated
Apr 15, 2024
Authors
Rishav Sharma
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
YouTube
Description
This dataset is a daily record of the top trending YouTube videos and it will be updated daily.

Context

YouTube maintains a list of the top trending videos on the platform. According to Variety magazine, “To determine the year’s top-trending videos, YouTube uses a combination of factors including measuring users interactions (number of views, shares, comments and likes). Note that they’re not the most-viewed videos overall for the calendar year”.

Note that this dataset is a structurally improved version of this dataset.

Content

This dataset includes several months (and counting) of data on daily trending YouTube videos. Data is included for the IN, US, GB, DE, CA, FR, RU, BR, MX, KR, and JP regions (India, USA, Great Britain, Germany, Canada, France, Russia, Brazil, Mexico, South Korea, and Japan, respectively), with up to 200 listed trending videos per day.

Each region’s data is in a separate file. Data includes the video title, channel title, publish time, tags, views, likes and dislikes, description, and comment count.

The data also includes a category_id field, which varies between regions. To retrieve the categories for a specific video, find it in the associated JSON. One such file is included for each of the 11 regions in the dataset.
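
Looking up a category in the region's JSON file might look like the sketch below. It assumes the file follows the layout of a YouTube Data API video category listing (`items[].id` and `items[].snippet.title`); the description does not show the file, so treat that layout, the function name, and the sample document as assumptions.

```python
import json


def category_titles(category_json_text):
    """Map each category id (as an int) to its title for one region's JSON file."""
    doc = json.loads(category_json_text)
    return {int(item["id"]): item["snippet"]["title"] for item in doc["items"]}


# Hypothetical excerpt in the assumed layout.
sample_json = """
{"items": [
  {"id": "1",  "snippet": {"title": "Film & Animation"}},
  {"id": "10", "snippet": {"title": "Music"}}
]}
"""
titles = category_titles(sample_json)
music = titles.get(10, "unknown")
```

Because category ids vary between regions, the mapping should be built from the JSON file that belongs to the same region as the video file being read.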

For more information on specific columns in the dataset refer to the column metadata.

Acknowledgements

This dataset was collected using the YouTube API. This dataset is the updated version of Trending YouTube Video Statistics.

Inspiration

Possible uses for this dataset could include:
- Sentiment analysis in a variety of forms.
- Categorizing YouTube videos based on their comments and statistics.
- Training ML algorithms like RNNs to generate their own YouTube comments.
- Analyzing what factors affect how popular a YouTube video will be.
- Statistical analysis over time.

For further inspiration, see the kernels on this dataset!
Dataset Direct Download Service (WFS): Operation Grand Site in Cantal
data.europa.eu
unknown
Updated Feb 18, 2022
Cite
(2022). Dataset Direct Download Service (WFS): Operation Grand Site in Cantal [Dataset]. https://data.europa.eu/data/datasets/fr-120066022-srv-29baa063-a292-4009-8686-6b8e0f950cd6
Available download formats: unknown
Dataset updated
Feb 18, 2022
Description
A Grand Site operation is the approach proposed by the State to local and regional authorities to address the difficulties of welcoming visitors and maintaining classified sites that are very well known and subject to heavy visitor traffic. It makes it possible to define and implement a concerted project for the restoration, preservation and development of the territory.

It applies to a site classified under Articles L.341-1 to 22 of the Environmental Code (Law of 2 May 1930) that faces a problem of tourist use or maintenance requiring management decisions for the site. Its purpose is to accompany the territory towards the eventual acquisition of the Grand Site de France label. This label, “Grand site de France”, is owned by the State and has had legal standing since 2010 (Article L. 341-15-1 of the Environmental Code).


A web tracking data set of online browsing behavior of 2,148 users


The code used to analyze the data is also available at https://github.com/gesiscss/web_tracking.

If you use data or code from this repository, please cite the paper above and the Zenodo link.
