TagX Web Browsing Clickstream Data: Unveiling Digital Behavior Across North America and the EU

Unique Insights into Online User Behavior

TagX Web Browsing Clickstream Data offers an unparalleled window into the digital lives of 1 million users across North America and the European Union. This comprehensive dataset stands out in the market for its breadth, depth, and stringent compliance with data protection regulations.

What Makes Our Data Unique?
- Extensive Geographic Coverage: Spanning two major markets, our data provides a holistic view of web browsing patterns in developed economies.
- Large User Base: With 300K active users, our dataset offers statistically significant insights across various demographics and user segments.
- GDPR and CCPA Compliance: We prioritize user privacy and data protection, ensuring that our data collection and processing methods adhere to the strictest regulatory standards.
- Real-time Updates: Our clickstream data is continuously refreshed, providing up-to-the-minute insights into evolving online trends and user behaviors.
- Granular Data Points: We capture a wide array of metrics, including time spent on websites, click patterns, search queries, and user journey flows.
Data Sourcing: Ethical and Transparent

Our web browsing clickstream data is sourced through a network of partnered websites and applications. Users explicitly opt in to data collection, ensuring transparency and consent. We employ advanced anonymization techniques to protect individual privacy while maintaining the integrity and value of the aggregated data. Key aspects of our data sourcing process include:
- Voluntary user participation through clear opt-in mechanisms
- Regular audits of data collection methods to ensure ongoing compliance
- Collaboration with privacy experts to implement best practices in data anonymization
- Continuous monitoring of regulatory landscapes to adapt our processes as needed
Primary Use Cases and Verticals

TagX Web Browsing Clickstream Data serves a multitude of industries and use cases, including but not limited to:
Digital Marketing and Advertising:
- Audience segmentation and targeting
- Campaign performance optimization
- Competitor analysis and benchmarking

E-commerce and Retail:
- Customer journey mapping
- Product recommendation enhancements
- Cart abandonment analysis

Media and Entertainment:
- Content consumption trends
- Audience engagement metrics
- Cross-platform user behavior analysis

Financial Services:
- Risk assessment based on online behavior
- Fraud detection through anomaly identification
- Investment trend analysis

Technology and Software:
- User experience optimization
- Feature adoption tracking
- Competitive intelligence

Market Research and Consulting:
- Consumer behavior studies
- Industry trend analysis
- Digital transformation strategies
Integration with Broader Data Offering

TagX Web Browsing Clickstream Data is a cornerstone of our comprehensive digital intelligence suite. It seamlessly integrates with our other data products to provide a 360-degree view of online user behavior:
- Social Media Engagement Data: Combine clickstream insights with social media interactions for a holistic understanding of digital footprints.
- Mobile App Usage Data: Cross-reference web browsing patterns with mobile app usage to map the complete digital journey.
- Purchase Intent Signals: Enrich clickstream data with purchase intent indicators to power predictive analytics and targeted marketing efforts.
- Demographic Overlays: Enhance web browsing data with demographic information for more precise audience segmentation and targeting.
By leveraging these complementary datasets, businesses can unlock deeper insights and drive more impactful strategies across their digital initiatives.

Data Quality and Scale

We pride ourselves on delivering high-quality, reliable data at scale:
- Rigorous Data Cleaning: Advanced algorithms filter out bot traffic, VPNs, and other non-human interactions.
- Regular Quality Checks: Our data science team conducts ongoing audits to ensure data accuracy and consistency.
- Scalable Infrastructure: Our robust data processing pipeline can handle billions of daily events, ensuring comprehensive coverage.
- Historical Data Availability: Access up to 24 months of historical data for trend analysis and longitudinal studies.
- Customizable Data Feeds: Tailor the data delivery to your specific needs, from raw clickstream events to aggregated insights.
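One common bot-filtering technique is a simple click-rate heuristic: sessions whose events arrive faster than any human could click are flagged. The sketch below is illustrative only; the event shape, the `ts` field name, and the 3-clicks-per-second threshold are assumptions, not TagX's actual pipeline.

```python
from datetime import datetime

def looks_automated(events, max_clicks_per_second=3.0):
    """Return True if a session's average click rate exceeds the threshold."""
    if len(events) < 2:
        return False
    times = sorted(datetime.fromisoformat(e["ts"]) for e in events)
    span = (times[-1] - times[0]).total_seconds()
    if span == 0:
        return True  # many events at the same instant: almost certainly scripted
    return len(events) / span > max_clicks_per_second

# A 4-click session spread over 31 seconds vs. 5 clicks within half a millisecond.
human = [{"ts": f"2024-01-01T00:00:{s:02d}"} for s in (0, 7, 15, 31)]
bot = [{"ts": f"2024-01-01T00:00:00.{us:06d}"} for us in range(100, 600, 100)]
print(looks_automated(human))  # False
print(looks_automated(bot))    # True
```

Real pipelines combine many such signals (user-agent checks, IP reputation, navigation patterns); a rate threshold alone is only a first pass.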
Empowering Data-Driven Decision Making

In today's digital-first world, understanding online user behavior is crucial for businesses across all sectors. TagX Web Browsing Clickstream Data empowers organizations to make informed decisions, optimize their digital strategies, and stay ahead of the competition. Whether you're a marketer looking to refine your targeting, a product manager seeking to enhance user experience, or a researcher exploring digital trends, our clickstream data provides the insights you need.
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset collects job offers gathered by web scraping and filtered according to specific keywords, locations, and times. This data gives users rich and precise search capabilities to uncover the best working solution for them. With the information collected, users can explore options that match their personal situation, skillset, and preferences in terms of location and schedule. The columns provide detailed information on job titles, employer names, locations, and time frames, as well as other necessary parameters, so you can make a smart choice for your next career opportunity.
This dataset is a great resource for those looking to find an optimal work solution based on keywords, location and time parameters. With this information, users can quickly and easily search through job offers that best fit their needs. Here are some tips on how to use this dataset to its fullest potential:
Start by identifying what type of job offer you want to find. The keyword column will help you narrow down your search by allowing you to search for job postings that contain the word or phrase you are looking for.
Next, consider where the job is located – the Location column tells you where in the world each posting is from so make sure it’s somewhere that suits your needs!
Finally, consider when the position is available. The Time frame column indicates when each posting was made and whether it is a full-time, part-time, or casual/temporary role, so make sure it meets your requirements before applying!
Additionally, if details such as hours per week or further schedule information are important criteria, there is also information in the Horari and Temps_Oferta columns. Once all three criteria have been ticked off (keywords, location, and time frame), take a look at the Empresa (company name) and Nom_Oferta (offer name) columns to get an idea of who would be employing you should you land the gig!
All these pieces of data put together should give any motivated individual what they need to seek out an optimal work solution. Keep hunting, and good luck!
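The keyword/location/schedule workflow above can be sketched in a few lines of Python with the standard csv module. The two sample rows below are invented placeholders; only the column names (Nom_Oferta, Empresa, Ubicació, Temps_Oferta, Horari) come from the dataset's data dictionary.

```python
import csv
import io

# Invented sample data standing in for web_scraping_information_offers.csv.
sample = io.StringIO(
    "Nom_Oferta,Empresa,Ubicació,Temps_Oferta,Horari\n"
    "Python Developer,Acme,Barcelona,Full-time,9-17\n"
    "Data Analyst,Widgets SL,Madrid,Part-time,15-19\n"
)

def find_offers(f, keyword="", location="", schedule=""):
    """Return offers whose title, location, and time frame match the filters."""
    return [
        row for row in csv.DictReader(f)
        if keyword.lower() in row["Nom_Oferta"].lower()
        and location.lower() in row["Ubicació"].lower()
        and schedule.lower() in row["Temps_Oferta"].lower()
    ]

for offer in find_offers(sample, keyword="python", location="barcelona"):
    print(offer["Nom_Oferta"], "at", offer["Empresa"])  # Python Developer at Acme
```

The same function covers all three tips: pass only `keyword` for step 1, add `location` for step 2, and `schedule` for step 3.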
- Machine learning can be used to group job offers, making it easier to identify similarities and differences between them. This could allow users to target their search for a work solution more specifically.
- The data can be used to compare job offerings across different areas or types of jobs, enabling users to make better informed decisions in terms of their career options and goals.
- It may also provide insight into the local job market, enabling companies and employers to identify where there is potential for new opportunities, or trends that may previously have gone unnoticed.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: web_scraping_information_offers.csv

| Column name  | Description                         |
|:-------------|:------------------------------------|
| Nom_Oferta   | Name of the job offer. (String)     |
| Empresa      | Company offering the job. (String)  |
| Ubicació     | Location of the job offer. (String) |
| Temps_Oferta | Time of the job offer. (String)     |
| Horari       | Schedule of the job offer. (String) |
If you use this dataset in your research, please credit the original authors.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
About MCH Data Connect

The MCH Data Connect provides public health professionals, researchers, practitioners, policy makers, and students with a comprehensive catalog of maternal and child health data resources. Users can access a variety of databases, data sets, interactive tools, and maps related to their area of interest.

Maternal and Child Health

The MCH Data Connect uses a broad definition of maternal and child health, including the influence of access to health care, health, health behaviors, education, violence, environmental conditions, demographics, and policy on the health of women, men, children, youth, families, and communities.

Topics

Topics included in the MCH Data Connect: health care policy, experience of health care, family planning, sexual and reproductive health, economics, politics, social services, violence, and health behaviors, among others.

Data Resources

Data resources described in this catalog include data sets, statistics, interactive tables, interactive maps, and databases. Many of the data sources are available for public consumption, though specific databases may require the user to purchase or apply for the dataset.

Basic Search

Locate the "Search Studies" highlighted box above the list of resources on the MCH Data Connect homepage. Leave "Cataloging Information" as the default basic search command. To search, enter the keyword, topic, or area of interest in the field box (next to "Cataloging Information") to obtain a list of resources that apply to your search.

Access Resource

Once the search is completed, a list of resources will appear. The first line provides a brief summary. To get more information about a specific resource (including producer, background, user functionality, and data sources), click on the underlined/blue hyperlinked title. Once the resource description is opened, click on the link that says "Click here to access data from site" to go directly to the resource's web page.
Advanced Search

Click on the "Advanced Search" link located in the "Search Studies" highlighted box above the list of resources on the MCH Data Connect homepage. From the Search Scope drop-down lists, select either Keyword or Abstract (these are the most detailed fields used by the MCH Data Connect). Enter multiple search terms to use the "and" searching criterion. For example, to search for resources related to diabetes and exercise, the user would select "Keyword" from the drop-down list, then "contains", and then enter "diabetes" in the field box. The user would repeat the first two steps to enter "exercise" in the next field box.

Collections

The Topic Folders section provides a list of broad categories that include many resources found in the MCH Data Connect. The files of the Topic Folders are on the left side of the homepage. Clicking on a file folder will produce a list of the resources related to that topic. The Topic Folders offer a starting place for your search. You can narrow your search further by performing either of the previous two searching techniques within a collection.

Questions or Comments?

For questions, comments, or if you think we missed a useful information tool, please contact us via email at mchdataconnect@gmail.com.

Glossary

Some terms you will see on this website are unique to the cataloging service, Dataverse; the MCH Data Connect uses them differently. Please see below for a glossary of terms you will find at MCH Data Connect.
Please note that interactive tools, datasets, and reports are referred to as "resources."

Terms
- Dataverse: the program used to develop the MCH Data Connect
- Study: resource containing relevant public health data and/or information
- Collection: broad categories into which resources have been classified
- How to Cite: used as the resource title by MCH Data Connect
- Study Global ID: unique code given to each resource
- Producer: the agency or entity that produces and maintains the resource
- Deposit Date: date when the resource was added to the MCH Data Connect
- Provenance: will always be MCH Data Connect
- Abstract and Scope: contains resource summary and geographic unit information
- Abstract: summary of the resource
- Background: information about the purpose and development of the resource
- User Functionality: what users can do with the data (i.e. download, customize charts)
- Data Notes: information about data sources, years, and samples (if applicable)
- Abstract Date: month and year that the resource was added to MCH Data Connect
- Keyword: specific variables, topics, or words that the resource addresses/encompasses
- Geographic Unit: level at which data is available
- Title: name of the specific resource
- Keyword Vocabulary: "link": clicking on "link" will take the user to an external website related to the keyword term

The following terms are not used by the MCH Data Connect Dataverse: Topic Classification; Topic Classification Vocabulary; Other ID; Author; Distributor; Funding Agency; Production Date; Distribution Date; Time Period Covered Start; Time...
Author: Víctor Yeste, Universitat Politècnica de València.

The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of the content published in online media and the possible prediction of the selected success variables. In this case, due to the need to integrate data from two separate areas (web publishing and the analysis of shares and related topics on Twitter), programmatic access was chosen to both the Google Analytics v4 Reporting API and the Twitter Standard API, always respecting the limits of each.

The website analyzed is hellofriki.com. It is an online media outlet whose primary intention is to meet the need for information on certain topics, publishing daily a large volume of content in the form of news, as well as analyses, reports, interviews, and many other formats. All these contents fall under the sections of cinema, series, video games, literature, and comics.

This dataset contributed to the elaboration of the PhD thesis: Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009

Data have been obtained from each last-minute news article published online, according to the indicators described in the doctoral thesis.
All related data are stored in a database, divided into the following tables:

tesis_followers: user ID list of the media account's followers.

tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.
- status_id: tweet ID
- created_at: date of publication
- text: content of the tweet
- path: URL extracted after processing the shortened URL in text
- post_shared: article ID in WordPress that is being shared
- retweet_count: number of retweets
- favorite_count: number of favorites

tesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web (other typologies: automatic Facebook shares, custom tweets without a link to an article, etc.). Same fields as tesis_hometimeline.

tesis_posts: data of articles published by the web and processed for analysis.
- stats_id: analysis ID
- post_id: article ID in WordPress
- post_date: article publication date in WordPress
- post_title: title of the article
- path: URL of the article on the media website
- tags: IDs of the WordPress tags related to the article
- uniquepageviews: unique page views
- entrancerate: entrance rate
- avgtimeonpage: average visit time
- exitrate: exit rate
- pageviewspersession: page views per session
- adsense_adunitsviewed: number of ads viewed by users
- adsense_viewableimpressionpercent: ad display ratio
- adsense_ctr: ad click ratio
- adsense_ecpm: estimated ad revenue per 1000 page views

tesis_stats: data from a particular analysis, performed for each published breaking news item.
Fields with statistical values can be computed from the data in the other tables, but total and average calculations are saved for faster and easier further processing.
- id: ID of the analysis
- phase: phase of the thesis in which the analysis was carried out (currently all are 1)
- time: "0" if at the time of publication, "1" if 14 days later
- start_date: date and time of measurement on the day of publication
- end_date: date and time when the measurement is made 14 days later
- main_post_id: ID of the published article to be analysed
- main_post_theme: main section of the published article
- superheroes_theme: "1" if about superheroes, "0" if not
- trailer_theme: "1" if a trailer, "0" if not
- name: empty field; a custom name can be added manually
- notes: empty field; personalized notes can be added manually (e.g. that some tag was removed manually for being considered too generic, despite the editor having added it)
- num_articles: number of articles analysed
- num_articles_with_traffic: number of articles analysed with traffic (which will be taken into account for traffic analysis)
- num_articles_with_tw_data: number of articles with data from when they were shared on the media's Twitter account
- num_terms: number of terms analysed
- uniquepageviews_total: total page views
- uniquepageviews_mean: average page views
- entrancerate_mean: average entrance rate
- avgtimeonpage_mean: average duration of visits
- exitrate_mean: average exit rate
- pageviewspersession_mean: average page views per session
- total: total ads viewed
- adsense_adunitsviewed_mean: average ads viewed
- adsense_viewableimpressionpercent_mean: average ad display ratio
- adsense_ctr_mean: average ad click ratio
- adsense_ecpm_mean: estimated ad revenue per 1000 page views
- Total: total income
- retweet_count_mean: average retweets
- favorite_count_total: total favorites
- favorite_count_mean: average favorites
- terms_ini_num_tweets: total tweets on the terms on the day of publication
- terms_ini_retweet_count_total: total retweets on the terms on the day of publication
- terms_ini_retweet_count_mean: average retweets on the terms on the day of publication
- terms_ini_favorite_count_total: total favorites on the terms on the day of publication
- terms_ini_favorite_count_mean: average favorites on the terms on the day of publication
- terms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publication
- terms_ini_user_num_followers_mean: average followers of users who have spoken about the terms on the day of publication
- terms_ini_user_num_tweets_mean: average number of tweets published by users who spoke about the terms on the day of publication
- terms_ini_user_age_mean: average age in days of users who have spoken about the terms on the day of publication
- terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about the terms on the day of publication
- terms_end_num_tweets: total tweets on the terms 14 days after publication
- terms_end_retweet_count_total: total retweets on the terms 14 days after publication
- terms_end_retweet_count_mean: average retweets on the terms 14 days after publication
- terms_end_favorite_count_total: total favorites on the terms 14 days after publication
- terms_end_favorite_count_mean: average favorites on the terms 14 days after publication
- terms_end_followers_talking_rate: ratio of media Twitter account followers who have recently posted a tweet talking about the terms 14 days after publication
- terms_end_user_num_followers_mean: average followers of users who have spoken about the terms 14 days after publication
- terms_end_user_num_tweets_mean: average number of tweets published by users who have spoken about the terms 14 days after publication
- terms_end_user_age_mean: average age in days of users who have spoken about the terms 14 days after publication
- terms_end_ur_inclusion_rate: URL inclusion ratio of tweets talking about the terms 14 days after publication

tesis_terms: data of the terms (tags) related to the processed articles.
- stats_id: analysis ID
- time: "0" if at the time of publication, "1" if 14 days later
- term_id: term (tag) ID in WordPress
- name: name of the term
- slug: URL slug of the term
- num_tweets: number of tweets
- retweet_count_total: total retweets
- retweet_count_mean: average retweets
- favorite_count_total: total favorites
- favorite_count_mean: average favorites
- followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the term
- user_num_followers_mean: average followers of users who were talking about the term
- user_num_tweets_mean: average number of tweets published by users who were talking about the term
- user_age_mean: average age in days of users who were talking about the term
- url_inclusion_rate: URL inclusion ratio
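As noted above, the saved `*_total` and `*_mean` fields in tesis_stats can be recomputed from raw rows such as those in tesis_hometimeline. A minimal sketch with invented sample tweets (only the field names come from the data dictionary):

```python
from statistics import mean

# Invented sample rows shaped like tesis_hometimeline records.
tweets = [
    {"status_id": 1, "retweet_count": 4, "favorite_count": 10},
    {"status_id": 2, "retweet_count": 0, "favorite_count": 3},
    {"status_id": 3, "retweet_count": 8, "favorite_count": 5},
]

# Pre-computing aggregates once, as tesis_stats does, avoids re-scanning
# the raw tables for every later analysis.
stats = {
    "retweet_count_total": sum(t["retweet_count"] for t in tweets),
    "retweet_count_mean": mean(t["retweet_count"] for t in tweets),
    "favorite_count_total": sum(t["favorite_count"] for t in tweets),
    "favorite_count_mean": mean(t["favorite_count"] for t in tweets),
}
print(stats)  # retweet total 12, mean 4; favorite total 18, mean 6
```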
This dataset captures how many voter registration applications each agency has distributed, how many applications agency staff sent to the Board of Elections, how many staff each agency trained to distribute voter registration applications, whether or not the agency hosts a link to voting.nyc on its website and if so, how many clicks that link received during the reporting period.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo.
- Click here to view this dashboard: Dashboard Link
- Click here to view this dashboard's features in a Domo blog post: Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard
This dataset offers one of the most robust resources you will find for discovering key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. This data, sourced from KenPom, goes as far back as 2002 and is updated with the latest 2025 data. The dataset is meticulously structured to provide every piece of information that I could pull from the site as an open resource for March Madness analysis.
Key features of the dataset include:
- Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints on KenPom's website. Please note that the Height/Experience data only goes as far back as 2007, but every other source contains data from 2002 onward.
- Data Granularity: This dataset features an individual line item for every NCAA Division 1 men's basketball team in every season, containing every KenPom metric you can possibly think of. It can serve as a single source of truth for your March Madness analysis and provides the granularity necessary to perform any type of analysis.
- 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. Please note that I will continually update this dataset with the seed and region information for previous tournaments as I continue to work on it.
These datasets were created by downloading the raw CSV files for each season for the various sections on KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of these raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, the column headers for each of the previous seasons are standardized to the current 2025 naming structure so all of the historical data can be viewed under the exact same field names. All of these cleaned datasets are then appended together, and some additional clean-up takes place before ultimately creating the intermediate (INT) datasets that are uploaded to this Kaggle dataset.

Once all of the INT datasets were created, I joined all of the tables together on team name and season so these different metrics can be viewed in one single view. From there, I joined an NCAAM Conference & ESPN Team Name Mapping table to add a conference field, in its full length and the respective acronyms it is known by, as well as the team name that ESPN currently uses. Please note that this reference table is an aggregated view of all of the different conferences a team has been a part of since 2002 and the different team names that KenPom has used historically, so this mapping table is necessary to map all of the teams properly and differentiate historical conferences from current conferences.

From there, I join a reference table that includes all of the current NCAAM coaches and their active coaching lengths, because a coach's current tenure typically correlates with a team's success in the March Madness tournament. I also join another reference table to include the historical post-season tournament teams in the March Madness, NIT, CBI, and CIT tournaments, and a further reference table to flag the teams that were ranked in the top 12 of the AP Top 25 during week 6 of the respective NCAA season.
After some additional data clean-up, all of this cleaned data exports into the "DEV _ March Madness" file that contains the consolidated view of all of this data.
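The header-standardization and append steps above can be sketched in plain Python (the Domo Magic ETL dataflow itself is not reproducible here). The rename mapping and the sample rows are invented for illustration; KenPom's actual historical headers differ.

```python
# Per-season rename maps to the current (2025) naming convention.
# These names are illustrative assumptions, not KenPom's real headers.
RENAMES = {
    2019: {"AdjO": "adj_off_eff", "AdjD": "adj_def_eff"},
    2025: {},  # 2025 already uses the current names
}

def standardize(rows, season):
    """Rename each row's columns to the current convention and tag the season."""
    mapping = RENAMES.get(season, {})
    return [
        {mapping.get(col, col): val for col, val in row.items()} | {"season": season}
        for row in rows
    ]

# Append two seasons: after renaming, both expose the same field names.
combined = standardize([{"Team": "Duke", "AdjO": 118.2}], 2019) \
         + standardize([{"Team": "Duke", "adj_off_eff": 121.5}], 2025)
print([row["adj_off_eff"] for row in combined])  # [118.2, 121.5]
```

Standardizing headers before appending is what makes the historical data queryable under one set of field names, exactly as the ETL description above requires.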
This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.
https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
This dataset contains a collection of questions and answers that have been contextualized to reveal subtle implications and insights. It is focused on helping researchers gain a deeper understanding of how semantics, context, and other factors affect how people interpret and respond to conversations about different topics. By exploring this dataset, researchers will be able to uncover the underlying principles governing conversation styles, which can then be applied to better understand attitudes among different groups. With its comprehensive coverage of questions from a variety of sources around the web, this dataset offers an invaluable resource for those looking to analyze discourse in terms of sentiment analysis or opinion mining.
How to Use This Dataset
This dataset contains a collection of contextualized questions and answers extracted from various sources around the web, which can be useful for exploring implications and insights. To get started with the dataset:
- Read through the headings on each column in order to understand the data that has been collected - this will help you identify which pieces of information are relevant for your research project.
- Explore each column and view what types of responses have been given in response to particular questions or topics - this will give you an idea as to how people interpret specific topics differently when presented with different contexts or circumstances.
- Next, analyze the responses, looking for patterns or correlations between responses on different topics or contexts; this can help reveal implications and insights about a particular subject matter that were previously unknown to you. You can also use data visualization tools such as Tableau or Power BI to gain a deeper understanding of the results and trends within your dataset!
- Finally, use these findings to better inform your project by tailoring future questions around any patterns discovered within your analysis!
- To understand the nature of public debates and how people express their opinions in different contexts.
- To better comprehend the implicit attitudes and assumptions inherent in language use, providing insight into discourse norms on a range of issues.
- To gain insight into the use of rhetorical devices, such as exaggeration and deceptive tactics, used to influence public opinion on important topics.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv

| Column name | Description |
|:------------|:------------------------------------------------------------------------------|
| context     | The context in which the question was asked and the answer was given. (Text)   |
If you use this dataset in your research, please credit the original authors and Huggingface Hub.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It's a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.

What the data includes

The data is typical of what an ecommerce website would see and includes the following information:
- Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic
- Content data: information about the behavior of users on the site, such as the URLs of pages that visitors look at, how they interact with content, etc.
- Transactional data: information about the transactions on the Google Merchandise Store website.

Limitations

All users have view access to the dataset. This means you can query the dataset and generate reports, but you cannot complete administrative tasks. Data for some fields is obfuscated (such as fullVisitorId) or removed (such as clientId, adWordsClickInfo, and geoNetwork). "Not available in demo dataset" will be returned for STRING values and "null" will be returned for INTEGER values when querying fields containing no data.

This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/mo of free tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
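Because obfuscated STRING fields come back as the literal sentinel "Not available in demo dataset", it can be convenient to normalize query rows before analysis. The helper below is an illustrative sketch, not part of any Google library; the row contents are invented.

```python
# The demo dataset substitutes this literal string for obfuscated STRING fields.
SENTINEL = "Not available in demo dataset"

def normalize_row(row):
    """Map the demo dataset's sentinel values to None so downstream code
    treats obfuscated fields the same as genuinely missing ones."""
    return {k: (None if v == SENTINEL else v) for k, v in row.items()}

# Invented example row shaped like a BigQuery query result.
row = {"fullVisitorId": SENTINEL, "visits": 1, "clientId": None}
print(normalize_row(row))  # {'fullVisitorId': None, 'visits': 1, 'clientId': None}
```

Applying such a pass keeps aggregations honest: counting distinct `fullVisitorId` values, for instance, would otherwise treat the sentinel as a real visitor ID.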
This file contains the data collected for the doctoral thesis of Sanne Elling: 'Evaluating website quality: Five studies on user-focused evaluation methods'.

Summary: The benefits of evaluating websites among potential users are widely acknowledged. There are several methods that can be used to evaluate a website's quality from a user's perspective. In current practice, many evaluations are executed with inadequate methods that lack research-based validation. This thesis aims to gain more insight into evaluation methodology and to contribute to a higher standard of website evaluation in practice. A first way to evaluate website quality is measuring the users' opinions. This is often done with questionnaires, which gather opinions in a cheap, fast, and easy way. However, many questionnaires seem to miss a solid statistical basis and a justification of the choice of quality dimensions and questions. We therefore developed the 'Website Evaluation Questionnaire' (WEQ), which was specifically designed for the evaluation of governmental websites. In a study in online and laboratory settings, the WEQ proved to be a valid and reliable instrument. A way to gather more specific user opinions is inviting participants to review website pages. Participants provide their comments by clicking on a feedback button, marking a problematic segment, and formulating their feedback. There has been debate about the extent to which users are able to provide relevant feedback. The results of our studies showed that participants were able to provide useful feedback. They signalled many relevant problems that indeed were experienced by users who needed to find information on the website. Website quality can also be measured during participants' task performance. A frequently used method is the concurrent think-aloud method (CTA), which involves participants who verbalize their thoughts while performing tasks.
There have been doubts on the usefulness and exhaustiveness of participants’ verbalizations. Therefore, we have combined CTA and eye tracking in order to examine the cognitive processes that participants do and do not verbalize. The results showed that the participants’ verbalizations provided substantial information in addition to the directly observable user problems. There was also a rather high percentage of silences (27%) during which interesting observations could be made about the users’ processes and obstacles. A thorough evaluation should therefore combine verbalizations and (eye tracking) observations. In a retrospective think-aloud (RTA) evaluation participants verbalize their thoughts afterwards while watching a recording of their performance. A problem with RTA is that participants not always remember the thoughts they had during their task performance. We therefore complemented the dynamic screen replay of their actions (pages visited and mouse movements) with a dynamic gaze replay of the participants’ eye movements.Contrary to our expectations, no differences were found between the two conditions. It is not possible to draw conclusions on the single best method. The value of a specific method is strongly influenced by the goals and context of an evaluation. Also, the outcomes of the evaluation not only depend on the method, but also on other choices during the evaluation, such as participant selection, tasks, and the subsequent analysis.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
From 1 July 2024, the dataset will no longer publicly distinguish between relevant qualifications or training courses or approved qualifications or training courses.
From 1 March 2024, the dataset will be updated to include 5 new fields and 1 existing field will also be updated (see help file for details).
From 24 August 2023, the dataset will be updated to include 1 new field, ABLE_TO_PROVIDE_TFAS, (see help file for details).
We have replaced the .xlsx file resources for all our datasets. This was required due to the API and web page search functionality no longer being supported for .xlsx files on the Data.Gov platform.
From 10 January 2022, the field ADV_FASEA_APPROVED_QUAL will be renamed to ADV_APPROVED_QUAL.
From 21 November 2019, the dataset will be updated to include 7 new fields (see help file for details).
These fields are included in conjunction with the professional standards reforms for financial advisers. More information can be found on the ASIC website https://asic.gov.au/regulatory-resources/financial-services/professional-standards-for-financial-advisers-reforms/.
Note: For most advisers the new fields will be unpopulated on 21 November 2019. As advisers provide this data to ASIC it will appear in the dataset.
ASIC is Australia’s corporate, markets and financial services regulator. ASIC contributes to Australia’s economic reputation and wellbeing by ensuring that Australia’s financial markets are fair and transparent, supported by confident and informed investors and consumers.
Australian Financial Services Licensees are required to keep the details of their financial advisers up to date on ASIC's Financial Advisers Register. Information contained in the register is made available to the public to search via ASIC's Moneysmart website.
Select data from the Financial Advisers Register will be uploaded each week to www.data.gov.au. The data made available will be a snapshot of the register at a point in time. Legislation prescribes the type of information ASIC is allowed to disclose to the public.
The information included in the downloadable dataset is:
Additional information about financial advisers can be found via ASIC's website. Accessing some information may attract a fee.
More information about searching ASIC's registers is available on the ASIC website.
Dataset Summary
The Persian data of this dataset is a collection of 400k blog posts (RohanAiLab/persian_blog). These posts have been gathered from more than 10 websites. This dataset can be used in different NLP tasks such as language modeling, tokenizer creation, and text generation.
The English data of this dataset is merged from the english-wiki-corpus dataset. Note: The data for both Persian… See the full description on the dataset page: https://huggingface.co/datasets/ali619/corpus-dataset-normalized-for-persian-and-english.
Each drainage area is considered a Hydrologic Unit (HU) and is given a Hydrologic Unit Code (HUC) which serves as the unique identifier for the area. HUC 2s, 4s, 6s, 8s, 10s, and 12s define the drainage Regions, Subregions, Basins, Subbasins, Watersheds and Subwatersheds, respectively, across the United States. Their boundaries are defined by hydrologic and topographic criteria that delineate an area of land upstream from a specific point on a river, and are determined solely on science-based hydrologic principles, not favoring any administrative boundaries, special projects, or a particular program or agency. The Watershed Boundary Dataset is delineated and georeferenced to the USGS 1:24,000-scale topographic basemap.
Hydrologic Units are delineated to nest in a multi-level, hierarchical drainage system with corresponding HUCs, so that as you move from small scale to large scale the number of HUC digits increases in increments of two. For example, the very largest HUCs have 2 digits, and thus are referred to as HUC 2s, and the very smallest HUCs have 12 digits, and thus are referred to as HUC 12s.
Dataset Summary
Phenomenon Mapped: Watersheds in the United States, as delineated by the Watershed Boundary Dataset (WBD)
Geographic Extent: Contiguous United States, Alaska, Hawaii, Puerto Rico, Guam, US Virgin Islands, Northern Mariana Islands and American Samoa
Projection: Web Mercator
Update Frequency: Annual
Visible Scale: Visible at all scales; however, USGS recommends this dataset not be used for scales of 1:24,000 or larger.
Source: United States Geological Survey (WBD)
Data Vintage: January 7, 2025
What can you do with this layer?
This layer is suitable for both visualization and analysis across the ArcGIS system. This layer can be combined with your data and other layers from the ArcGIS Living Atlas of the World in ArcGIS Online and ArcGIS Pro to create powerful web maps that can be used alone or in a story map or other application.
Because this layer is part of the ArcGIS Living Atlas of the World it is easy to add to your map:
In ArcGIS Online, you can add this layer to a map by selecting Add then Browse Living Atlas Layers. A window will open. Type "Watershed Boundary Dataset" in the search box and browse to the layer. Select the layer then click Add to Map.
In ArcGIS Pro, open a map and select Add Data from the Map Tab. Select Data at the top of the drop down menu. The Add Data dialog box will open; on the left side of the box, expand Portal if necessary, then select Living Atlas. Type "Watershed Boundary Dataset" in the search box, browse to the layer then click OK.
Questions? Please leave a comment below if you have a question about this layer, and we will get back to you as soon as possible.
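The two-digit nesting rule above means that a unit's ancestors can be read directly off its code by truncation (the standard hierarchy also includes 4-digit Subregions). A minimal sketch in Python; the 12-digit code below is hypothetical, for illustration only:

```python
def huc_ancestors(huc12: str) -> dict:
    """Truncate a 12-digit HUC to the code of each enclosing hydrologic unit."""
    levels = {2: "Region", 4: "Subregion", 6: "Basin",
              8: "Subbasin", 10: "Watershed", 12: "Subwatershed"}
    assert len(huc12) == 12 and huc12.isdigit()
    # Each level's code is simply the first N digits of the HUC 12 code.
    return {name: huc12[:digits] for digits, name in levels.items()}

# Hypothetical HUC 12 code, for illustration only:
print(huc_ancestors("180902030402"))
# {'Region': '18', 'Subregion': '1809', 'Basin': '180902',
#  'Subbasin': '18090203', 'Watershed': '1809020304',
#  'Subwatershed': '180902030402'}
```

This is why the WBD layers nest cleanly: every HUC 12 polygon lies inside exactly one HUC 10, HUC 8, and so on up to its HUC 2 Region.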
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Graph Convolutional Neural Network-based Method for Predicting Computational Intensity of Geocomputation
This is the implementation for the paper "A Graph Convolutional Neural Network-based Method for Predicting Computational Intensity of Geocomputation". The framework is the Learning-based Computing Framework for Geospatial data (LCF-G). This paper includes three case studies, each corresponding to a folder. Each folder contains four subfolders: data, CIPrediction, ParallelComputation and SampleGeneration.
The data folder contains geospatial data.
The CIPrediction folder contains model training code.
The ParallelComputation folder contains geographic computation code.
The SampleGeneration folder contains code for sample generation.

Case 1: Generation of DEM from point cloud data
Step 1: Data download. Dataset 1 has been uploaded to the directory 1point2dem/data. The other two datasets, Dataset 2 and Dataset 3, can be downloaded from the OpenTopography website.
Dataset 2: Visit https://portal.opentopography.org/lidarDataset?opentopoID=OTLAS.112018.2193.1. In section "1. Coordinates & Classification", select "Manually enter selection coordinates" and set Xmin = 1372495.692761, Ymin = 5076006.86821, Xmax = 1378779.529766, Ymax = 5085586.39531. In section "2. Point Cloud Data Download", choose "Point cloud data in LAS format", then click "SUBMIT" to initiate the download.
Dataset 3: Visit https://portal.opentopography.org/lidarDataset?opentopoID=OTLAS.052016.26912.1. In section "1. Coordinates & Classification", select "Manually enter selection coordinates" and set Xmin = 470047.153826, Ymin = 4963418.512121, Xmax = 479547.16556, Ymax = 4972078.92768. In section "2. Point Cloud Data Download", choose "Point cloud data in LAS format", then click "SUBMIT" to initiate the download.
Step 2: Sample generation. This step involves data preparation, and samples can be generated using the provided code. Since the samples have already been uploaded to 1point2dem/SampleGeneration/data, this step is optional.
cd 1point2dem/SampleGeneration
g++ PointCloud2DEMSampleGeneration.cpp -o PointCloud2DEMSampleGeneration
mpiexec -n {number_processes} ./PointCloud2DEMSampleGeneration ../data/pcd path/to/output
Step 3: Model training. This step trains three models (GCN, ChebNet, GATNet). The model results are saved in 1point2dem/SampleGeneration/result, and the results for Table 3 in the paper are derived from this output.
cd 1point2dem/CIPrediction
python -u point_prediction.py --model [GCN|ChebNet|GATNet]
Step 4: Parallel computation. This step uses the trained models to optimize parallel computation. The results for Figures 11-13 in the paper are generated from the output of this command.
cd 1point2dem/ParallelComputation
g++ ParallelPointCloud2DEM.cpp -o ParallelPointCloud2DEM
mpiexec -n {number_processes} ./ParallelPointCloud2DEM ../data/pcd

Case 2: Spatial intersection of vector data
Step 1: Data download. Some data from the paper has been uploaded to 2intersection/data. The remaining OSM data can be downloaded from GeoFabrik: GeoFabrik - Czech Republic OSM Data.
Step 2: Sample generation. This step involves data preparation, and samples can be generated using the provided code. Since the samples have already been uploaded to 2intersection/SampleGeneration/data, this step is optional.
cd 2intersection/SampleGeneration
g++ ParallelIntersection.cpp -o ParallelIntersection
mpiexec -n {number_processes} ./ParallelIntersection ../data/shpfile ../data/shpfile
Step 3: Model training. This step trains three models (GCN, ChebNet, GATNet). The model results are saved in 2intersection/SampleGeneration/result, and the results for Table 5 in the paper are derived from this output.
cd 2intersection/CIPrediction
python -u vector_prediction.py --model [GCN|ChebNet|GATNet]
Step 4: Parallel computation. This step uses the trained models to optimize parallel computation. The results for Figures 14-16 in the paper are generated from the output of this command.
cd 2intersection/ParallelComputation
g++ ParallelIntersection.cpp -o ParallelIntersection
mpiexec -n {number_processes} ./ParallelIntersection ../data/shpfile1 ../data/shpfile2

Case 3: WOfS analysis using raster data
Step 1: Data download. Some data from the paper has been uploaded to 3wofs/data. The remaining data can be downloaded from http://openge.org.cn/advancedRetrieval?type=dataset with the following query parameters:
Product selection: LC08_L1TP and LC08_L1GT
Longitude: 112.5 (minimum) to 115.5 (maximum); Latitude: 29.5 (minimum) to 31.5 (maximum)
Time range: 2013-01-01 to 2018-12-31
Other parameters: default
Step 2: Sample generation. This step involves data preparation, and samples can be generated using the provided code. Since the samples have already been uploaded to 3wofs/SampleGeneration/data, this step is optional.
cd 3wofs/SampleGeneration
sbt package
spark-submit --master {host1,host2,host3} --class whu.edu.cn.core.cube.raster.WOfSSampleGeneration path/to/package.jar
Step 3: Model training. This step trains three models (GCN, ChebNet, GATNet). The model results are saved in 3wofs/SampleGeneration/result, and the results for Table 6 in the paper are derived from this output.
cd 3wofs/CIPrediction
python -u raster_prediction.py --model [GCN|ChebNet|GATNet]
Step 4: Parallel computation. This step uses the trained models to optimize parallel computation. The results for Figures 18 and 19 in the paper are generated from the output of this command.
cd 3wofs/ParallelComputation
sbt package
spark-submit --master {host1,host2,host3} --class whu.edu.cn.core.cube.raster.WOfSOptimizedByDL path/to/package.jar path/to/output

Statement about Case 3
The experiment in Case 3 was conducted with improvements made on the GeoCube platform.
Code name: GeoCube
Code link: GeoCube Source Code
License information: The GeoCube project is openly available under the CC BY 4.0 license (Creative Commons Attribution 4.0 International), allowing anyone to freely share, modify, and distribute the platform's code.
Citation: Gao, Fan (2022). A multi-source spatio-temporal data cube for large-scale geospatial analysis. figshare. Software. https://doi.org/10.6084/m9.figshare.15032847.v1
Clarification statement: The authors of this code are not affiliated with this manuscript. The innovations and steps in Case 3, including data download, sample generation, and parallel computation optimization, were independently developed and are not dependent on GeoCube's code.

Requirements
The code uses the following dependencies with Python 3.8:
torch==2.0.0
torch_geometric==2.5.3
networkx==2.6.3
pyshp==2.3.1
tensorrt==8.6.1
matplotlib==3.7.2
scipy==1.10.1
scikit-learn==1.3.0
geopandas==0.13.2
The data on the use of the datasets on the OGD portal BL (data.bl.ch) are collected and published by the specialist and coordination office OGD BL.
date: The day the usage was measured.
dataset_title: The title of the dataset.
dataset_id: The technical ID of the dataset.
visitors: The number of daily visitors to the dataset. Visitors are recorded by counting the unique IP addresses that recorded access on the day of the survey. The IP address represents the network address of the device from which the portal was accessed.
interactions: All interactions with any dataset on data.bl.ch. A visitor can trigger multiple interactions. Interactions include clicks on the website (searching datasets, filters, etc.) as well as API calls (downloading a dataset as a JSON file, etc.).
Remarks
Only calls to publicly available datasets are shown.
IP addresses and interactions of users with a login of the Canton of Basel-Landschaft (in particular, employees of the specialist and coordination office OGD) are removed from the dataset before publication and therefore not shown.
Calls from actors that are clearly identifiable as bots by the User-Agent header are also not shown.
Combinations of dataset and date for which no use occurred (visitors == 0 and interactions == 0) are not shown.
Due to synchronization problems, data may occasionally be missing for individual days.
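The omission rule in the remarks can be sketched as a small preprocessing step (a sketch only: column names follow the field descriptions above, and the rows are toy values, not real portal data):

```python
# Sketch: reproduce the published filtering rule that dataset/date
# combinations without any use (visitors == 0 and interactions == 0)
# are omitted before publication. Toy rows for illustration only.
rows = [
    {"dataset_id": "10010", "date": "2024-05-01", "visitors": 4, "interactions": 9},
    {"dataset_id": "10010", "date": "2024-05-02", "visitors": 0, "interactions": 0},
    {"dataset_id": "10020", "date": "2024-05-01", "visitors": 0, "interactions": 2},
]

# Keep a row if it recorded any visitor or any interaction.
published = [r for r in rows if r["visitors"] > 0 or r["interactions"] > 0]
print(len(published))  # 2: the all-zero row is dropped
```

Conversely, a consumer of the dataset should treat a missing dataset/date combination as zero use (or as a synchronization gap), not as missing fields.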
The NSF Public Access Repository contains an initial collection of journal publications, comprising the final accepted version of the peer-reviewed manuscript or the version of record. To do this, NSF draws upon services provided by the publisher community, including the Clearinghouse of Open Research for the United States, CrossRef, and the International Standard Serial Number. When clicking on a Digital Object Identifier number, you will be taken to an external site maintained by the publisher. Some full-text articles may not be available without a charge during the embargo, or administrative interval. Some links on this page may take you to non-federal websites, whose policies may differ from this website's.
https://www.icpsr.umich.edu/web/ICPSR/studies/38908/terms
The Child Care and Development Fund (CCDF) provides federal money to states and territories to help low-income families obtain quality child care so they can work, attend training, or receive education. Within the broad federal parameters, states and territories set the detailed policies. Those details determine whether a particular family will or will not be eligible for subsidies, how much the family will have to pay for the care, how families apply for and retain subsidies, the maximum amounts that child care providers will be reimbursed, and the administrative procedures that providers must follow. Thus, while CCDF is a single program from the perspective of federal law, it is in practice a different program in every state and territory. The CCDF Policies Database project is a comprehensive, up-to-date database of CCDF policy information that supports the needs of a variety of audiences through (1) analytic data files, (2) a project website and search tool, and (3) an annual report (Book of Tables). These resources are made available to researchers, administrators, and policymakers with the goal of addressing important questions concerning the effects of child care subsidy policies and practices on the children and families served. A description of the data files, project website and search tool, and Book of Tables is provided below:
1. Detailed, longitudinal analytic data files provide CCDF policy information for all 50 states, the District of Columbia, and the United States territories and outlying areas. They capture the policies actually in effect at a point in time, rather than proposals or legislation, and record changes throughout each year, allowing users to access the policies in place at any point in time between October 2009 and the most recent data release. The data are organized into 32 categories, with each category of variables separated into its own dataset.
The categories span five general areas of policy:
Eligibility Requirements for Families and Children (Datasets 1-5)
Family Application, Terms of Authorization, and Redetermination (Datasets 6-13)
Family Payments (Datasets 14-18)
Policies for Providers, Including Maximum Reimbursement Rates (Datasets 19-27)
Overall Administrative and Quality Information Plans (Datasets 28-32)
The information in the data files is based primarily on the documents that caseworkers use as they work with families and providers (often termed "caseworker manuals"). The caseworker manuals generally provide much more detailed information on eligibility, family payments, and provider-related policies than the CCDF Plans submitted by states and territories to the federal government. The caseworker manuals also provide ongoing detail for periods in between CCDF Plan dates. Each dataset contains a series of variables designed to capture the intricacies of the rules covered in the category. The variables include a mix of categorical, numeric, and text variables. Most variables have a corresponding notes field to capture additional details related to that particular variable. In addition, each category has an additional notes field to capture any information regarding the rules that is not already outlined in the category's variables. Beginning with the 2020 files, the analytic data files are supplemented by four additional data files containing select policy information featured in the annual reports (prior to 2020, the full detail of the annual reports was reproduced as data files). The supplemental data files are available as 4 datasets (Datasets 33-36) and present key aspects of the differences in CCDF-funded programs across all states and territories as of October 1 of each year (2009-2022).
The files include variables that are calculated using several variables from the analytic data files (Datasets 1-32), such as copayment amounts for example family situations, and information that is part of the annual project reports (the annual Book of Tables) but not stored in the full database, such as summary market rate survey information from the CCDF plans.
2. The project website and search tool provide access to a point-and-click user interface. Users can select from the full set of public data to create custom tables. The website also provides access to the full range of reports and products released under the CCDF Policies Database project.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of around 2,000 HTML pages: these web pages contain the search results returned for queries on different products, searched by a set of synthetic users surfing Google Shopping (US version) from different locations in July 2016. Each file in the collection has a name indicating the location from which the search was done, the user ID, and the searched product: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html. The locations are the Philippines (PHI), the United States (US), and India (IN). The user IDs are 26 to 30 for users searching from the Philippines, 1 to 5 from the US, and 11 to 15 from India. Products were chosen from 130 keywords (e.g., MP3 player, MP4 watch, personal organizer, television, etc.). In the following, we describe how the search results were collected. Each user has a fresh profile. The creation of a new profile corresponds to launching a new, isolated web browser client instance and opening the Google Shopping US web page. To mimic real users, the synthetic users can browse, scroll pages, stay on a page, and click on links. A fully-fledged web browser is used to get the correct desktop version of the website under investigation, because websites can be designed to behave according to user agents, as witnessed by the differences between the mobile and desktop versions of the same website. The prices are the retail ones displayed by Google Shopping in US dollars (thus excluding shipping fees). Several frameworks have been proposed for interacting with web browsers and analysing results from search engines; this research adopts OpenWPM. OpenWPM is automated with Selenium to efficiently create and manage different users with isolated Firefox and Chrome client instances, each with their own associated cookies. The experiments run, on average, 24 hours.
In each of them, the software runs on our local server, but the browser's traffic is redirected to the designated remote servers (i.e., to India) via tunneling in SOCKS proxies. This way, all commands are simultaneously distributed over all proxies. The experiments adopt the Mozilla Firefox browser (version 45.0) for the web browsing tasks and run under Ubuntu 14.04. For each query, we consider the first page of results, counting 40 products; among them, the focus of the experiments is mostly on the top 10 and top 3 results. Due to connection errors, one of the Philippine profiles has no associated results. Also, for the Philippines, a few keywords did not lead to any results: videocassette recorders, totes, umbrellas. Similarly, for the US, no results were returned for totes and umbrellas. The search results have been analyzed to check for evidence of price steering based on users' location. One term of usage applies: in any research product whose findings are based on this dataset, please cite:
@inproceedings{DBLP:conf/ircdl/CozzaHPN19,
  author    = {Vittoria Cozza and Van Tien Hoang and Marinella Petrocchi and Rocco {De Nicola}},
  title     = {Transparency in Keyword Faceted Search: An Investigation on Google Shopping},
  booktitle = {Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, {IRCDL} 2019, Pisa, Italy, January 31 - February 1, 2019, Proceedings},
  pages     = {29--43},
  year      = {2019},
  crossref  = {DBLP:conf/ircdl/2019},
  url       = {https://doi.org/10.1007/978-3-030-11226-4_3},
  doi       = {10.1007/978-3-030-11226-4_3},
  timestamp = {Fri, 18 Jan 2019 23:22:50 +0100},
  biburl    = {https://dblp.org/rec/bib/conf/ircdl/CozzaHPN19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
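The stated file-naming convention can be unpacked programmatically. A sketch (the example file name below is hypothetical, constructed to follow the pattern; the regex mirrors no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html):

```python
import re

# Parse a collection file name into location, user ID, product, and run number.
NAME_RE = re.compile(
    r"no_email_(?P<location>[A-Z]+)_(?P<user>\d+)\."
    r"(?P<product>.+)\.shopping_testing\.(?P<run>\d+)\.html$"
)

def parse_name(filename: str) -> dict:
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    return m.groupdict()

# Hypothetical file name following the stated convention:
print(parse_name("no_email_IN_11.MP3 player.shopping_testing.1.html"))
```

Grouping the parsed names by location and user ID is then enough to reconstruct, per synthetic user, which of the 130 product keywords produced result pages.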
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PATCIT: A Comprehensive Dataset of Patent Citations [Website, Newsletter, GitHub]
Patents are at the crossroads of many innovation nodes: science, industry, products, competition, etc. Such interactions can be identified through citations in a broad sense.
It is now common to use front-page patent citations to study some aspects of the innovation system. However, there is much more buried in the Non Patent Literature (NPL) citations and in the patent text itself.
Good news: Natural Language Processing (NLP) tools now enable social scientists to excavate and structure this long-hidden information. That's the purpose of this project.
IN PRACTICE
A detailed presentation of the current state of the project is available in our March 2020 presentation.
So far, we have:
parsed and consolidated the 27 million NPL citations classified as bibliographical references.
extracted, parsed and consolidated in-text bibliographical references and patent citations from the body of all time USPTO patents.
The latest version of the dataset is the v0.15. It is made of the v0.1 of the US contextual citations dataset and v0.2 of the front-page NPL citations dataset.
Give it a try! The dataset is publicly available on Google Cloud BigQuery, just click here.
FEATURES
Open
Comprehensive
Highest standards
Digital Marketing and Advertising:
Audience segmentation and targeting Campaign performance optimization Competitor analysis and benchmarking
E-commerce and Retail:
Customer journey mapping Product recommendation enhancements Cart abandonment analysis
Media and Entertainment:
Content consumption trends Audience engagement metrics Cross-platform user behavior analysis
Financial Services:
Risk assessment based on online behavior Fraud detection through anomaly identification Investment trend analysis
Technology and Software:
User experience optimization Feature adoption tracking Competitive intelligence
Market Research and Consulting:
Consumer behavior studies Industry trend analysis Digital transformation strategies
Integration with Broader Data Offering TagX Web Browsing Clickstream Data is a cornerstone of our comprehensive digital intelligence suite. It seamlessly integrates with our other data products to provide a 360-degree view of online user behavior:
Social Media Engagement Data: Combine clickstream insights with social media interactions for a holistic understanding of digital footprints. Mobile App Usage Data: Cross-reference web browsing patterns with mobile app usage to map the complete digital journey. Purchase Intent Signals: Enrich clickstream data with purchase intent indicators to power predictive analytics and targeted marketing efforts. Demographic Overlays: Enhance web browsing data with demographic information for more precise audience segmentation and targeting.
By leveraging these complementary datasets, businesses can unlock deeper insights and drive more impactful strategies across their digital initiatives. Data Quality and Scale We pride ourselves on delivering high-quality, reliable data at scale:
Rigorous Data Cleaning: Advanced algorithms filter out bot traffic, VPNs, and other non-human interactions. Regular Quality Checks: Our data science team conducts ongoing audits to ensure data accuracy and consistency. Scalable Infrastructure: Our robust data processing pipeline can handle billions of daily events, ensuring comprehensive coverage. Historical Data Availability: Access up to 24 months of historical data for trend analysis and longitudinal studies. Customizable Data Feeds: Tailor the data delivery to your specific needs, from raw clickstream events to aggregated insights.
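A bot filter of the kind the cleaning step describes can be sketched as a simple rate heuristic. This is an illustration only: the event schema (`ts` epoch-second timestamps) and the threshold are assumptions for the example, not TagX's actual cleaning rules, which would combine many richer signals.

```python
def looks_like_bot(events: list, max_rate_per_sec: float = 5.0) -> bool:
    """Flag a session whose click rate is implausibly high for a human.
    Each event is a dict with a 'ts' key holding an epoch-seconds
    timestamp; the 5 clicks/second default is an illustrative cutoff."""
    if len(events) < 2:
        return False
    timestamps = sorted(e["ts"] for e in events)
    duration = timestamps[-1] - timestamps[0]
    if duration <= 0:
        return True  # multiple events in the same instant
    return len(events) / duration > max_rate_per_sec

# Usage: drop flagged sessions before computing aggregated insights
# clean_sessions = [s for s in sessions if not looks_like_bot(s["events"])]
```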
Empowering Data-Driven Decision Making In today's digital-first world, understanding online user behavior is crucial for businesses across all sectors. TagX Web Browsing Clickstream Data empowers organizations to make informed decisions, optimize their digital strategies, and stay ahead of the competition. Whether you're a marketer looking to refine your targeting, a product manager seeking to enhance user experience, or a researcher exploring digital trends, our cli...