Web traffic statistics for several City-Parish websites, including brla.gov, city.brla.gov, Red Stick Ready, GIS, and Open Data. Information provided by Google Analytics.
Data about the usage of the WPRDC site and its various datasets, obtained by combining Google Analytics statistics with information from the WPRDC's data portal.
The Google Analytics extension for CKAN enables integration with Google Analytics for tracking and analyzing usage of your CKAN data portal. It enhances CKAN installations by embedding Google Analytics tracking code for page views, recording resource downloads as events, and logging API call events. Furthermore, it can import statistical data from Google Analytics back into CKAN to display download counts on resource pages.
Key Features:
Page View Tracking: Automatically inserts the Google Analytics asynchronous tracking code into page headers to monitor basic site traffic and user behavior.
Resource Download Event Tracking: Tracks resource downloads as events within Google Analytics, allowing for analysis of which resources are most popular and frequently accessed. The resource_prefix configuration option allows for easier filtering of resource downloads within the Google Analytics interface.
API Call Event Tracking: Records specific API calls as events, offering insight into how users and applications are interacting with the CKAN API (note that the readme doesn't specify which API calls are tracked).
Custom Event Tracking: Provides a mechanism for other extensions to implement custom event tracking using Google Analytics, which provides a high level of extensibility.
Statistics Retrieval: Retrieves statistical data from Google Analytics and integrates it into CKAN, enabling the display of download counts for each resource on the package page, offering immediate feedback on data usage.
The trak extension for CKAN enhances the platform's tracking capabilities by providing tools to import Google Analytics data and modify the presentation of page view statistics. It introduces a paster command for importing page view data from exported Google Analytics CSV files, enabling users to supplement CKAN's built-in tracking. The extension also includes template customizations that alter how page view counts are displayed on dataset and resource listing pages.
Key Features:
Google Analytics Data Import: Imports page view data from a stripped-down CSV of Google Analytics data using a dedicated paster command (csv2table). The CSV should contain a list of page views, where each row starts with '/'. The PageViews column is expected to be the 3rd column.
Customizable Page View Display: Changes the default presentation of page view statistics within CKAN by removing the minimum view count restriction (default is 10), so that all views can be seen, and by modifying UI elements.
Altered Page Tracking Stats: Moves page tracking statistics below Package Data (on dataset list pages) and Resource Data (on resource list pages).
UI/UX Enhancements: Replaces the flame icon typically used for page tracking with more subtle background styling.
Backend Data Manipulation: Uses a 'floor date' of 2011-01-01 for page view calculation. An entry with a unique UUID is made in the tracking_raw table for each view.
Integration with CKAN: The extension introduces a new paster command and modifies existing templates for displaying page view statistics. It relies on CKAN's built-in tracking being enabled, and supplements it with imported data and presentation adjustments. After importing data with the csv2table paster command, the standard tracking update and search-index rebuild paster tasks need to be run to process the imported data and update the search index.
Benefits & Impact: By importing data from Google Analytics, the trak extension gives administrators a more complete view of page views and presents tracking statistics in a more integrated fashion. This supports a better understanding of how resources within the CKAN instance are used, based on Google Analytics data.
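The CSV layout described above (rows that start with '/', with PageViews in the 3rd column) can be read with a sketch like the one below. This is not the extension's csv2table code: the function name read_page_views, the sample header names, the comma delimiter, and the handling of thousands separators are all assumptions made for illustration.

```python
import csv
import io


def read_page_views(csv_text):
    """Return {path: page_views} for rows whose first cell starts with '/'."""
    views = {}
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank rows, header rows, and summary rows that are not page paths.
        if len(row) < 3 or not row[0].startswith("/"):
            continue
        # PageViews is expected in the 3rd column. Stripping thousands
        # separators such as "1,234" is an assumption about the export.
        count = int(row[2].replace(",", ""))
        views[row[0]] = views.get(row[0], 0) + count
    return views


# Hypothetical sample export; the header names are illustrative only.
sample = (
    "Page,Unique Pageviews,Pageviews\n"
    "/dataset/roads,10,12\n"
    '/dataset/parks,"1,100","1,234"\n'
    ",,1246\n"
)
print(read_page_views(sample))
```

Rows that do not begin with '/' (the header and the totals row in the sample) are ignored, which mirrors the "each row starts with '/'" expectation stated above.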
The ckanext-tayside extension adds data management features to CKAN. Primarily, it calculates and displays dataset and resource download statistics by pulling data from Google Analytics, and it notifies dataset maintainers about update frequency. It also contains tooling to modify the CSS styles using LESS.
Key Features:
Google Analytics Integration: Enables the import of download statistics for datasets and resources from Google Analytics, providing insights into data usage.
Update Frequency Checks: Allows administrators to check the update frequency of datasets and automatically notify maintainers via email, helping data remain current.
Customizable Styling with LESS: Uses LESS files for CSS styling, providing a structured and maintainable approach to customizing the CKAN interface. A compilation step is required to turn the LESS files into CSS.
Technical Integration: The GA extension is required to generate the credentials file used for the Google Analytics integration. Command-line tools are used to import data from Google Analytics and to check update frequencies. LESS styles are applied through custom CSS compiled from LESS, allowing modification of the default CKAN theme. The extension requires adding tayside to the ckan.plugins setting in the CKAN config file (production.ini).
Benefits & Impact: The ckanext-tayside extension provides a better understanding of data usage through Google Analytics integration, improved data freshness through update frequency notifications, and a structured approach to customizing the look and feel of CKAN through LESS styling. It makes it easier to maintain a healthy and up-to-date data catalogue.
BestPlace is a retail data and analytics tool created for medium and enterprise-level CPG/FMCG companies. It adds a strategic location-based perspective to your existing database, which enriches your data landscape and helps your business better understand and cater to shopping behavior.
An In-Depth Approach to Retail Analytics
Unlike conventional analytics tools, BestPlace examines the details of each store location, providing a comprehensive analysis of your retail database. We use our own tools and methodologies to extract, analyze, and compile data. Our processes are designed to provide a holistic view of your business, equipping you with the information you need to make data-driven decisions.
Amplifying Your Database with BestPlace
At BestPlace, we understand the importance of a robust and informative retail database design. We don't just add new stores to your database; we enrich each store with vital characteristics and factors. These enhancements come from open cartographic sources such as Google Maps and our proprietary GIS database, all collected and curated by our experienced data analysts.
Store Features
We enrich your retail database with an array of store features, which include but are not limited to:
Number of reviews
Average ratings
Operational hours
Categories relevant to each point
Our attention to detail ensures your retail database becomes a powerful tool for understanding customer interactions and preferences.
Extensive Use Cases
BestPlace's capabilities stretch across various applications, offering value in areas such as:
Competition Analysis: Identify your competitors, analyze their performance, and understand your standing in the market with our extensive POI database and retail data analytics capabilities.
New Location Search: Use our rich retail store database to identify ideal locations for store expansions based on foot traffic data, proximity to key points, and potential customer demographics.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
OpenDataNI website usage data, collated via Google Analytics, for the period from the site's launch on 26th November 2015 onward.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘Website Analytics’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/ecee4df3-8149-4b74-8927-428ea920b758 on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Web traffic statistics for the several City-Parish websites, brla.gov, city.brla.gov, Red Stick Ready, GIS, Open Data etc. Information provided by Google Analytics.
--- Original source retains full ownership of the source dataset ---
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
This dataset presents usage indicators for the data.sncf.com website and its datasets. The data come from the Opendatasoft back office, from Google Analytics for the period February 2020 to March 2021, and from PIWIK Analytics since July 2021. Updated monthly.
Contains Gallup data from countries that are home to more than 98% of the world's population through a state-of-the-art Web-based portal. Gallup Analytics puts Gallup's best global intelligence in users' hands to help them better understand the strengths and challenges of the world's countries and regions. Users can access Gallup's U.S. Daily tracking and World Poll data to compare residents' responses region by region and nation by nation to questions on topics such as economic conditions, government and business, health and wellbeing, infrastructure, and education.
The Gallup Analytics Database is accessed through the Cornell University Libraries here. In addition, a CUL subscription also allows access to the Gallup Respondent Level Data. For access please refer to the documentation below and then request the variables you need here.
Before requesting data from the World Poll, please see the Getting Started guide and the Worldwide Research Methodology and Codebook (you will need to request access). The Codebook gives information about all available variables in the datasets. Other guides are available in the Google folder. You can also look up the questions asked and the variables using the Gallup World Poll Reference Tool, which requires a user account. The tool only provides information about questions and variables; it does not give access to the data.
For further documentation and information see this site from New York University Libraries. The Gallup documentation for the World Poll methodology is also available under the Data and Documentation tab.
In addition to the World Poll and Daily Tracking Poll, also available are the Gallup Covid-19 Survey, Gallup Poll Social Series Surveys, Race Relations Survey, Confidence in Institutions Survey, Honesty and Ethics in Professions Survey, and Religion Battery.
The process for getting access to respondent-level data from the Gallup U.S. Daily Tracking is similar to the World Poll Survey. There is no comparable discovery tool for U.S. Daily Tracking poll questions, however. Users need to consult the codebooks and available variables across years.
The COVID-19 web survey began on March 13, 2020, with daily random samples of U.S. adults aged 18 and older who are members of the Gallup Panel. Before requesting data, please see the Gallup Panel COVID-19 Survey Methodology and Codebook.
The Gallup Poll Social Series (GPSS) dataset is a set of public opinion surveys designed to monitor U.S. adults’ views on numerous social, economic, and political topics. More information is available on the Gallup website: https://www.gallup.com/175307/gallup-poll-social-series-methodology.aspx As each month has a unique codebook, contact CCSS-ResearchSupport@cornell.edu to discuss your interests and start the data request process.
Confidence in Institutions Survey: Since 1973, Gallup has measured confidence in several U.S. institutions, such as Congress, the Presidency, the Supreme Court, and the police. The dataset begins in 1973, and data are collected once per year. Users should consult the list of available variables.
Race Relations Survey: This poll includes topics that were previously represented in the GPSS Minority Relations Survey, which ran through 2016. The Race Relations Survey was conducted in November 2018. Users should consult the codebook for this poll before making their request.
Honesty and Ethics in Professions Survey: Since 1976, Gallup has measured U.S. perceptions of the honesty and ethics of a list of professions. The dataset was added to the collection in March 2023 and covers 1976-2022. Documentation for this collection is located here and will require you to request access.
Religion Battery: Consolidated list of items focused on religion in the U.S. from 1999-2022. Documentation for this collection is located here and will require you to request access.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Monthly statistics for pages viewed by visitors to the Queensland Government website (Seniors franchise). Source: Google Analytics
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset supports the working paper, "Repository optimisation & techniques to improve discoverability and web impact : an evaluation", currently under review for publication and available as a preprint at: https://doi.org/10.17868/65389/.
The dataset comprises a single OpenDocument Spreadsheet (.ods) file containing seven sheets of data pertaining to COUNTER-compliant usage statistics, search query traffic from Google Search Console, web traffic data for Google Analytics and Google Scholar, and usage statistics from IRStats2. All data relate to the EPrints repository, Strathprints, based at the University of Strathclyde.
https://spdx.org/licenses/CC0-1.0.html
Version update: The originally uploaded versions of the CSV files in this dataset included an extra column, "Unnamed: 0," which is not RAMP data and was an artifact of the process used to export the data to CSV format. This column has been removed from the revised dataset. The data are otherwise the same as in the first version.
The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data of institutional repositories (IR). The data are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2020. For a description of the data collection, processing, and output methods, please see the "methods" section below.
Methods
Data Collection
RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).
Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:
url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
Following the data processing described below, on ingest into RAMP an additional field, citableContent, is added to the page level data.
The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search, and the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:
country: The country from which the corresponding search originated.
device: The device used for the search.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.
More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en
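The clickThrough field in both field lists above is a simple ratio of clicks to impressions. A minimal sketch of that calculation follows; the function name is hypothetical, and returning 0.0 when there are no impressions is an assumption, since the source does not say how that case is handled.

```python
def click_through(clicks, impressions):
    """Return clicks divided by impressions, as described for clickThrough."""
    if impressions == 0:
        # Assumption: the source does not describe the zero-impression case.
        return 0.0
    return clicks / impressions


print(click_through(5, 20))
```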
Data Processing
Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
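The source does not spell out how RAMP decides that a URL points to a content file, beyond naming "PDF, CSV, etc." The sketch below assumes a check on the file extension of the URL path; the function name and the extension list are hypothetical and would need to be adjusted to the repository platform in question.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical extension list; the source only names "PDF, CSV, etc."
CONTENT_EXTENSIONS = {".pdf", ".csv", ".doc", ".docx", ".xlsx", ".zip"}


def citable_content(url):
    """Return "Yes" if the URL path ends with a content-file extension, else "No"."""
    # Use only the path so that query strings and fragments are ignored.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return "Yes" if extension in CONTENT_EXTENSIONS else "No"


print(citable_content("https://repo.example.edu/bitstream/1/2/thesis.pdf"))
print(citable_content("https://repo.example.edu/handle/1/2"))
```

The return values match the two possible values of the citableContent field, "Yes" and "No".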
The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.
Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data; the second index includes the country of origin and device type data.
About Citable Content Downloads
Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.
CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).
For any specified date range, the steps to calculate CCD are:
Filter data to only include rows where "citableContent" is set to "Yes."
Sum the value of the "clicks" field on these rows.
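The two steps above can be sketched as follows. The function name and the sample rows are hypothetical, and the sketch assumes that dates are ISO-formatted strings (YYYY-MM-DD), so that string comparison orders them correctly.

```python
def citable_content_downloads(rows, start_date, end_date):
    """Sum clicks on rows flagged as citable content within a date range."""
    total = 0
    for row in rows:
        # Assumes ISO dates (YYYY-MM-DD), which compare correctly as strings.
        if not (start_date <= row["date"] <= end_date):
            continue
        # Step 1: keep only rows where citableContent is "Yes".
        if row["citableContent"] == "Yes":
            # Step 2: sum the clicks field on those rows.
            total += int(row["clicks"])
    return total


# Hypothetical page level rows for illustration.
rows = [
    {"date": "2020-01-01", "citableContent": "Yes", "clicks": 3},
    {"date": "2020-01-02", "citableContent": "No", "clicks": 5},
    {"date": "2020-01-03", "citableContent": "Yes", "clicks": 2},
    {"date": "2020-02-01", "citableContent": "Yes", "clicks": 7},
]
print(citable_content_downloads(rows, "2020-01-01", "2020-01-31"))  # prints 5
```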
Output to CSV
Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above. Also as noted above, daily data are downloaded for each IR in two sets which cannot be combined. One dataset includes the URLs of items that appear in SERP. The second dataset is aggregated by combination of the country from which a search was conducted and the device used.
As a result, two CSV datasets are provided for each month of published data:
page-clicks:
The data in these CSV files correspond to the page-level data, and include the following fields:
url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
index: The Elasticsearch index corresponding to page click data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
Filenames for files containing these data end with “page-clicks”. For example, the file named 2020-01_RAMP_all_page-clicks.csv contains page level click data for all RAMP participating IR for the month of January, 2020.
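As noted for the repository_id field above, aggregation per repository should use repository_id rather than index. The sketch below totals clicks on citable content per repository from page-clicks CSV text. The function name, the sample rows, the index names, and the column order are hypothetical; parsing clicks through float is an assumption in case the export writes values such as "3.0".

```python
import csv
import io
from collections import defaultdict


def clicks_by_repository(csv_text):
    """Return {repository_id: total clicks on citable content}."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["citableContent"] == "Yes":
            # Group on repository_id, not index: one repository may have
            # several index names over time.
            totals[row["repository_id"]] += int(float(row["clicks"]))
    return dict(totals)


# Hypothetical sample data; repo_a appears under two different index names.
sample_csv = (
    "url,impressions,clicks,clickThrough,position,date,citableContent,index,repository_id\n"
    "https://a.example.edu/1.pdf,10,2,0.2,3.5,2020-01-01,Yes,a_ramp_v1,repo_a\n"
    "https://a.example.edu/2.pdf,4,1,0.25,8.0,2020-01-02,Yes,a_ramp_v2,repo_a\n"
    "https://a.example.edu/item/2,9,4,0.44,2.0,2020-01-02,No,a_ramp_v2,repo_a\n"
    "https://b.example.org/x.csv,5,5,1.0,1.0,2020-01-01,Yes,b_ramp,repo_b\n"
)
print(clicks_by_repository(sample_csv))
```

Grouping on index instead would split repo_a into two separate totals in this sample.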
country-device-info:
The data in these CSV files correspond to the data aggregated by country from which a search was conducted and the device used. These include the following fields:
country: The country from which the corresponding search originated.
device: The device used for the search.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
index: The Elasticsearch index corresponding to country and device access information data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
Filenames for files containing these data end with “country-device-info”. For example, the file named 2020-01_RAMP_all_country-device-info.csv contains country and device data for all participating IR for the month of January, 2020.
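The two filename examples above suggest a YYYY-MM_RAMP_all_<type>.csv pattern. The sketch below parses filenames on that assumption; the regular expression and function name are hypothetical and are inferred only from the two example filenames given in the text.

```python
import re

# Pattern inferred from the two example filenames; treat it as an assumption.
FILENAME_RE = re.compile(
    r"^(?P<year>\d{4})-(?P<month>\d{2})_RAMP_all_"
    r"(?P<kind>page-clicks|country-device-info)\.csv$"
)


def parse_ramp_filename(name):
    """Return (year, month, kind) for a RAMP CSV filename, or None if no match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    return int(match["year"]), int(match["month"]), match["kind"]


print(parse_ramp_filename("2020-01_RAMP_all_page-clicks.csv"))
print(parse_ramp_filename("2020-01_RAMP_all_country-device-info.csv"))
```

Because the two datasets cannot be combined, the kind value can be used to route each file to separate processing.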
References
Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Monthly statistics for pages viewed by visitors to the Queensland Government website (Employment and jobs franchise). Source: Google Analytics
https://www.dataflix.com/data360/license/
The Dataflix COVID dataset is a centralized repository of up-to-date and curated data focused on key tracking metrics and U.S. census data. The dataset is publicly readable and accessible on Google BigQuery, ready for analysis, analytics, and machine learning initiatives. It is built on data from trusted sources such as CSSE at Johns Hopkins University and government agencies, covering a wide range of metrics including confirmed cases, new cases, % population, mortality rate, and deaths, aggregated at various geographic levels including city, county, state, and country. New data is published on a daily basis. Our objective is to make structured COVID data available for organizations and individuals to help in the fight against COVID-19. For example, health authorities will be able to build reports and dashboards to efficiently deploy vital resources like hospital beds and ventilators as they track the spread of the disease. Epidemiologists can use the dataset to complement their existing models and datasets, and generate better forecasts of hotspots and trends.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Monthly Google Analytics statistics for datasets accessed by visitors to Queensland Government’s publications portal.
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
This data shows the number of hits each page gets on the City of Bloomington website. The data is pulled from Google Analytics.
Instructions for exporting data from Google Analytics are available here
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Monthly statistics for pages viewed by visitors to the Queensland Government website—Your rights, crime and the law franchise. Source: Google Analytics
https://creativecommons.org/publicdomain/zero/1.0/
Since the dawn of human life on the face of the earth, the global population has been booming. The population was estimated to be 1 billion people in the year 1800. The figure had increased to a new high of 6 billion by the end of the twentieth century. Day in and day out, 227,000 people are being added to the world; it is projected that by the end of the 21st century, the world's population may exceed 11 billion.
As per reports, as a consequence of the unsustainable increase in population and a lack of access to adequate health care, food, and shelter, the number of genetic disorders has increased. Hereditary illnesses are becoming more common due to a lack of understanding about the need for genetic testing. Children often die as a result of these illnesses, so genetic testing during pregnancy is critical.
You are hired as a Machine Learning Engineer by a government agency. You are given a dataset that contains medical information about children who have genetic disorders. Your task is to predict the following:
Genetic disorder
Disorder subclass
The dataset folder contains the following files:
train.csv: 22083 x 45
test.csv: 9465 x 43
sample_submission.csv: 5 x 3
Check details of each attribute of the dataset here
Participants are encouraged to submit their solutions here
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Monthly statistics for pages viewed by visitors to the Queensland Government website—People with disability franchise. Source: Google Analytics