Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.
Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.
Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.
Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.
Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.
Note: Website and Webpage statistics (the first three resources above) cover only UK users and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with JavaScript enabled, which excludes web crawlers and API calls.
These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.
This online application gives manufacturers the ability to compare Iowa to other states on a number of different topics including: business climate, education, operating costs, quality of life and workforce.
https://creativecommons.org/publicdomain/zero/1.0/
On a quest to compare different crypto exchanges, I came up with the idea of comparing metrics across multiple platforms (at the moment, just two). CoinGecko and CoinMarketCap are two of the biggest websites for monitoring both exchanges and crypto projects. In response to over-inflated volumes faked by crypto exchanges, both websites came up with independent metrics for assessing the worth of a given exchange.
Collected on May 10, 2020
CoinGecko's data is a bit more holistic, containing metrics across a multitude of areas (you can read more in the original blog post here). The data from CoinGecko consists of the following:
- Exchange Name
- Trust Score (on a scale of N/A-10)
- Type (centralized/decentralized)
- AML (risk: how well prepared are they to handle financial crime?)
- API Coverage (blanket measure that includes: (1) Tickers Data, (2) Historical Trades Data, (3) Order Book Data, (4) Candlestick/OHLC, (5) WebSocket API, (6) API Trading, (7) Public Documentation)
- API Last Updated (when was the API last updated?)
- Bid Ask Spread (average buy/sell spread across all pairs)
- Candlestick (available/not)
- Combined Orderbook Percentile (see above link)
- Estimated_Reserves (estimated holdings of major crypto)
- Grade_Score (overall API score)
- Historical Data (available/not)
- Jurisdiction Risk (risk: risk of terrorist activity/bribery/corruption?)
- KYC Procedures (risk: Know Your Customer?)
- License and Authorization (risk: has the exchange sought regulatory approval?)
- Liquidity (don't confuse with "CMC Liquidity"; THIS column is a combo of (1) web traffic & reported volume, (2) order book spread, (3) trading activity, (4) trust score on trading pairs)
- Negative News (risk: any bad news?)
- Normalized Trading Volume (trading volume normalized to web traffic)
- Normalized Volume Percentile (see above blog link)
- Orderbook (available/not)
- Public Documentation (got a well-documented API available to everyone?)
- Regulatory Compliance (risk rating from a compliance perspective)
- Regulatory Last Updated (last time regulatory metrics were updated)
- Reported Trading Volume (volume as listed by the exchange)
- Reported Normalized Trading Volume (ratio of normalized to reported volume, 0-1)
- Sanctions (risk: risk of sanctions?)
- Scale (based on: (1) Normalized Trading Volume Percentile, (2) Normalized Order Book Depth Percentile)
- Senior Public Figure (risk: does the exchange have transparent public relations? etc.)
- Tickers (tick tick tick...)
- Trading via API (can data be traded through the API?)
- Websocket (got websockets?)
- Green Pairs (percentage of trading pairs deemed to have good liquidity)
- Yellow Pairs (percentage of trading pairs deemed to have fair liquidity)
- Red Pairs (percentage of trading pairs deemed to have poor liquidity)
- Unknown Pairs (percentage of trading pairs that do not have sufficient order book data)
Again, CoinMarketCap only has one metric (recently updated), which scales from 1 to 1000, with 1000 being very liquid and 1 not. You can go check the article out for yourself. In the dataset, this is the "CMC Liquidity" column, not to be confused with the "Liquidity" column, which refers to the CoinGecko metric!
Thanks to CoinGecko and CMC for making their data scrapable :)
[CMC, you should try to give us a little more access to the figures that define your metric. Thanks!]
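As a quick sketch of how the two liquidity measures might be put side by side, the toy rows below are illustrative values, not taken from the real files; only the column names "Trust Score" and "CMC Liquidity" come from the dataset description:

```python
import pandas as pd

# Toy rows mimicking the described columns; in practice you would load
# the scraped CoinGecko and CoinMarketCap CSVs instead.
df = pd.DataFrame({
    "Exchange Name": ["ExA", "ExB", "ExC"],
    "Trust Score": [9, 6, 4],         # CoinGecko, scale up to 10
    "CMC Liquidity": [850, 420, 90],  # CoinMarketCap, scale 1-1000
})

# Rescale CMC Liquidity to 0-10 so the two metrics share a range.
df["CMC Liquidity (0-10)"] = df["CMC Liquidity"] / 100

# Rank exchanges under each metric to see where the two sites disagree.
df["cg_rank"] = df["Trust Score"].rank(ascending=False)
df["cmc_rank"] = df["CMC Liquidity"].rank(ascending=False)
print(df[["Exchange Name", "cg_rank", "cmc_rank"]])
```

Where the two rankings diverge is exactly where the "Liquidity" vs "CMC Liquidity" distinction matters.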
Nursing Home Compare has detailed information about every Medicare and Medicaid nursing home in the country. A nursing home is a place for people who can’t be cared for at home and need 24-hour nursing care. These are the official datasets used on the Medicare.gov Nursing Home Compare Website provided by the Centers for Medicare & Medicaid Services. These data allow you to compare the quality of care at every Medicare and Medicaid-certified nursing home in the country, including over 15,000 nationwide.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was used in the Kaggle Wikipedia Web Traffic forecasting competition. It contains 145,063 daily time series representing the number of hits, or web traffic, for a set of Wikipedia pages from 2015-07-01 to 2017-09-10.
The original dataset contains missing values; they have simply been replaced by zeros.
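Because zeros stand in for missing observations, one might want to restore them to NA before forecasting, under the assumption that genuinely zero-traffic days are rare. A minimal sketch with a toy series (the values below are made up):

```python
import numpy as np
import pandas as pd

# Toy series standing in for one of the 145,063 Wikipedia page series.
hits = pd.Series([120, 0, 95, 0, 110],
                 index=pd.date_range("2015-07-01", periods=5, freq="D"))

# Treat zeros as missing again (assumption: true zero-traffic days are rare).
restored = hits.replace(0, np.nan)

# Simple gap fill for forecasting, e.g. linear interpolation.
filled = restored.interpolate()
print(filled)
```

Whether interpolation, forward fill, or keeping the zeros is appropriate depends on the model being trained.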
https://pasteur.epa.gov/license/sciencehub-license.html
Monthly site compare scripts and output used to generate the model/ob plots and statistics in the manuscript. The AQS hourly site compare output files are not included as they were too large to store on ScienceHub. The files contain paired model/ob values for the various air quality networks.
This dataset is associated with the following publication: Appel, W., S. Napelenok, K. Foley, H. Pye, C. Hogrefe, D. Luecken, J. Bash, S. Roselle, J. Pleim, H. Foroutan, B. Hutzell, G. Pouliot, G. Sarwar, K. Fahey, B. Gantt, D. Kang, R. Mathur, D. Schwede, T. Spero, D. Wong, J. Young, and N. Heath. Description and evaluation of the Community Multiscale Air Quality (CMAQ) modeling system version 5.1. Geoscientific Model Development. Copernicus Publications, Katlenburg-Lindau, GERMANY, 10: 1703-1732, (2017).
This data about nola.gov provides a window into how people are interacting with the City of New Orleans online. The data comes from a unified Google Analytics account for New Orleans. We do not track individuals and we anonymize the IP addresses of all visitors.
A multidisciplinary repository of public data sets such as the Human Genome and US Census data that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community. Anyone can access these data sets from their Amazon Elastic Compute Cloud (Amazon EC2) instances and start computing on the data within minutes. Users can also leverage the entire AWS ecosystem and easily collaborate with other AWS users. If you have a public domain or non-proprietary data set that you think is useful and interesting to the AWS community, please submit a request and the AWS team will review your submission and get back to you. Typically the data sets in the repository are between 1 GB and 1 TB in size (based on the Amazon EBS volume limit), but they can work with you to host larger data sets as well. You must have the right to make the data freely available.
This dataset contains information related to web marketing analytics. It includes metrics such as sessions, session duration, bounces, time on page, and unique pageviews that give insight into web performance.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of Tor cell files extracted from browsing simulations using Tor Browser. The simulations cover both desktop and mobile webpages. Data collection used the WFP-Collector tool (https://github.com/irsyadpage/WFP-Collector), with all the necessary simulation configuration as detailed in the tool repository. The webpage URLs are the first 100 websites from https://dataforseo.com/free-seo-stats/top-1000-websites. Each webpage URL was visited 90 times in each of the desktop and mobile browsing modes.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset tracks influencer marketing campaigns across major social media platforms, providing a robust foundation for analyzing campaign effectiveness, engagement, reach, and sales outcomes. Each record represents a unique campaign and includes details such as the campaign’s platform (Instagram, YouTube, TikTok, Twitter), influencer category (e.g., Fashion, Tech, Fitness), campaign type (Product Launch, Brand Awareness, Giveaway, etc.), start and end dates, total user engagements, estimated reach, product sales, and campaign duration. The dataset structure supports diverse analyses, including ROI calculation, campaign benchmarking, and influencer performance comparison.
Columns:
- campaign_id: Unique identifier for each campaign
- platform: Social media platform where the campaign ran
- influencer_category: Niche or industry focus of the influencer
- campaign_type: Objective or style of the campaign
- start_date, end_date: Campaign time frame
- engagements: Total user interactions (likes, comments, shares, etc.)
- estimated_reach: Estimated number of unique users exposed to the campaign
- product_sales: Number of products sold as a result of the campaign
- campaign_duration_days: Duration of the campaign in days
import pandas as pd
df = pd.read_csv('influencer_marketing_roi_dataset.csv', parse_dates=['start_date', 'end_date'])
print(df.head())
print(df.info())
# Overview of campaign types and platforms
print(df['campaign_type'].value_counts())
print(df['platform'].value_counts())
# Summary statistics
print(df[['engagements', 'estimated_reach', 'product_sales']].describe())
# Average engagements and sales by platform
platform_stats = df.groupby('platform')[['engagements', 'product_sales']].mean()
print(platform_stats)
# Top influencer categories by product sales
top_categories = df.groupby('influencer_category')['product_sales'].sum().sort_values(ascending=False)
print(top_categories)
# Hypothetical campaign cost for demonstration (not a column in the dataset)
df['campaign_cost'] = 500 + df['estimated_reach'] * 0.01  # example formula
# Calculate ROI: (Revenue - Cost) / Cost
# Assume each product sold yields $40 revenue
df['revenue'] = df['product_sales'] * 40
df['roi'] = (df['revenue'] - df['campaign_cost']) / df['campaign_cost']
# View campaigns with highest ROI
top_roi = df.sort_values('roi', ascending=False).head(10)
print(top_roi[['campaign_id', 'platform', 'roi']])
import matplotlib.pyplot as plt
import seaborn as sns
# Engagements vs. Product Sales scatter plot
plt.figure(figsize=(8,6))
sns.scatterplot(data=df, x='engagements', y='product_sales', hue='platform', alpha=0.6)
plt.title('Engagements vs. Product Sales by Platform')
plt.xlabel('Engagements')
plt.ylabel('Product Sales')
plt.legend()
plt.show()
# Average ROI by Influencer Category
category_roi = df.groupby('influencer_category')['roi'].mean().sort_values()
category_roi.plot(kind='barh', color='teal')
plt.title('Average ROI by Influencer Category')
plt.xlabel('Average ROI')
plt.show()
# Campaigns over time
df['month'] = df['start_date'].dt.to_period('M')
monthly_sales = df.groupby('month')['product_sales'].sum()
monthly_sales.plot(figsize=(10,4), marker='o', title='Monthly Product Sales from Influencer Campaigns')
plt.ylabel('Product Sales')
plt.show()
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Internet use in the UK annual estimates by age, sex, disability, ethnic group, economic activity and geographical location, including confidence intervals.
https://domainmetadata.com/terms
Download new, active & historic .compare domains — updated multiple times daily
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Historical Dataset of Web Academy is provided by PublicSchoolReview and contains statistics on the following metrics: Total Students Trends Over Years (2005-2007); Distribution of Students By Grade Trends; American Indian Student Percentage Comparison Over Years (2006-2007); Asian Student Percentage Comparison Over Years (2005-2007); Hispanic Student Percentage Comparison Over Years (2005-2007); Black Student Percentage Comparison Over Years (2005-2007); White Student Percentage Comparison Over Years (2005-2007); Diversity Score Comparison Over Years (2005-2007); Free Lunch Eligibility Comparison Over Years (2006-2007).
This dataset collection comprises a series of related data tables sourced from the website of 'Tilastokeskus' (Statistics Finland), based in Finland. The tables within this collection contain data retrieved from the Statistics Finland's service interface (WFS). The content of the tables is organized in a structured format with rows and columns, showcasing a correlation between different sets of data. The collection, while primarily intended for statistical analysis, can be utilized in a variety of ways, depending on the specific needs of the user. This dataset is licensed under CC BY 4.0 (Creative Commons Attribution 4.0, https://creativecommons.org/licenses/by/4.0/deed.fi).
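WFS (Web Feature Service) is a standard OGC interface, so retrieving similar tables is an ordinary HTTP call with well-known parameters. A sketch of building such a request (the service URL and layer name below are placeholders, not Statistics Finland's actual endpoint or layers):

```python
from urllib.parse import urlencode

# Placeholder service URL and layer; substitute the real WFS endpoint
# and a typeNames value advertised by its GetCapabilities response.
base_url = "https://example.stat.fi/geoserver/wfs"
params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "example:layer",
    "outputFormat": "application/json",
}
request_url = base_url + "?" + urlencode(params)
print(request_url)
```

Fetching `request_url` (e.g. with `urllib.request.urlopen`) would then return the features in the requested output format.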
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset collects job offers from web scraping which are filtered according to specific keywords, locations and times. This data gives users rich and precise search capabilities to uncover the best working solution for them. With the information collected, users can explore options that match with their personal situation, skillset and preferences in terms of location and schedule. The columns provide detailed information around job titles, employer names, locations, time frames as well as other necessary parameters so you can make a smart choice for your next career opportunity
This dataset is a great resource for those looking to find an optimal work solution based on keywords, location and time parameters. With this information, users can quickly and easily search through job offers that best fit their needs. Here are some tips on how to use this dataset to its fullest potential:
Start by identifying what type of job offer you want to find. The keyword column will help you narrow down your search by allowing you to search for job postings that contain the word or phrase you are looking for.
Next, consider where the job is located – the Location column tells you where in the world each posting is from so make sure it’s somewhere that suits your needs!
Finally, consider when the position is available: the Time frame column indicates when each posting was made and whether it is a full-time, part-time, casual, or temporary position, so make sure it meets your requirements before applying!
Additionally, if details such as hours per week or further schedule information are important criteria, there is also info provided in the Horari and Temps_Oferta columns! Now that all three criteria have been ticked off (keywords, location and time frame), take a look at the Empresa (company name) and Nom_Oferta (offer name) columns too, to get an idea of who will be employing you should you land the gig!
All these pieces of data put together should give any motivated individual all they need in order to seek out an optimal work solution - keep hunting good luck!
- Machine learning can be used to group job offers, facilitating the identification of similarities and differences between them. This could allow users to target their search for a work solution more specifically.
- The data can be used to compare job offerings across different areas or types of jobs, enabling users to make better-informed decisions about their career options and goals.
- It may also provide insight into the local job market, enabling companies and employers to identify potential new opportunities or trends that may previously have gone unnoticed.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: web_scraping_information_offers.csv

| Column name  | Description                         |
|:-------------|:------------------------------------|
| Nom_Oferta   | Name of the job offer. (String)     |
| Empresa      | Company offering the job. (String)  |
| Ubicació     | Location of the job offer. (String) |
| Temps_Oferta | Time of the job offer. (String)     |
| Horari       | Schedule of the job offer. (String) |
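The keyword/location/time search described above boils down to a couple of pandas filters over these columns. A minimal sketch (the rows below are invented for illustration; in practice you would `pd.read_csv("web_scraping_information_offers.csv")`):

```python
import pandas as pd

# Toy rows using the documented column names.
offers = pd.DataFrame({
    "Nom_Oferta": ["Python Developer", "Data Analyst", "Python Intern"],
    "Empresa": ["Acme", "Beta", "Acme"],
    "Ubicació": ["Barcelona", "Madrid", "Barcelona"],
    "Temps_Oferta": ["Full-time", "Part-time", "Temporary"],
    "Horari": ["9-17", "Mornings", "Flexible"],
})

# Filter by a keyword in the offer name and by location.
mask = (offers["Nom_Oferta"].str.contains("Python", case=False)
        & offers["Ubicació"].eq("Barcelona"))
hits = offers[mask]
print(hits[["Nom_Oferta", "Empresa", "Horari"]])
```

Further conditions on Temps_Oferta or Horari can be chained onto the same boolean mask.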
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This file contains data extracted from DPC, “Statistics for: www.cinemacontext.nl” (https://dpc.uba.uva.nl/awstats/awstats.pl?config=www.cinemacontext.nl). The statistics for the Cinema Context website are collected by the Digital Production Center (DPC) of the University Library Amsterdam (UBA), the organization that hosts and maintains the database and web interface. DPC collects the web statistics with the program Advanced Web Statistics (AWStats, version 7.0). The extracted data in this spreadsheet support the analysis of the use of Cinema Context for the article 'Writing Cinema Histories with Digital Databases. The Case of Cinema Context’, authored by Julia Noordegraaf, Kathleen Lotze and Jaap Boter. Tijdschrift voor Mediageschiedenis vol. 21, no. 2 (2018), 106-126. http://www.tijdschriftmediageschiedenis.nl/index.php/tmg/article/view/369.
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept, though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
http://opendatacommons.org/licenses/dbcl/1.0/
Africa - Population and Internet users statistics
Source: https://data.humdata.org/dataset/africa-population-and-internet-users-statistics Last updated at https://data.humdata.org/organization/openafrica : 2019-09-11
Population and income profile - totals, median household.
Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications. For example:
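A minimal sketch of what such an SQL-over-HTTP call looks like from Python. The endpoint URL and table name below are placeholders, assumptions rather than real Splitgraph addresses; the JSON-body shape is also an assumption, so consult the Splitgraph documentation for the actual DDN endpoint and request format:

```python
import json

# Hypothetical endpoint and query -- check the Splitgraph docs for the real URL.
endpoint = "https://example-splitgraph-endpoint/sql/query"
sql = 'SELECT * FROM "namespace/repository".table_name LIMIT 10'

# HTTP SQL APIs of this style typically accept a JSON body carrying the query.
payload = json.dumps({"sql": sql})

# In a live setting one would POST this, e.g.:
#   requests.post(endpoint, data=payload,
#                 headers={"Content-Type": "application/json"})
print(payload)
```

The response would then contain the query's result rows, ready to feed a web application.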
See the Splitgraph documentation for more information.