This file contains 5 years of daily time series data for several measures of traffic on a statistical forecasting teaching notes website whose alias is statforecasting.com. The variables have complex seasonality that is keyed to the day of the week and to the academic calendar. The patterns you see here are similar in principle to what you would see in other daily data with day-of-week and time-of-year effects. Some good exercises are to develop a 1-day-ahead forecasting model, a 7-day-ahead forecasting model, and an entire-next-week forecasting model (i.e., next 7 days) for unique visitors.
The variables are daily counts of page loads, unique visitors, first-time visitors, and returning visitors to an academic teaching notes website. There are 2167 rows of data spanning the date range from September 14, 2014, to August 19, 2020. A visit is defined as a stream of hits on one or more pages on the site on a given day by the same user, as identified by IP address. Multiple individuals with a shared IP address (e.g., in a computer lab) are considered as a single user, so real users may be undercounted to some extent. A visit is classified as "unique" if a hit from the same IP address has not come within the last 6 hours. Returning visitors are identified by cookies if those are accepted. All others are classified as first-time visitors, so the count of unique visitors is the sum of the counts of returning and first-time visitors by definition. The data was collected through a traffic monitoring service known as StatCounter.
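The 6-hour rule described above can be illustrated with a minimal sketch. StatCounter's actual implementation is not described here, so the function name `count_unique_visits`, the `(ip, timestamp)` input format, and the choice to measure the gap from the most recent hit by the same IP are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def count_unique_visits(hits, gap=timedelta(hours=6)):
    """Count visits that qualify as "unique" under a 6-hour rule.

    hits: iterable of (ip, timestamp) pairs (hypothetical input format).
    A hit starts a new unique visit when the same IP has not been seen
    before, or when at least `gap` has passed since its previous hit.
    """
    last_seen = {}  # ip -> timestamp of the most recent hit from that IP
    unique = 0
    for ip, ts in sorted(hits, key=lambda h: h[1]):
        previous = last_seen.get(ip)
        if previous is None or ts - previous >= gap:
            unique += 1
        last_seen[ip] = ts
    return unique

# Made-up hits, for illustration only
hits = [
    ("10.0.0.1", datetime(2020, 1, 6, 9, 0)),
    ("10.0.0.1", datetime(2020, 1, 6, 9, 30)),  # continues the first visit
    ("10.0.0.1", datetime(2020, 1, 6, 16, 0)),  # 6.5 hours later: new visit
    ("10.0.0.2", datetime(2020, 1, 6, 10, 0)),
]
print(count_unique_visits(hits))
```

Because users are keyed by IP address, several people behind one shared address collapse into a single entry in `last_seen`, which is the undercounting effect the description mentions.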
This file and a number of other sample datasets can also be found on the website of RegressIt, a free Excel add-in for linear and logistic regression which I originally developed for use in the course whose website generated the traffic data given here. If you use Excel to some extent as well as Python or R, you might want to try it out on this dataset.
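As a starting point for the forecasting exercises suggested above, a seasonal-naive baseline is a common yardstick for daily data with a day-of-week pattern. The sketch below is not the author's method; the function names and the sample numbers are made up for illustration, and the file's real column names are not assumed.

```python
def seasonal_naive_forecast(history, horizon=7, season=7):
    """Forecast each future day with the latest observed value
    for the same position in the weekly cycle."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up daily unique-visitor counts, oldest first (two weeks)
history = [310, 420, 450, 440, 380, 200, 180,
           320, 430, 445, 450, 370, 210, 190]
next_week = seasonal_naive_forecast(history, horizon=7)
next_day = seasonal_naive_forecast(history, horizon=1)
print(next_week, next_day)
```

With `horizon=1` this gives the 1-day-ahead forecast, the last element of a `horizon=7` result is the 7-day-ahead forecast, and the full list is the entire-next-week forecast. Any candidate model can then be compared against this baseline with `mean_absolute_error` on held-out days.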
The visitors to government websites for each hour in the last 24 hours.
The FDOT Annual Average Daily Traffic feature class provides spatial information on Annual Average Daily Traffic section breaks for the state of Florida. In addition, it provides affiliated traffic information like KFCTR, DFCTR and TFCTR among others. This dataset is maintained by the Transportation Data & Analytics office (TDA). The source spatial data for this hosted feature layer was created on: 07/12/2025.
Download Data: Enter Guest as Username to download the source shapefile from here: https://ftp.fdot.gov/file/d/FTP/FDOT/co/planning/transtat/gis/shapefiles/aadt.zip
Click Web Traffic Combined with Transaction Data: A New Dimension of Shopper Insights
Consumer Edge is a leader in alternative consumer data for public and private investors and corporate clients. Click enhances the unparalleled accuracy of CE Transact by allowing investors to delve deeper and browse further into global online web traffic for CE Transact companies and more. Leverage the unique fusion of web traffic and transaction datasets to understand the addressable market and understand spending behavior on consumer and B2B websites. See the impact of changes in marketing spend, search engine algorithms, and social media awareness on visits to a merchant’s website, and discover the extent to which product mix and pricing drive or hinder visits and dwell time. Plus, Click uncovers a more global view of traffic trends in geographies not covered by Transact. Doubleclick into better forecasting, with Click.
Consumer Edge’s Click is available in machine-readable file delivery and enables: • Comprehensive Global Coverage: Insights across 620+ brands and 59 countries, including key markets in the US, Europe, Asia, and Latin America. • Integrated Data Ecosystem: Click seamlessly maps web traffic data to CE entities and stock tickers, enabling a unified view across various business intelligence tools. • Near Real-Time Insights: Daily data delivery with a 5-day lag ensures timely, actionable insights for agile decision-making. • Enhanced Forecasting Capabilities: Combining web traffic indicators with transaction data helps identify patterns and predict revenue performance.
Use Case: Analyze Year Over Year Growth Rate by Region
Problem: A public investor wants to understand how a company’s year-over-year growth differs by region.
Solution: The firm leveraged Consumer Edge Click data to: • Gain visibility into key metrics like views, bounce rate, visits, and addressable spend • Analyze year-over-year growth rates for a time period • Break out data by geographic region to see growth trends
Metrics Include: • Spend • Items • Volume • Transactions • Price Per Volume
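The year-over-year comparison in this use case comes down to a simple ratio per region. The sketch below assumes a hypothetical layout in which visits are already totalled by `(region, year)`; it does not reflect the actual Click delivery format, and the numbers are made up.

```python
def yoy_growth_by_region(visits, year):
    """Return {region: growth} where growth = (current - previous) / previous.

    visits: dict mapping (region, year) -> total visits (hypothetical layout).
    Regions without a non-zero prior-year total are skipped.
    """
    growth = {}
    for (region, y), current in visits.items():
        if y != year:
            continue
        previous = visits.get((region, year - 1))
        if previous:  # skips missing and zero baselines
            growth[region] = (current - previous) / previous
    return growth

# Made-up totals, for illustration only
visits = {
    ("US", 2022): 1000, ("US", 2023): 1100,
    ("Europe", 2022): 500, ("Europe", 2023): 450,
    ("Asia", 2023): 300,  # no 2022 baseline, so no growth rate
}
print(yoy_growth_by_region(visits, 2023))
```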
Inquire about a Click subscription to perform more complex, near real-time analyses on public tickers and private brands as well as for industries beyond CPG like: • Monitor web traffic as a leading indicator of stock performance and consumer demand • Analyze customer interest and sentiment at the brand and sub-brand levels
Consumer Edge offers a variety of datasets covering the US, Europe (UK, Austria, France, Germany, Italy, Spain), and across the globe, with subscription options serving a wide range of business needs.
Consumer Edge is the Leader in Data-Driven Insights Focused on the Global Consumer
Annual average daily traffic (annual ADT) is the total traffic volume for the year divided by 365 days. The traffic count year runs from October 1 through September 30. Very few locations in California are actually counted continuously. Traffic counting is generally performed by electronic counting instruments that are moved from location to location throughout the State in a program of continuous traffic count sampling. The resulting counts are adjusted to an estimate of annual average daily traffic by compensating for seasonal influence, weekly variation, and other variables that may be present. Annual ADT is necessary for presenting a statewide picture of traffic flow, evaluating traffic trends, computing accident rates, planning and designing highways, and other purposes.
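The arithmetic above can be sketched as follows. The division by 365 comes straight from the definition. The adjustment of a short sample count is shown with multiplicative seasonal and weekly factors, which is an assumption made for illustration: the exact adjustment procedure and factor values are not given here.

```python
def aadt_from_annual_total(total_volume, days=365):
    """Annual average daily traffic: yearly volume divided by 365 days."""
    return total_volume / days

def estimate_aadt(sample_daily_average, seasonal_factor, weekly_factor):
    """Adjust a short-duration sample count toward an annual average.

    Both factors are hypothetical multipliers (values below 1 scale down
    a count taken in a busier-than-average period).
    """
    return sample_daily_average * seasonal_factor * weekly_factor

print(aadt_from_annual_total(3_650_000))  # continuously counted location
print(estimate_aadt(12_000, seasonal_factor=0.90, weekly_factor=0.95))
```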
The census count of vehicles on city streets is normally reported in the form of Average Daily Traffic (ADT) counts. These counts provide a good estimate of the actual number of vehicles on an average weekday at select street segments. Specific block segments are selected for a count because they are deemed representative of a larger segment on the same roadway. ADT counts are used by transportation engineers, economists, real estate agents, planners, and other professionals for planning and operational analysis. The frequency of each count varies depending on City staff’s needs for analysis in any given area. This report covers the counts taken in our City during approximately the past 12 years.
AADT represents current (most recent) Annual Average Daily Traffic on sampled road systems. This information is displayed using the Traffic Count Locations Active feature class as of the annual HPMS freeze in January. Historical AADT is found in another table. Please note that updates to this dataset are on an annual basis, therefore the data may not match ground conditions or may not be available for new roadways. Resource Contact: Christy Prentice, Traffic Forecasting & Analysis (TFA), http://www.dot.state.mn.us/tda/contacts.html#TFA
Check other metadata records in this package for more information on Annual Average Daily Traffic Locations Information.
Link to ESRI Feature Service:
Annual Average Daily Traffic Locations in Minnesota: Annual Average Daily Traffic Locations
The data on the use of the datasets on the OGD portal BL (data.bl.ch) are collected and published by the specialist and coordination office OGD BL. Each record contains the day on which the usage was measured, together with the following fields:
* dataset_title: The title of the dataset record.
* dataset_id: The technical ID of the dataset.
* visitors: Specifies the number of daily visitors to the record. Visitors are recorded by counting the unique IP addresses that recorded access on the day of the survey. The IP address represents the network address of the device from which the portal was accessed.
* interactions: Includes all interactions with any record on data.bl.ch. A visitor can trigger multiple interactions. Interactions include clicks on the website (searching datasets, filters, etc.) as well as API calls (downloading a dataset as a JSON file, etc.).
Remarks:
* Only calls to publicly available datasets are shown.
* IP addresses and interactions of users with a login of the Canton of Basel-Landschaft, in particular of employees of the specialist and coordination office OGD, are removed from the dataset before publication and are therefore not shown.
* Calls from actors that are clearly identifiable as bots by the user agent header are also not shown.
* Combinations of dataset and date for which no use occurred (Visitors == 0 & Interactions == 0) are not shown.
* Due to synchronization problems, data for individual days may be missing.
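The visitor and interaction counts described above can be sketched as a simple aggregation over an access log. The `(date, dataset_id, ip)` tuple layout and the function name `daily_usage` are assumptions for illustration; the portal's real log format is not documented here.

```python
from collections import defaultdict

def daily_usage(access_log):
    """Aggregate interactions into per-day, per-dataset usage figures.

    access_log: iterable of (date, dataset_id, ip) tuples, one tuple per
    interaction (hypothetical layout).
    Returns {(date, dataset_id): {"visitors": n, "interactions": m}}.
    """
    ips = defaultdict(set)
    interactions = defaultdict(int)
    for date, dataset_id, ip in access_log:
        key = (date, dataset_id)
        ips[key].add(ip)         # unique IP addresses count as visitors
        interactions[key] += 1   # every access counts as an interaction
    return {
        key: {"visitors": len(ips[key]), "interactions": interactions[key]}
        for key in interactions
    }

# Made-up log entries, for illustration only
log = [
    ("2024-03-01", "10010", "192.0.2.1"),
    ("2024-03-01", "10010", "192.0.2.1"),
    ("2024-03-01", "10010", "192.0.2.7"),
    ("2024-03-02", "10020", "192.0.2.1"),
]
print(daily_usage(log))
```

Combinations of dataset and date with no activity never appear as keys, which mirrors the remark that rows with zero visitors and zero interactions are not shown.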
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Víctor Yeste, Universitat Politècnica de València.

The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of content published in online media and to explore the prediction of the selected success variables.

Because the study needs to integrate data from two separate areas, web publishing and the analysis of shares and related topics on Twitter, the data were collected programmatically through both the Google Analytics v4 reporting API and the Twitter Standard API, always respecting their limits.

The website analyzed is hellofriki.com, an online media outlet whose primary aim is to cover topics that generate a large volume of daily news, alongside analysis, reports, interviews, and many other information formats. All of this content falls under the sections of cinema, series, video games, literature, and comics.

This dataset contributed to the PhD thesis:
Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009

Data were obtained for each breaking news article published online, according to the indicators described in the doctoral thesis. All related data are stored in a database, divided into the following tables:

tesis_followers: User ID list of media account followers.

tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.
* status_id: Tweet ID
* created_at: date of publication
* text: content of the tweet
* path: URL extracted after processing the shortened URL in text
* post_shared: Article ID in WordPress that is being shared
* retweet_count: number of retweets
* favorite_count: number of favorites

tesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web: other typologies, automatic Facebook shares, custom tweets without a link to an article, etc. It has the same fields as tesis_hometimeline.

tesis_posts: data of articles published by the web and processed for some analysis.
* stats_id: Analysis ID
* post_id: Article ID in WordPress
* post_date: article publication date in WordPress
* post_title: title of the article
* path: URL of the article on the media website
* tags: Tags ID or WordPress tags related to the article
* uniquepageviews: unique page views
* entrancerate: entrance rate
* avgtimeonpage: average time on page
* exitrate: exit rate
* pageviewspersession: page views per session
* adsense_adunitsviewed: number of ads viewed by users
* adsense_viewableimpressionpercent: ad display ratio
* adsense_ctr: ad click ratio
* adsense_ecpm: estimated ad revenue per 1000 page views

tesis_stats: data from a particular analysis, performed at each published breaking news item. Fields with statistical values can be computed from the data in the other tables, but total and average calculations are saved for faster and easier further processing.
* id: ID of the analysis
* phase: phase of the thesis in which the analysis was carried out (right now all are 1)
* time: "0" if at the time of publication, "1" if 14 days later
* start_date: date and time of measurement on the day of publication
* end_date: date and time when the measurement is made 14 days later
* main_post_id: ID of the published article to be analysed
* main_post_theme: main section of the published article to be analysed
* superheroes_theme: "1" if about superheroes, "0" if not
* trailer_theme: "1" if trailer, "0" if not
* name: empty field, possibility to add a custom name manually
* notes: empty field, possibility to add personalized notes manually, for example if some tag has been removed manually for being considered too generic, even though the editor added it
* num_articles: number of articles analysed
* num_articles_with_traffic: number of articles analysed with traffic (which will be taken into account for traffic analysis)
* num_articles_with_tw_data: number of articles with data from when they were shared on the media’s Twitter account
* num_terms: number of terms analysed
* uniquepageviews_total: total page views
* uniquepageviews_mean: average page views
* entrancerate_mean: average entrance rate
* avgtimeonpage_mean: average duration of visits
* exitrate_mean: average exit rate
* pageviewspersession_mean: average page views per session
* total: total of ads viewed
* adsense_adunitsviewed_mean: average of ads viewed
* adsense_viewableimpressionpercent_mean: average ad display ratio
* adsense_ctr_mean: average ad click ratio
* adsense_ecpm_mean: estimated ad revenue per 1000 page views
* Total: total income
* retweet_count_mean: average income
* favorite_count_total: total of favorites
* favorite_count_mean: average of favorites
* terms_ini_num_tweets: total tweets on the terms on the day of publication
* terms_ini_retweet_count_total: total retweets on the terms on the day of publication
* terms_ini_retweet_count_mean: average retweets on the terms on the day of publication
* terms_ini_favorite_count_total: total of favorites on the terms on the day of publication
* terms_ini_favorite_count_mean: average of favorites on the terms on the day of publication
* terms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publication
* terms_ini_user_num_followers_mean: average followers of users who have spoken of the terms on the day of publication
* terms_ini_user_num_tweets_mean: average number of tweets published by users who spoke about the terms on the day of publication
* terms_ini_user_age_mean: average age in days of users who have spoken of the terms on the day of publication
* terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms on the day of publication
* terms_end_num_tweets: total tweets on terms 14 days after publication
* terms_ini_retweet_count_total: total retweets on terms 14 days after publication
* terms_ini_retweet_count_mean: average retweets on terms 14 days after publication
* terms_ini_favorite_count_total: total of favorites on terms 14 days after publication
* terms_ini_favorite_count_mean: average of favorites on terms 14 days after publication
* terms_ini_followers_talking_rate: ratio of media Twitter account followers who have recently posted a tweet talking about the terms 14 days after publication
* terms_ini_user_num_followers_mean: average followers of users who have spoken of the terms 14 days after publication
* terms_ini_user_num_tweets_mean: average number of tweets published by users who have spoken about the terms 14 days after publication
* terms_ini_user_age_mean: average age in days of users who have spoken of the terms 14 days after publication
* terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about terms 14 days after publication

tesis_terms: data of the terms (tags) related to the processed articles.
* stats_id: Analysis ID
* time: "0" if at the time of publication, "1" if 14 days later
* term_id: Term ID (tag) in WordPress
* name: name of the term
* slug: URL of the term
* num_tweets: number of tweets
* retweet_count_total: total retweets
* retweet_count_mean: average retweets
* favorite_count_total: total of favorites
* favorite_count_mean: average of favorites
* followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the term
* user_num_followers_mean: average followers of users who were talking about the term
* user_num_tweets_mean: average number of tweets published by users who were talking about the term
* user_age_mean: average age in days of users who were talking about the term
* url_inclusion_rate: URL inclusion ratio
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is available on Brisbane City Council’s open data website – data.brisbane.qld.gov.au. The site provides additional features for viewing and interacting with the data and for downloading the data in various formats.
Traffic Volume for Key Brisbane Corridors. Includes traffic volumes, travel times and incidents.
This dataset will no longer be updated. Data is being published in a new format in a new dataset called Traffic Management — Key Corridor — Monthly Performance Report.
Information on Traffic Management is available on the Brisbane City Council website.
This dataset contains the following resources:
1. Traffic Volume for Key Brisbane Corridors.
Excel file containing:
* 6-Month Average Daily, AM & PM Peak Traffic Volume
* Network Daily Traffic Volume Comparison
* 6-Month Average AM & PM Peak Travel Time
* Network Travel Time Comparison
* Incident Data
* Note: volume day of the week and TT day of week was discontinued and is not included from Jul-Dec 2015
Excel file containing:
* 6-Month Average Daily, AM & PM Peak Traffic Volume
* Network Daily Traffic Volume Comparison
* 6-Month Average AM & PM Peak Travel Time
* Network Travel Time Comparison
* Incident Data
* Average daily traffic volume for each day of the week (veh/day)
* Travel time per kilometre by day of the week (mm:ss/km)
In accordance with Law No. 92-1444 of 31 December 1992 on noise control and the Environmental Code (Articles L. 571-10 and R. 571-32 to R. 571-43), in each department the Prefect identifies and classifies land transport infrastructure according to its noise and traffic characteristics. On the basis of this classification, and after consulting the municipalities, the Prefect determines the sectors affected by noise, the noise levels to be taken into account for the construction of buildings, and the technical requirements likely to reduce them. The sectors thus determined and the acoustic requirements applicable to them are set out in the annexes to the local urban plans (PLU) of the municipalities concerned. Article R. 571-33 of the Environmental Code specifies the infrastructure covered by the sound classification:
* roads whose annual average daily traffic, existing or forecast in the impact study or impact statement, exceeds 5000 vehicles per day;
* intercity rail lines with average daily traffic exceeding 50 trains;
* dedicated-lane public transport lines with average daily traffic of more than 100 buses;
* urban rail lines with average daily traffic exceeding 100 trains.
The sound classification map and the prefectural orders can be found on the website of the State services in the department.
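The traffic thresholds listed above lend themselves to a small lookup. This is only an illustrative sketch: the dictionary keys and the function name are hypothetical labels, and the authoritative criteria remain those of Article R. 571-33.

```python
# Average daily traffic thresholds taken from the list above.
# The keys are hypothetical labels chosen for this sketch.
THRESHOLDS = {
    "road": 5000,                  # vehicles per day
    "intercity_rail": 50,          # trains per day
    "public_transport_line": 100,  # buses per day
    "urban_rail": 100,             # trains per day
}

def is_subject_to_classification(kind, average_daily_traffic):
    """Return True when traffic strictly exceeds the threshold for `kind`."""
    return average_daily_traffic > THRESHOLDS[kind]

print(is_subject_to_classification("road", 7200))
print(is_subject_to_classification("urban_rail", 100))
```

The comparison is strict because the text speaks of traffic that "exceeds" each threshold, so a road carrying exactly 5000 vehicles per day is not covered in this sketch.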
Only the documents annexed to the Prefectural Orders are authentic.
Data set containing traffic to the Montreal.ca portal, namely the number of visitors and pages viewed per day of the year.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This feature class is updated every business day using Python scripts and the WellNet database. Please disregard the "Date Updated" field as it does not keep in sync with DWR's internal enterprise geodatabase updates. The NDWR's water monitoring database contains information related to sites for groundwater measurements. These data are used by NDWR to assess the condition of the groundwater and surface water systems over time and are available to the public on NDWR’s website. Groundwater measurement sites are chosen based on physical location and access considerations, permit terms, and to maximize the distribution of measurement points in a given basin.
Groundwater monitoring sites are typically chosen based on spatial location, access, and period of record considerations. When possible, NDWR tries to have a distribution of monitoring locations within a given hydrographic area. The entity that does the monitoring depends on the site: for example, some mines have well fields where they collect data and submit those data to NDWR as a condition of their monitoring plan, and some sites are monitored by NDWR staff annually or more frequently. While people can volunteer to have their well monitored, more often the NDWR staff who measure water levels recommend an additional site, or staff in the office recommend alternate sites. The Chief of the Hydrology Section reviews the recommendations and makes a final decision on adding or changing a site.
This dataset is updated every business day from a non-spatial SQL Server database using lat/long coordinates to display location. This feature class participates in a relationship class with a groundwater measure table joined using the sitename field. This dataset contains both active and inactive sites. Measurement data is provided by reporting agencies and by regular site visits from NDWR staff. For website access, please see the Water Levels site at water.nv.gov/WaterLevelData.aspx
This dataset represents the road counts carried out on the various sections of the departmental roads (RD) of the Isère. These data are collected and analysed by the Mobility Department and integrated into the Departmental Road Information System. They are a real decision support tool to feed the Isère road master plan and adapt the policies for the operation, maintenance and modernisation of departmental road infrastructure. The data collected make it possible to assess the traffic of light vehicles and heavy goods vehicles. For each collection point, the count indicates the annual average daily traffic (TMJA) “obtained by calculating the yearly average of the number of vehicles circulating on the observed section, in all directions, during a day”. The number of heavy goods vehicles in the traffic composition accompanies the TMJA. Each year, the data collected are used to produce the “Annual Average Daily Traffic Maps” (TMJA) made available on the website isere.fr.
The counts are obtained from two types of traffic surveys carried out on the roadway:
* Via permanent counting stations that report their data year-round: nearly 100 counting stations currently deployed
* Via ad hoc surveys (road counters, pneumatic tubes temporarily installed depending on the importance of the roads or for the purposes of studies of specific projects, safety operations...): between 100 & 300 one-off surveys organised per year
The number of counting points therefore varies according to the year and the need for one-off surveys. This dataset offers a traffic history since 2009. Due to the health context linked to Covid, the years 2020 and 2021 were marked by a drastic decline in car traffic on departmental roads during the lockdown periods. It has not been possible to consolidate consistent figures for these years.
This is a point GIS dataset representing Traffic Volumes (Annual Average Daily Traffic (AADT)) on the California Department of Transportation (Caltrans) state highway network.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
London Borough level tourism trip estimates (thousands). The ‘top-down’ nature of the Local Area Tourism Impact (LATI) model (starting with London data) means it is best suited to disaggregate expenditure. However, tourism trips were also disaggregated for comparative purposes using the estimated proportions of spending by overseas, domestic and day visitors in the boroughs. Since the trip estimates are derived from data on trips to London they do not account for trips to different boroughs by visitors whilst in London. Indicative borough level day visitor/tourist estimates for 2007 were derived from the LDA’s own experimental London level day visitor estimates. As such the borough level day visitor estimates should be treated with caution and the 2007 day visitor estimates are not comparable with those from previous years. They are intended only to give a best estimate of the scale of day visitor tourism in each borough from the currently available data. Further tourism data for UK regions covering trends in visits, nights, and spend to London by visitors from overseas is available on the Visit Britain website. Analyse data by age, purpose, duration, and quarter. This dataset is no longer updated.
This data is a breakdown of all the moving violations (tickets) issued in every precinct throughout the city. This data is collected because the City Council passed Local Law #11 in 2011 and required the NYPD to post it. This data is scheduled to run every month by ITB and is posted on the NYPD website. Each record represents a moving violation issued to a motorist by summons type and what precinct it was issued in. This data can be used to see if poor driving in your resident precinct is being enforced. The limitation of the data is that it is just a stick count of violations without any street locations, time of day or day of the week.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dear Scientist!

This database contains data collected during the study "Analysis of the route safety of abnormal vehicle from the perspective of traffic parameters and infrastructure characteristics with the use of web technologies and machine learning", funded by the National Science Centre, Poland (Grant reference 2021/05/X/ST8/01669). The file structure follows from the aims of the study and from the numerous sources needed to assemble data suitable for use as the input layer of a neural network. You can find the following folders and files:

1. Road_Parameters_Data (.csv) - data collected by the author before the study (2021). Here you can find information about the technical quality and types of main roads located in Mazovia province (Poland). The source of the data was the Polish General Directorate for National Roads and Motorways.
2. Google_Maps_Data (.json) - data collected using the authors’ web service, written in Python, which downloaded the data from the Distance Matrix API service on Google Maps at two-hour intervals from 25 May 2022 to 22 June 2022. The application retrieved the TRAFFIC FACTOR parameter, which is the ratio of the actual travel time to the historical travel time for particular roads.
3. Geocoding_Roads_Data (.json) - data obtained through reverse geocoding, using geographical coordinates passed in the request parameter latlng. As a result, Google Maps returned a response containing the postal code for the field types defined as postal_code and the name of the lowest possible level of the territorial unit for the field administrative_area_level.
4. Population_Density_Data (.csv) - data for territorial units. The territorial units assigned to individual records were used to search the database of the Polish Postal Service using the authors' original web service written in Python. Records that contained a postal code were assigned the name of the corresponding municipality. Finally, postal codes and names of territorial units were compared with the database of Statistics Poland (GUS), which contains information on population density for individual municipalities, and assigned to existing records from the database.
5. Roads_Incidents_Data (.json) - data collected by a web service, written in Python, that analysed the reported obstructions available on the website of the General Directorate for National Roads and Motorways. When a traffic obstruction emerged in the Mazovia Province, the application could later associate it with the appropriate records on the basis of the road number and kilometre at which it occurred, using the links parameters. The data was collected from 26 May to 22 June 2022.
6. Weather_For_Roads_Data (.json) - data on the weather conditions on the roads on the days of the study. For this purpose, a web service was written in Python to retrieve selected items from the response returned by the www.timeanddate.com server for the corresponding input parameters: the geographical coordinates of the midpoint between the nodes of the particular roads. The data was collected for days between 27 May and 22 June 2022.
7. data_v_1 (.csv) - collected data for road parameters only
8. data_v_2 (.csv) - collected data for road parameters + population density
9. data_v_3 (.json) - collected data for road parameters + population density + traffic
10. data_v_4 (.json) - collected data for road parameters + population density + traffic + weather + road incidents
11. data_v_5 (.csv) - collected VALIDATED and cleaned data for road parameters + population density + traffic + weather + road incidents. At this stage, the road sections for which the traffic factor parameter was assessed to have been estimated incorrectly were eliminated. These were combinations for which the value of the traffic factor remained the same regardless of the time of day or which took several of the same values during the course of the whole study. Moreover, it was also assumed that the final database should consist of road sections for which traffic factor values less than 1.2 constitute at least 10% of all results. Thus, the sections with no tendency to become congested and characterized by a small number of road traffic users were eliminated.

Good luck with your research!
Igor Betkier, PhD
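The traffic factor and the first validation step described above can be sketched as follows. The function names, the section identifiers, and the travel times are made up for illustration; the study's actual cleaning code is not part of this description, and only the "value never changes" criterion is shown.

```python
def traffic_factor(actual_seconds, historical_seconds):
    """Ratio of the actual travel time to the historical travel time."""
    if historical_seconds <= 0:
        raise ValueError("historical travel time must be positive")
    return actual_seconds / historical_seconds

def drop_constant_sections(factors_by_section, tolerance=1e-9):
    """Keep only road sections whose traffic factor varies across samples.

    factors_by_section: dict mapping a section id to a non-empty list of
    traffic factors sampled over time (hypothetical layout).
    """
    return {
        section: factors
        for section, factors in factors_by_section.items()
        if max(factors) - min(factors) > tolerance
    }

# Made-up travel times in seconds, for illustration only
samples = {
    "section_A": [traffic_factor(600, 600), traffic_factor(660, 600),
                  traffic_factor(780, 600)],
    "section_B": [traffic_factor(300, 300)] * 3,  # never changes
}
kept = drop_constant_sections(samples)
print(sorted(kept))
```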
The EDGAR log file data set provides information on internet search traffic for EDGAR filings through SEC.gov. The data sets contain information extracted from log files from the EDGAR Archive on SEC.gov, and the information can be used to infer user access statistics.
The current version of this dataset covers search traffic from January 1, 2014 through December 31, 2016.
Due to the substantial volume of the raw EDGAR Log Files data set, we (Stanford GSB) implemented a series of transformations aimed at reducing its size while retaining essential information needed for research. Below is a summary of the modifications applied to the raw data, resulting in the four tables currently available in this Redivis dataset:
raw_single_day_per_year:

aggregated_{YEAR}:
* code of value '200' with doc/extention values ending in htm, txt, xml, pdf, sgml, html, or xsd
* cik, time, idx, size, and browser. Our reasoning for removal of these fields: cik can be obtained through merging with our EDGAR Filings dataset using accession; idx shouldn't change over time for the same doc and can be manually recreated via transform of doc; browser is NULL in more than 99.99% of rows across logs and is fully NULL for many dates; size varies according to doc, which we have aggregated to reduce size; time does not have a time zone specified, and daily data granularity is likely sufficient for research purposes
* doc_count to represent the number of times an IP viewed a filing each day while keeping the same browser metadata/parameters

raw_{YEAR}:
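The kind of reduction described for the aggregated tables can be sketched in a few lines. This is an illustration under stated assumptions, not the transformation Stanford GSB actually ran: the row layout (dicts with `ip`, `date`, `accession`, `doc`, and `code` keys), the grouping key, and the function name are hypothetical, while the status code and the list of endings come from the description above.

```python
from collections import Counter

# Endings listed in the description above
KEPT_ENDINGS = ("htm", "txt", "xml", "pdf", "sgml", "html", "xsd")

def aggregate_log(rows):
    """Collapse raw log rows into one row per (ip, date, accession, doc).

    rows: iterable of dicts with keys ip, date, accession, doc, code
    (hypothetical layout). Only rows with status code 200 and a doc value
    ending in one of KEPT_ENDINGS are counted.
    """
    counts = Counter()
    for row in rows:
        if int(float(row["code"])) != 200:
            continue
        if not row["doc"].lower().endswith(KEPT_ENDINGS):
            continue
        counts[(row["ip"], row["date"], row["accession"], row["doc"])] += 1
    return [
        {"ip": ip, "date": date, "accession": accession, "doc": doc,
         "doc_count": n}
        for (ip, date, accession, doc), n in counts.items()
    ]

# Made-up rows, for illustration only
rows = [
    {"ip": "1.2.3.abc", "date": "2016-01-04", "accession": "A-1", "doc": "form10k.htm", "code": "200.0"},
    {"ip": "1.2.3.abc", "date": "2016-01-04", "accession": "A-1", "doc": "form10k.htm", "code": "200.0"},
    {"ip": "1.2.3.abc", "date": "2016-01-04", "accession": "A-1", "doc": "logo.gif", "code": "200.0"},
    {"ip": "4.5.6.xyz", "date": "2016-01-04", "accession": "A-1", "doc": "form10k.htm", "code": "404.0"},
]
print(aggregate_log(rows))
```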
From the SEC Edgar Log Website: