License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
The dataset contains information about web requests to a single website. It's a time series dataset, which means it tracks data over time, making it great for machine learning analysis.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Explore our detailed website traffic dataset featuring key metrics like page views, session duration, bounce rate, traffic source, and conversion rates.
Web traffic statistics for several City-Parish websites (brla.gov, city.brla.gov, Red Stick Ready, GIS, Open Data, etc.). Information provided by Google Analytics.
According to research from SimilarWeb, 1DM+, a download manager app, led the list of trending paid apps on the India Google Play Store as of June 2021. uTorrent Pro followed at rank six during the same period.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset was used in the Kaggle Wikipedia Web Traffic forecasting competition. It contains 145063 daily time series representing the number of hits or web traffic for a set of Wikipedia pages from 2015-07-01 to 2017-09-10.
The original dataset contains missing values; these have simply been replaced by zeros.
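As a quick illustration, here is a minimal pandas sketch of loading such a wide-format file and handling the zero-filled gaps; the filename train_1.csv and the Page index column are assumptions based on the Kaggle competition's layout.

```python
import pandas as pd

# Load the wide-format file: one row per Wikipedia page, one column per
# day. "train_1.csv" and the "Page" index column follow the Kaggle
# competition layout and are assumptions here.
df = pd.read_csv("train_1.csv", index_col="Page")

# In this copy of the dataset, missing values have been replaced by
# zeros, so zero-days are ambiguous: true zero-traffic days or original
# gaps. One conservative option is to mask zeros back to NA before
# fitting a forecasting model.
masked = df.replace(0, pd.NA)

# Daily totals across all series, as a quick sanity check.
print(df.sum(axis=0).head())
```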
Daily utilization metrics for data.lacity.org and geohub.lacity.org. Updated monthly.
License: ODC-By (https://choosealicense.com/licenses/odc-by/)
🍷 FineWeb
15 trillion tokens of the finest data the 🌐 web has to offer
What is it?
The 🍷 FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from CommonCrawl. The data processing pipeline is optimized for LLM performance and runs on the 🏭 datatrove library, our large-scale data processing library. 🍷 FineWeb was originally meant to be a fully open replication of 🦅 RefinedWeb, with a release… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb.
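For orientation, a minimal sketch of streaming a few records with the Hugging Face datasets library; the "sample-10BT" subset name is an assumption taken from the dataset page's sample configurations and should be verified there.

```python
from datasets import load_dataset

# Stream the dataset rather than downloading all ~18.5T tokens locally.
# The "sample-10BT" subset name is an assumption; check the dataset page
# at https://huggingface.co/datasets/HuggingFaceFW/fineweb.
fw = load_dataset("HuggingFaceFW/fineweb",
                  name="sample-10BT",
                  split="train",
                  streaming=True)

# Each record carries the cleaned text plus CommonCrawl provenance fields.
for record in fw.take(3):
    print(record["url"], len(record["text"]))
```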
This dataset was created by Merve Afranur ARTAR.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The dataset is a set of network traffic traces in pcap/csv format captured from a single user. The traffic is classified into 5 different activities (Video, Bulk, Idle, Web, and Interactive), and the label is shown in the filename. There is also a file (mapping.csv) with the mapping between the host's IP address, the csv/pcap filename, and the activity label.
Activities:
Interactive: applications that perform real-time interactions to provide a suitable user experience, such as editing a file in Google Docs or remote CLI sessions over SSH.
Bulk data transfer: applications that transfer large files over the network. Examples are SCP/FTP applications and direct downloads of large files from web servers like Mediafire, Dropbox, or the university repository.
Web browsing: all traffic generated while searching and consuming different web pages. Examples of those pages are several blogs, news sites, and the university's Moodle.
Video playback: traffic from applications that consume video via streaming or pseudo-streaming. The best-known services used are Twitch and YouTube, but the university's online classroom has also been used.
Idle behaviour: the background traffic generated by the user's computer while the user is idle. This traffic was captured with every application closed but with some pages open (Google Docs, YouTube, and several other web pages), always without user interaction.
The capture is performed by a network probe attached, via a SPAN port, to the router that forwards the user's network traffic. The traffic is stored in pcap format with full packet payloads. In the csv files, every non-TCP/UDP packet is filtered out, as is every packet with no payload. The fields in the csv files are the following (one line per packet): timestamp, protocol, payload size, source and destination IP address, and source and destination UDP/TCP port. The fields are also included as a header in every csv file.
The amount of data is as follows:
Bulk: 19 traces, 3599 s total duration, 8704 MB of pcap files
Video: 23 traces, 4496 s, 1405 MB
Web: 23 traces, 4203 s, 148 MB
Interactive: 42 traces, 8934 s, 30.5 MB
Idle: 52 traces, 6341 s, 0.69 MB
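As a rough illustration of working with these files, here is a small pandas sketch; the filename and the exact column spellings are assumptions, since only the field meanings are specified above.

```python
import pandas as pd

# Filename and column spellings are assumptions; take the real header
# row from the csv files themselves.
trace = pd.read_csv("video_01.csv")

# One row per TCP/UDP packet with payload: timestamp, protocol, payload
# size, source/destination IP address, source/destination port.
duration_s = trace["timestamp"].max() - trace["timestamp"].min()
payload_mb = trace["payload_size"].sum() / 1e6
print(f"{duration_s:.0f} s, {payload_mb:.1f} MB of payload")

# mapping.csv links each capture's filename to the host IP address and
# the activity label.
mapping = pd.read_csv("mapping.csv")
print(mapping.head())
```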
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
YouTube flows
License: ODC-By (https://choosealicense.com/licenses/odc-by/)
📀 Falcon RefinedWeb
Falcon RefinedWeb is a massive English web dataset built by TII and released under an ODC-By 1.0 license. See the 📓 paper on arXiv for more details. RefinedWeb is built through stringent filtering and large-scale deduplication of CommonCrawl; we found models trained on RefinedWeb to achieve performance in line with or better than models trained on curated datasets, while relying on web data alone. RefinedWeb is also "multimodal-friendly": it contains links and alt… See the full description on the dataset page: https://huggingface.co/datasets/tiiuae/falcon-refinedweb.
License: ODC-By (https://choosealicense.com/licenses/odc-by/)
📚 FineWeb-Edu
1.3 trillion tokens of the finest educational data the 🌐 web has to offer
Paper: https://arxiv.org/abs/2406.17557
What is it?
The 📚 FineWeb-Edu dataset consists of 1.3T tokens (with a 5.4T-token variant, FineWeb-Edu-score-2) of educational web pages filtered from the 🍷 FineWeb dataset. This is the 1.3 trillion token version. To enhance FineWeb's quality, we developed an educational quality classifier using annotations generated by Llama3-70B-Instruct. We then… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This application is intended for informational purposes only and is not an operational product. The tool provides the capability to access, view, and interact with satellite imagery, and shows the latest view of Earth as it appears from space. For additional imagery from NOAA's GOES East and GOES West satellites, please visit our Imagery and Data page or our cooperative institute partners at CIRA and CIMSS. This website should not be used to support operational observation, forecasting, emergency, or disaster mitigation operations, either public or private. In addition, we do not provide weather forecasts on this site; that is the mission of the National Weather Service. Please contact them for any forecast questions or issues.

Using the Maps

What does the Layering Options icon mean? The Layering Options widget provides a list of operational layers and their symbols, and allows you to turn individual layers on and off. The order in which layers appear in this widget corresponds to the layer order in the map. The top checked layer is the one shown in the map, and it may hide the layers below it. Layers with expansion arrows contain sublayers or subtypes.

What does the Time Slider icon do? The Time Slider widget enables you to view temporal layers in a map and to play an animation showing how the data change over time. The widget has buttons to play and pause the animation and to step to the previous or next time period.

Do these maps work on mobile devices and different browsers? Yes!

Why are there black stripes / missing data on the map? NOAA Satellite Maps is for informational purposes only and is not an operational product; there are times when data are not available.

Why does the imagery load slowly? This map viewer does not load pre-generated web-ready graphics and animations like many satellite imagery apps you may be used to seeing. Instead, it downloads geospatial data from our data servers through a Map Service, and the app in your browser renders the imagery in real time. Each pixel needs to be rendered and geolocated on the web map for it to load.

How can I get the raw data and download the GIS World File for the images I choose? The geospatial data Map Service for the NOAA Satellite Maps GOES satellite imagery is located on our Satellite Maps ArcGIS REST Web Service (available here). We support open information sharing and integration through this RESTful service, which can be used by a multitude of GIS software packages and web map applications (both open and licensed). Data are for display purposes only and should not be used operationally.

Are there any restrictions on using this imagery? NOAA supports an open data policy, and we encourage publication of imagery from NOAA Satellite Maps; when doing so, please cite it as "NOAA" and consider including a permalink (such as this one) to allow others to explore the imagery. For acknowledgment in scientific journals, please use: "We acknowledge the use of imagery from the NOAA Satellite Maps application: LINK". This imagery is not copyrighted. You may use this material for educational or informational purposes, including photo collections, textbooks, public exhibits, computer graphical simulations, and internet web pages. This general permission extends to personal web pages.
About this satellite imagery

What am I looking at in these maps? In this map you are seeing the past 24 hours (updated approximately every 10 minutes) of the Western Hemisphere and Pacific Ocean, as seen by the NOAA GOES East (GOES-16) and GOES West (GOES-18) satellites. You can also view several different 'layers': 'GeoColor', 'infrared', and 'water vapor'. The map shows the coverage area of the GOES East and GOES West satellites. GOES East, which orbits the Earth from 75.2 degrees west longitude, provides a continuous view of the Western Hemisphere, from the west coast of Africa to North and South America. GOES West, which orbits the Earth at 137.2 degrees west longitude, sees western North and South America and the central and eastern Pacific Ocean all the way to New Zealand.

What does the GOES GeoColor imagery show? The 'Merged GeoColor' map shows the coverage area of the GOES East and GOES West satellites and includes the entire Western Hemisphere and most of the Pacific Ocean. This imagery uses a combination of visible and infrared channels and is updated approximately every 15 minutes in real time. GeoColor imagery approximates how the human eye would see Earth from space during daylight hours, and is created by combining several of the spectral channels from the Advanced Baseline Imager (ABI), the primary instrument on the GOES satellites. The wavelengths of reflected sunlight from the red and blue portions of the spectrum are merged with a simulated green wavelength component, creating RGB (red-green-blue) imagery. At night, infrared imagery shows high clouds as white and low clouds and fog as light blue. The static city lights background basemap is derived from a single composite image from the Visible Infrared Imaging Radiometer Suite (VIIRS) Day Night Band; because it is static, temporary changes such as power outages will not be visible. Learn more.

What does the GOES infrared map show? The 'GOES infrared' map displays heat radiating off of clouds and the surface of the Earth and is updated every 15 minutes in near real time. Infrared satellite imagery can be "colorized" or "color-enhanced" to bring out details in cloud patterns. These color enhancements are useful to meteorologists because they signify "brightness temperatures," which are approximately the temperature of the radiating body, whether it be a cloud or the Earth's surface. In this imagery, yellow and orange areas signify taller/colder clouds, which often correlate with more active weather systems. Blue areas are usually "clear sky," while pale white areas typically indicate low-level clouds. During a hurricane, cloud tops will be higher and colder, and therefore appear dark red. This imagery is derived from band #13 of the 16 channels on the GOES East and GOES West Advanced Baseline Imager, the primary instrument on both satellites.

How does infrared satellite imagery work? The infrared (IR) band detects radiation that is emitted by the Earth's surface, atmosphere, and clouds in the "infrared window" portion of the spectrum. The radiation has a wavelength near 10.3 micrometers, and the term "window" means that it passes through the atmosphere with relatively little absorption by gases such as water vapor. It is useful for estimating the emitting temperature of the Earth's surface and cloud tops. A major advantage of the IR band is that it can sense energy at night, so this imagery is available 24 hours a day.

What do the colors on the infrared map represent? In this imagery, yellow and orange areas signify taller/colder clouds, which often correlate with more active weather systems. Blue areas are clear sky, while pale white areas indicate low-level clouds or potentially frozen surfaces. Learn more about this weather imagery.

What does the GOES water vapor map layer show? The GOES 'water vapor' map displays the concentration and location of clouds and water vapor in the atmosphere and shows data from both the GOES East and GOES West satellites. Imagery is updated approximately every 15 minutes in real time. Water vapor imagery, which is useful for determining locations of moisture and atmospheric circulations, is created using a wavelength of energy sensitive to the water vapor content of the atmosphere. This imagery is derived from band #10 on the GOES East and GOES West Advanced Baseline Imager.

What do the colors on the water vapor map represent? Green-blue and white areas indicate high water vapor or moisture content, whereas dark orange and brown areas indicate little or no moisture. Learn more about this water vapor imagery.

About the satellites

What are the GOES satellites? NOAA's most sophisticated Geostationary Operational Environmental Satellites (GOES), known as the GOES-R Series, provide advanced imagery and atmospheric measurements of Earth's Western Hemisphere, real-time mapping of lightning activity, and improved monitoring of solar activity and space weather. The first satellite in the series, GOES-R, now known as GOES-16, was launched in 2016 and is currently operational as NOAA's GOES East satellite. In 2022, NOAA launched another satellite in the series, GOES-T, which joined GOES-16 in orbit as GOES-18 and became operational as NOAA's GOES West satellite in January 2023. Together, GOES East and GOES West provide coverage of the Western Hemisphere and most of the Pacific Ocean, from the west coast of Africa all the way to New Zealand. Each satellite orbits the Earth from about 22,200 miles away.
License: CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)
The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data for institutional repositories. The data are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2018. For a description of the data collection, processing, and output methods, please see the "Methods" section below. Note that the RAMP data model changed in August 2018, so two sets of documentation are provided to describe data collection and processing before and after the change.
Methods
RAMP Data Documentation – January 1, 2017 through August 18, 2018
Data Collection
RAMP data were downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).
Data from January 1, 2017 through August 18, 2018 were downloaded in one dataset per participating IR. The following fields were downloaded for each URL, with one row per URL:
url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
country: The country from which the corresponding search originated.
device: The device used for the search.
date: The date of the search.
Following the data processing described below, an additional field, citableContent, is added to the page-level data on ingest into RAMP.
Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.
More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en
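For concreteness, here is a hedged sketch of the kind of Search Console API call that returns the fields listed above, using google-api-python-client; the site URL and credential handling are placeholders, and RAMP's actual harvester may differ.

```python
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

# Placeholder credentials file; obtaining OAuth2 authorization for the
# repository's Search Console property is outside this sketch.
creds = Credentials.from_authorized_user_file("token.json")

service = build("webmasters", "v3", credentials=creds)

request = {
    "startDate": "2017-01-01",
    "endDate": "2018-08-18",
    # One dimension per field described above; GSC adds clicks,
    # impressions, CTR, and position for each combination.
    "dimensions": ["page", "country", "device", "date"],
    "rowLimit": 25000,
}
# The site URL is a placeholder for a participating IR.
response = service.searchanalytics().query(
    siteUrl="https://ir.example.edu/", body=request).execute()

for row in response.get("rows", []):
    page, country, device, date = row["keys"]
    print(page, country, device, date, row["clicks"], row["impressions"])
```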
Data Processing
Upon download from GSC, data are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the data which records whether each URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
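A minimal sketch of an extension-based check of this kind follows; the extension list is illustrative only, as RAMP's actual rule set is not spelled out in this description.

```python
from urllib.parse import urlparse
from os.path import splitext

# Illustrative extension list only; RAMP's actual rules for separating
# HTML pages from content files are not specified here.
CONTENT_EXTENSIONS = {".pdf", ".csv", ".doc", ".docx", ".xls", ".xlsx", ".zip"}

def citable_content(url: str) -> str:
    """Return "Yes" if the URL appears to point to a non-HTML content file."""
    ext = splitext(urlparse(url).path)[1].lower()
    return "Yes" if ext in CONTENT_EXTENSIONS else "No"

print(citable_content("https://ir.example.edu/bitstream/1/2/thesis.pdf"))  # Yes
print(citable_content("https://ir.example.edu/handle/1/2"))                # No
```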
Processed data are then saved in a series of Elasticsearch indices. From January 1, 2017, through August 18, 2018, RAMP stored data in one index per participating IR.
About Citable Content Downloads
Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.
CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).
For any specified date range, the steps to calculate CCD are:
Filter data to only include rows where "citableContent" is set to "Yes."
Sum the value of the "clicks" field on these rows.
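In pandas terms, and assuming the column names from the field list in this documentation, the calculation reduces to a filter and a sum:

```python
import pandas as pd

def ccd(df: pd.DataFrame, start: str, end: str) -> int:
    """Citable content downloads for a date range, per the two steps above."""
    # Dates are assumed to be ISO-formatted strings, so lexicographic
    # comparison matches chronological order.
    in_range = df[(df["date"] >= start) & (df["date"] <= end)]
    citable = in_range[in_range["citableContent"] == "Yes"]
    return int(citable["clicks"].sum())

# Example against a monthly export file, named per the convention
# described in the "Output to CSV" section below.
print(ccd(pd.read_csv("2018-01_RAMP_all.csv"), "2018-01-01", "2018-01-31"))
```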
Output to CSV
Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above.
The data in these CSV files include the following fields:
url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
country: The country from which the corresponding search originated.
device: The device used for the search.
date: The date of the search.
citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
index: The Elasticsearch index corresponding to page click data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the index field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
Filenames for files containing these data follow the format 2018-01_RAMP_all.csv. Using this example, the file 2018-01_RAMP_all.csv contains all data for all RAMP participating IR for the month of January, 2018.
Data Collection from August 19, 2018 Onward
RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).
Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:
url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
Following the data processing described below, an additional field, citableContent, is added to the page-level data on ingest into RAMP.
The second set includes similar information, but instead of being aggregated at the page level, the data are grouped by the country from which the user submitted the corresponding search and the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:
country: The country from which the corresponding search originated.
device: The device used for the search.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.
More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en
Data Processing
Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.
Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.
About Citable Content Downloads
Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
WebSRC v1.0
WebSRC v1.0 is a dataset for reading comprehension on structural web pages. The task is to answer questions about web pages, which requires a system to have a comprehensive understanding of the spatial structure and logical structure. WebSRC consists of 6.4K web pages and 400K question-answer pairs about web pages. For each web page, we manually chose one segment from it and saved the corresponding HTML code, screenshot, and metadata like positions and sizes. Questions… See the full description on the dataset page: https://huggingface.co/datasets/X-LANCE/WebSRC_v1.0.
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
The process of extracting and analyzing supermarket data involves an intricate series of steps, from web scraping product details directly from the websites of leading supermarkets like Aldi, ASDA, Morrisons, Sainsbury's, and Tesco, to processing and analyzing this data for actionable insights. This approach leverages Python libraries such as Pandas for data manipulation, Selenium for web scraping, and urllib3 for URL handling, ensuring a robust data extraction foundation.
Web scraping is the first critical step in this process. Customized functions are developed for each supermarket to systematically navigate through their web pages, extract essential product information like names, prices, price per unit, and images, and handle various common exceptions gracefully. This meticulous data collection is structured to restart automatically in case of any hitches, ensuring no data loss and maintaining the integrity of the extraction process.
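As a rough sketch of this restart-on-failure pattern (not the project's actual code), here is a pared-down Selenium example; the URL and CSS selectors are placeholders that would differ per supermarket.

```python
import time
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By

# URL and CSS selectors are placeholders; each supermarket needs its own.
URL = "https://www.example-supermarket.co.uk/groceries"

def scrape_products(max_retries: int = 3) -> list[dict]:
    """Collect name/price pairs, restarting the browser on failure."""
    for attempt in range(max_retries):
        driver = webdriver.Chrome()
        try:
            driver.get(URL)
            cards = driver.find_elements(By.CSS_SELECTOR, ".product-card")
            return [
                {
                    "name": c.find_element(By.CSS_SELECTOR, ".name").text,
                    "price": c.find_element(By.CSS_SELECTOR, ".price").text,
                }
                for c in cards
            ]
        except WebDriverException:
            time.sleep(5 * (attempt + 1))  # back off, then restart cleanly
        finally:
            driver.quit()
    return []
```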
Once the data is scraped, it undergoes a detailed processing phase. This involves consolidating the collected information into unified datasets, performing spatial joins to align data accurately, and applying category simplification for better analysis. Notably, for supermarkets like Tesco, additional steps are taken to incorporate Clubcard data, ensuring the most competitive prices are captured. This phase is critical for preparing the data for in-depth analysis by cleaning, structuring, and ensuring it is comprehensive.
Quality assurance plays a pivotal role throughout the process. A dedicated data quality script scrutinizes the extracted data for discrepancies, checks the completeness of the web scraping effort, and validates the processed data for any null values or inconsistencies. This step is crucial for ensuring the reliability of the data before it moves to the analysis stage.
The analysis of the data is multifaceted, focusing on pricing strategies, brand popularity, and product categorization. Through the use of tables, graphs, word clouds, and treemaps, the analysis reveals insights into pricing patterns, brand preferences, and category distributions. Additionally, a recommender system based on Singular Value Decomposition (SVD) enhances the analysis by providing personalized product recommendations, demonstrating the application of advanced machine learning techniques in understanding customer preferences.
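A compact sketch of what an SVD-based recommender of this kind might look like, using scipy on a stand-in customer-by-product matrix; real input would come from purchase histories.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Stand-in customer x product interaction matrix: 500 customers, 200
# products, ~2% of cells filled. Real data would be purchase histories.
ratings = sparse_random(500, 200, density=0.02, format="csr", random_state=0)

# Truncated SVD with k latent factors; multiplying the factors back
# together yields predicted affinities, including for unseen products.
u, s, vt = svds(ratings, k=20)
predicted = u @ np.diag(s) @ vt

# Top-5 product indices for customer 0, highest predicted affinity first.
print(np.argsort(predicted[0])[::-1][:5])
```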
Moreover, the analysis extends to price comparisons using TF-IDF matrices and examines pricing psychology to uncover tactics used in product pricing. This nuanced analysis offers a deep dive into how pricing strategies might be influenced by psychological factors, competitive pressures, or inflation.
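A small sketch of TF-IDF-based matching of product names across stores with scikit-learn; the product names here are invented examples.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented example names; real input would be the scraped product lists.
store_a = ["Semi Skimmed Milk 2 Pints", "Free Range Eggs 6 Pack"]
store_b = ["Milk Semi-Skimmed 1.13L (2 Pints)", "6 Free Range Medium Eggs"]

# Character n-grams cope better with abbreviation and re-ordering than
# whole words do when matching near-duplicate product names.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
tfidf = vec.fit_transform(store_a + store_b)

# Similarity of each store-A name against each store-B name.
sim = cosine_similarity(tfidf[: len(store_a)], tfidf[len(store_a):])
for i, name in enumerate(store_a):
    j = sim[i].argmax()
    print(f"{name!r} best matches {store_b[j]!r} ({sim[i, j]:.2f})")
```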
An interesting aspect of the analysis is monitoring price changes over time, which involves calculating average prices per category on a weekly basis and analyzing the percentage changes. This dynamic view of pricing helps in understanding market trends and making informed decisions.
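In pandas, assuming a processed dataset with date, category, and price columns (an assumption about the schema), the weekly calculation might look like:

```python
import pandas as pd

# Column names ("date", "category", "price") are assumptions about the
# processed dataset's schema.
df = pd.read_csv("products.csv", parse_dates=["date"])

# Average price per category per calendar week, then week-on-week change.
weekly = (df.set_index("date")
            .groupby("category")["price"]
            .resample("W")
            .mean()
            .unstack("category"))
pct_change = weekly.pct_change() * 100
print(pct_change.tail())
```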
Finally, the culmination of this extensive process is the deployment of the application to the cloud via Streamlit, facilitated through GitHub. This deployment not only makes the application accessible but also showcases the integration of various components into a streamlined, user-friendly interface.
In summary, the end-to-end process of web scraping, data processing, and analysis of supermarket data is a comprehensive effort that combines technical prowess with analytical insight. It underscores the power of Python in handling complex data tasks, the importance of data quality in analytical projects, and the potential of data analysis in unveiling market trends and consumer preferences, all while ensuring accessibility through cloud deployment. This meticulous approach not only aids in strategic decision-making but also sets a precedent for the application of data science in the retail industry.
HTML format from the website.
Expected output (screenshot omitted).