The global number of internet users was forecast to increase continuously between 2024 and 2029 by a total of 1.3 billion users (+23.66 percent). After fifteen consecutive years of growth, the number of users is estimated to reach 7 billion and thus a new peak in 2029. Notably, the number of internet users has been increasing continuously over the past years. Depicted is the estimated number of individuals in the country or region at hand that use the internet. As the data source clarifies, connection quality and usage frequency are distinct aspects not taken into account here. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of internet users in regions such as the Americas and Asia.
When asked about "Attitudes towards the internet", most Mexican respondents pick "It is important to me to have mobile internet access in any place" as an answer. 56 percent did so in our online survey in 2025. Looking to gain valuable insights about users of internet providers worldwide? Check out our reports on consumers who use internet providers. These reports give readers a thorough picture of these customers, including their identities, preferences, opinions, and methods of communication.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia links shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking which subsequently influences Wikipedia content, by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.
The motivations for this dataset stem from the challenges researchers face in studying the flow of information across the web. While the World Wide Web enables global communication and collaboration, data silos, linguistic barriers, and platform-specific restrictions hinder our ability to understand how information circulates, evolves, and impacts public discourse. Wikipedia and Reddit, as major hubs of knowledge sharing and discussion, offer an invaluable lens into these processes. However, without comprehensive data capturing their interactions, researchers are unable to fully examine how platforms co-construct knowledge. This dataset bridges this gap, providing the tools needed to study the interconnectedness of social media and collaborative knowledge systems.
WikiReddit is a comprehensive dataset capturing all Wikipedia mentions (including links) shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW (not safe for work) subreddits. The SQL database comprises 336K total posts, 10.2M comments, 1.95M unique links, and 1.26M unique articles spanning 59 languages on Reddit and 276 Wikipedia language subdomains. Each linked Wikipedia article is enriched with its revision history and page view data within a ±10-day window of its posting, as well as article ID, redirects, and Wikidata identifiers. Supplementary anonymous metadata from Reddit posts and comments further contextualizes the links, offering a robust resource for analysing cross-platform information flows, collective attention dynamics, and the role of Wikipedia in online discourse.
Data was collected from the Reddit4Researchers and Wikipedia APIs. No personally identifiable information is published in the dataset. Data from Reddit to Wikipedia is linked via the hyperlink and article titles appearing in Reddit posts.
Extensive processing with tools such as regex was applied to the Reddit post/comment text to extract the Wikipedia URLs. Redirects for Wikipedia URLs and article titles were found through the API and mapped to the collected data. Reddit IDs are hashed with SHA-256 for post/comment/user/subreddit anonymity.
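For illustration, a minimal sketch of the kind of extraction and anonymization described above; the regex pattern and the exact hashing scheme here are assumptions, not the authors' actual pipeline.

```python
import hashlib
import re

# Assumed pattern: matches links such as https://en.wikipedia.org/wiki/Data_set
# on any Wikipedia language subdomain; the dataset's real extraction rules may differ.
WIKI_URL_RE = re.compile(r"https?://([a-z\-]+)\.(?:m\.)?wikipedia\.org/wiki/([^\s\)\]]+)")

def extract_wikipedia_links(text: str):
    """Return (language_subdomain, article_title) pairs found in a Reddit post or comment."""
    return [(m.group(1), m.group(2)) for m in WIKI_URL_RE.finditer(text)]

def anonymize_id(reddit_id: str) -> str:
    """Hash a Reddit post/comment/user/subreddit ID with SHA-256 (unsalted here for illustration)."""
    return hashlib.sha256(reddit_id.encode("utf-8")).hexdigest()

print(extract_wikipedia_links("See https://en.wikipedia.org/wiki/Dataset for details"))
print(anonymize_id("t3_abc123"))
```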
We foresee several applications of this dataset and preview four here. First, Reddit linking data can be used to understand how attention is driven from one platform to another. Second, Reddit linking data can shed light on how Wikipedia's archive of knowledge is used in the larger social web. Third, our dataset could provide insights into how external attention is topically distributed across Wikipedia. Our dataset can help extend that analysis into the disparities in what types of external communities Wikipedia is used in, and how it is used. Fourth, relatedly, a topic analysis of our dataset could reveal how Wikipedia usage on Reddit contributes to societal benefits and harms. Our dataset could help examine if homogeneity within the Reddit and Wikipedia audiences shapes topic patterns and assess whether these relationships mitigate or amplify problematic engagement online.
The dataset is publicly shared with a Creative Commons Attribution 4.0 International license. The article describing this dataset should be cited: https://doi.org/10.48550/arXiv.2502.04942
Patrick Gildersleve will maintain this dataset, and add further years of content as and when available.
posts
Column Name | Type | Description |
---|---|---|
subreddit_id | TEXT | The unique identifier for the subreddit. |
crosspost_parent_id | TEXT | The ID of the original Reddit post if this post is a crosspost. |
post_id | TEXT | Unique identifier for the Reddit post. |
created_at | TIMESTAMP | The timestamp when the post was created. |
updated_at | TIMESTAMP | The timestamp when the post was last updated. |
language_code | TEXT | The language code of the post. |
score | INTEGER | The score (upvotes minus downvotes) of the post. |
upvote_ratio | REAL | The ratio of upvotes to total votes. |
gildings | INTEGER | Number of awards (gildings) received by the post. |
num_comments | INTEGER | Number of comments on the post. |
comments
Column Name | Type | Description |
---|---|---|
subreddit_id | TEXT | The unique identifier for the subreddit. |
post_id | TEXT | The ID of the Reddit post the comment belongs to. |
parent_id | TEXT | The ID of the parent comment (if a reply). |
comment_id | TEXT | Unique identifier for the comment. |
created_at | TIMESTAMP | The timestamp when the comment was created. |
last_modified_at | TIMESTAMP | The timestamp when the comment was last modified. |
score | INTEGER | The score (upvotes minus downvotes) of the comment. |
upvote_ratio | REAL | The ratio of upvotes to total votes for the comment. |
gilded | INTEGER | Number of awards (gildings) received by the comment. |
postlinks
Column Name | Type | Description |
---|---|---|
post_id | TEXT | Unique identifier for the Reddit post. |
end_processed_valid | INTEGER | Whether the extracted URL from the post resolves to a valid URL. |
end_processed_url | TEXT | The extracted URL from the Reddit post. |
final_valid | INTEGER | Whether the final URL from the post resolves to a valid URL after redirections. |
final_status | INTEGER | HTTP status code of the final URL. |
final_url | TEXT | The final URL after redirections. |
redirected | INTEGER | Indicator of whether the posted URL was redirected (1) or not (0). |
in_title | INTEGER | Indicator of whether the link appears in the post title (1) or post body (0). |
commentlinks
Column Name | Type | Description |
---|---|---|
comment_id | TEXT | Unique identifier for the Reddit comment. |
end_processed_valid | INTEGER | Whether the extracted URL from the comment resolves to a valid URL. |
end_processed_url | TEXT | The extracted URL from the comment. |
final_valid | INTEGER | Whether the final URL from the comment resolves to a valid URL after redirections. |
final_status | INTEGER | HTTP status code of the final URL. |
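As an illustration of how these tables relate, the sketch below joins posts to postlinks on post_id using Python's sqlite3. The database file name is a placeholder; only columns documented above are used.

```python
import sqlite3

# "wikireddit.db" is a placeholder name for the released SQL database file.
conn = sqlite3.connect("wikireddit.db")

query = """
SELECT p.subreddit_id, p.created_at, p.score, l.final_url
FROM posts AS p
JOIN postlinks AS l ON l.post_id = p.post_id
WHERE l.final_valid = 1          -- keep only links that resolved after redirections
ORDER BY p.created_at
LIMIT 10;
"""

for row in conn.execute(query):
    print(row)

conn.close()
```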
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
This is a large dataset that contains every web series streaming around the globe as of the date the dataset was created.
This dataset can be used to answer the following questions:
- Which streaming platform(s) can I find this web series on?
- What is the average IMDb rating and what are the other ratings?
- What is the genre of the title?
- What is the synopsis?
- How many seasons are there right now?
- In which year was this produced?
When asked about "Attitudes towards the internet", most Japanese respondents pick "I'm concerned that my data is being misused on the internet" as an answer. 35 percent did so in our online survey in 2025. Looking to gain valuable insights about users of internet providers worldwide? Check out our reports on consumers who use internet providers. These reports give readers a thorough picture of these customers, including their identities, preferences, opinions, and methods of communication.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DNS over HTTPS (DoH) is becoming a default option for domain resolution in modern privacy-aware software. Research has therefore already focused on various aspects of DoH; however, a comprehensive dataset from an actual production network is still missing. In this paper, we present a novel dataset comprising multiple PCAP files of DoH traffic. The captured traffic is generated towards various DoH providers to cover the differences among DoH server implementations and configurations. In addition to generated traffic, we also provide real network traffic captured on high-speed backbone lines of a large Internet Service Provider with around half a million users. Network identifiers in the real network traffic (e.g., IP addresses and transmitted content), excluding those of DoH resolvers, were anonymized, but the important characteristics of the traffic can still be obtained from the data and used, e.g., for network traffic classification research. The real network traffic dataset contains DoH as well as non-DoH HTTPS traffic as observed at the collection points in the network.
This repository provides supplementary files for the "Collection of Datasets with DNS over HTTPS Traffic":
─── supplementary_files  - Directory with supplementary files (scripts, DoH resolver list) used for dataset creation
    ├── chrome           - Generation scripts for Chrome browser and visited websites during generation
    ├── doh_resolvers    - The list of DoH resolvers used for filter creation during ISP backbone capture
    ├── firefox          - Generation scripts for Firefox browser and visited websites during generation
    └── pcap-anonymizer  - Anonymization script of real backbone captures
Collection of datasets:
DoH-Gen-F-AABBC --- https://doi.org/10.5281/zenodo.5957277
DoH-Gen-F-FGHOQS --- https://doi.org/10.5281/zenodo.5957121
DoH-Gen-F-CCDDD --- https://doi.org/10.5281/zenodo.5957420
DoH-Gen-C-AABBCC --- https://doi.org/10.5281/zenodo.5957465
DoH-Gen-C-DDD --- https://doi.org/10.5281/zenodo.5957676
DoH-Gen-C-CFGHOQS --- https://doi.org/10.5281/zenodo.5957659
DoH-Real-world --- https://doi.org/10.5281/zenodo.5956043
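As a usage sketch (not part of the dataset), the snippet below separates packets addressed to known DoH resolvers from other HTTPS traffic in one of the captures using Scapy. The file names and the one-IP-per-line format of the resolver list are assumptions.

```python
from scapy.all import IP, TCP, rdpcap  # pip install scapy

# Assumed inputs: a PCAP from the collection and a resolver list with one IPv4 address per line.
packets = rdpcap("doh_capture.pcap")
with open("doh_resolvers.txt") as fh:
    resolver_ips = {line.strip() for line in fh if line.strip()}

doh_pkts, other_https_pkts = [], []
for pkt in packets:
    if pkt.haslayer(IP) and pkt.haslayer(TCP) and pkt[TCP].dport == 443:
        # Traffic towards a known DoH resolver is treated as DoH; the rest as ordinary HTTPS.
        (doh_pkts if pkt[IP].dst in resolver_ips else other_https_pkts).append(pkt)

print(f"DoH packets: {len(doh_pkts)}, other HTTPS packets: {len(other_https_pkts)}")
```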
http://www.gnu.org/licenses/lgpl-3.0.html
The dataset was introduced by the following research: E. C. P. Neto, S. Dadkhah, R. Ferreira, A. Zohourian, R. Lu, A. A. Ghorbani, "CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment," Sensors (2023) (submitted to Journal of Sensors). The present data contains different kinds of IoT intrusions. The categories of IoT intrusions listed in the data are as follows: DDoS, Brute Force, Spoofing, DoS, Recon, Web-based, and Mirai.
Several subcategories are present in the data for each type of IoT intrusion. The dataset contains 1,191,264 network traffic instances, with 47 features for each instance. The dataset can be used to build predictive models that detect different kinds of intrusive attacks, and it is also suitable for designing intrusion detection systems (IDS).
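A minimal sketch of the kind of predictive model mentioned above, assuming the data is available as a single CSV with a label column; the file and column names are placeholders.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Placeholder file name; the released data may be split across several CSV files.
df = pd.read_csv("ciciot2023.csv")
X = df.drop(columns=["label"])   # the 47 features per instance
y = df["label"]                  # intrusion category (DDoS, Mirai, etc.) or benign

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```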
IMDB (Internet Movie Database) is one of the most popular websites, or databases, about movies, TV shows, and similar content. IMDB's Top 250 list is also an important resource for finding good movies. Rankings are calculated from users' votes. In addition, IMDB's pollmaster account shares previous years' IMDB Top 250 lists. The Top 250 list changes all the time, so the lists here are snapshots taken at midnight PST on December 31st of each year.
This dataset contains IMDB Top 250 lists from 1996 to 2020 with every movie's basic information; release year, ranking, score, stars, etc.
This data was scraped from IMDB, and you can find the scraping part here.
Time travel... You can look into the lists for the last 25 years. Analyze the best movies by voters, genre preferences, the most successful directors and stars, ranking changes over time, et cetera. There are lots of things to do. Be creative and visualize them.
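For example, a sketch of one such analysis, tracking how a title's rank changes across the yearly lists; the file and column names here are assumptions about the dataset layout.

```python
import pandas as pd

# Placeholder file/column names: assumed one row per (year, movie) with its rank in that year's list.
df = pd.read_csv("imdb_top250_1996_2020.csv")

# Rank trajectory of a single title across the yearly snapshots.
godfather = df[df["title"] == "The Godfather"].sort_values("year")
print(godfather[["year", "rank", "score"]])

# Titles that appeared in every list from 1996 to 2020.
years_per_title = df.groupby("title")["year"].nunique()
print(years_per_title[years_per_title == df["year"].nunique()].index.tolist())
```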
The Southwest Alaska Network (SWAN) monitors freshwater systems in five national park units: Alagnak Wild River (ALAG), Aniakchak National Monument and Preserve (ANIA), Katmai National Park and Preserve (KATM), Kenai Fjords National Park (KEFJ), and Lake Clark National Park and Preserve (LACL). The metadata included in this file supplement water quality data from vertical lake profiles taken between 2008 and 2023 in KATM, LACL, and KEFJ. Vertical lake profile sampling was initiated in high priority “Tier 1” lakes in LACL in 2008 and expanded to other parks and lakes in subsequent years. At present, there are 105 total sites. Of the 31 Tier 1 sites in LACL, there are ten sites in each of three basins of Lake Clark and one on Kijik Lake. Of the 41 Tier 1 sites in KATM, there are ten sites at each of four basins of Naknek Lake and one on Lake Brooks. There are 33 lower priority “Tier 2” sites: 15 in LACL, 16 in KATM, and two in KEFJ. Tier 1 lakes are sampled annually in LACL and KATM, while Tier 2 lakes are sampled less frequently on a rotating basis. Vertical lake profiles are taken during the mid-summer index period when thermal stratification is most pronounced. Profiles are conducted using a water quality meter (or “sonde”) that simultaneously records multiple parameters: temperature, pH, dissolved oxygen, specific conductivity, and, beginning in 2014, turbidity. Sampling is conducted at 15 depth categories to a maximum depth of 50 m. Specifically, data are recorded at every meter from 0 to 5 m depth, and then at 5 m increments between 5 m and 50 m or the lake bottom, whichever is reached first. This method creates a vertical profile of each water quality parameter at each site. Additional measures of water clarity via Secchi disk are made in conjunction with each vertical lake profile. Other metadata include specifics about personnel, location, timing, equipment, and weather. The vertical lake profile data that accompany the metadata in this file can be accessed and downloaded from the public AQUARIUS Water Data Portal (https://irma.nps.gov/aqwebportal/).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We present a dataset targeting a large set of popular pages (Alexa top-500), from probes in several ISP networks, browser software (Chrome, Firefox) and viewport combinations, for over 200,000 experiments realized in 2019. We purposely collected two distinct sets with two different tools, namely Web Page Test (WPT) and Web View (WV), varying a number of relevant parameters and conditions, for a total of 200K+ web sessions, roughly equally split among WV and WPT. Our dataset comprises variations in terms of geographical coverage, scale, diversity and representativeness (location, targets, protocol, browser, viewports, metrics).

For Web Page Test, we used the online service www.webpagetest.org at different locations worldwide (Europe, Asia, USA) and private WPT instances in three locations in China (Beijing, Shanghai, Dongguan). The list of target URLs comprised the main pages and five random subpages from Alexa top-500 worldwide and China. We varied network conditions: native connections and 4G, FIOS, 3GFast, DSL, and custom shaping/loss conditions. The other elements in the configuration were fixed: Chrome browser on desktop with a fixed screen resolution, HTTP/2 protocol and IPv4.

For Web View, we collected experiments from three machines located in France. We selected two versions of two browser families (Chrome 75/77, Firefox 63/68), two screen sizes (1920x1080, 1440x900), and employed different browser configurations (one half of the experiments activate the AdBlock plugin) from two different access technologies (fiber and ADSL). From a protocol standpoint, we used both IPv4 and IPv6, with HTTP/2 and QUIC, and performed repeated experiments with cached objects/DNS. Given the settings diversity, we restricted the number of websites to about 50 among the Alexa top-500 websites, to ensure statistical relevance of the collected samples for each page.

The two archives IFIPNetworking2020_WebViewOrange.zip and IFIPNetworking2020_Webpagetest.zip correspond respectively to the Web View experiments and to the Web Page Test experiments. Each archive contains three files and one folder:
- config.csv: Description of parameters and conditions for each run
- metrics.csv: Value of different metrics collected by the browser
- progressionCurves.csv: Progression curves of the bytes progress as seen by the network, from 0 to 10 seconds by steps of 100 milliseconds
- listUrl folder: Indexes the sets of urls

Regarding config.csv, the columns are:
- index: Index for this set of conditions
- location: Location of the machine
- listUrl: List of urls, located in the folder listUrl
- browserUsed: Internet browser and version
- terminal: Desktop or Mobile
- collectionEnvironment: Identification of the collection environment
- networkConditionsTrafficShaping (WPT only): Whether native condition or traffic shaping (4G, FIOS, 3GFast, DSL, or custom Emulator conditions)
- networkConditionsBandwidth (WPT only): Bandwidth of the network
- networkConditionsDelay (WPT only): Delay in the network
- networkConditions (WV only): Network conditions
- ipMode (WV only): Requested L3 protocol
- requestedProtocol (WV only): Requested L7 protocol
- adBlocker (WV only): Whether adBlocker is used or not
- winSize (WV only): Window size

Regarding metrics.csv, the columns are:
- id: Unique identification of an experiment (consisting of an index 'set of conditions' and an index 'current page')
- DOM Content Loaded Event End (ms): DOM time
- First Paint (ms) (WV only): First paint time
- Load Event End (ms): Page Load Time from W3C
- RUM Speed Index (ms) (WV only): RUM Speed Index
- Speed Index (ms) (WPT only): Speed Index
- Time for Full Visual Rendering (ms) (WV only): Time for Full Visual Rendering
- Visible portion (%) (WV only): Visible portion
- Time to First Byte (ms) (WPT only): Time to First Byte
- Visually Complete (ms) (WPT only): Visually Complete, used to compute the Speed Index
- aatf, bi_aatf, bi_plt, dom, ii_aatf, ii_plt, last_css, last_img, last_js, nb_ress_css, nb_ress_img, nb_ress_js, num_origins, num_ressources, oi_aatf, oi_plt, plt: corresponding metrics computed using the ATF-chrome-plugin

Regarding progressionCurves.csv, the columns are:
- id: Unique identification of an experiment (consisting of an index 'set of conditions' and an index 'current page')
- url: Url of the current page. SUBPAGE stands for a path.
- run: Current run (linked with index of the config for WPT)
- filename: Filename of the pcap
- fullname: Fullname of the pcap
- har_size: Size of the HAR for this experiment
- pagedata_size: Size of the page data report
- pcap_size: Size of the pcap
- App Byte Index (ms): Application Byte Index as computed from the har file (in the browser)
- bytesIn_APP: Total bytes in as seen in the browser
- bytesIn_NET: Total bytes in as seen in the network
- X_BI_net: Network Byte Index computed from the pcap file (in the network)
- X_bin_0_for_B_completion to X_bin_99_for_B_completion: X_bin_k_for_B_completion is the bytes progress reached after k*100 milliseconds

If you use these datasets in your research, please cite the appropriate paper:

@inproceedings{qoeNetworking2020,
  title={Revealing QoE of Web Users from Encrypted Network Traffic},
  author={Huet, Alexis and Saverimoutou, Antoine and Ben Houidi, Zied and Shi, Hao and Cai, Shengming and Xu, Jinchun and Mathieu, Bertrand and Rossi, Dario},
  booktitle={2020 IFIP Networking Conference (IFIP Networking)},
  year={2020},
  organization={IEEE}
}
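To illustrate how the progression curves can be used, the sketch below loads progressionCurves.csv and estimates, for each experiment, the time at which half of the bytes had been transferred. The column names follow the description above; the file path and the assumption that the per-bin values are cumulative byte counts comparable to bytesIn_NET are mine.

```python
import pandas as pd

# Assumed path: progressionCurves.csv extracted from one of the two archives.
curves = pd.read_csv("progressionCurves.csv")

bin_cols = [f"X_bin_{k}_for_B_completion" for k in range(100)]  # 0 to 9.9 s in 100 ms steps

def time_to_half_completion(row):
    """Return the first time (ms) at which byte progress reaches 50% of bytesIn_NET, or None."""
    total = row["bytesIn_NET"]
    for k, col in enumerate(bin_cols):
        if total and row[col] >= 0.5 * total:  # assumes bins hold cumulative byte counts
            return k * 100
    return None

curves["t_half_ms"] = curves.apply(time_to_half_completion, axis=1)
print(curves[["id", "t_half_ms"]].head())
```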
When asked about "Attitudes towards the internet", most Chinese respondents pick "It is important to me to have mobile internet access in any place" as an answer. 48 percent did so in our online survey in 2025. Looking to gain valuable insights about users of internet providers worldwide? Check out our reports on consumers who use internet providers. These reports give readers a thorough picture of these customers, including their identities, preferences, opinions, and methods of communication.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset is composed of photos of various resolutions of 35'623 pages of printed books dating from the 15th to the 18th century. Each page has been attributed by experts from one to five labels corresponding to the font groups used in the text, with two extra classes for non-textual content and for fonts not present in the following list: Antiqua, Bastarda, Fraktur, Gotico Antiqua, Greek, Hebrew, Italic, Rotunda, Schwabacher, and Textura.
Note that to make downloading the dataset easier over slow or unreliable Internet connections, the dataset has been separated into several zip files. All zip files must be extracted into the same folder. The CSV files containing the labels should ideally be placed in the parent folder.
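A small helper for the extraction step described above, assuming all the downloaded archives sit in one directory; the paths are placeholders.

```python
import glob
import zipfile

# Extract every downloaded part into the same target folder, as required above.
# "downloads/" and "dataset/" are placeholder paths.
for archive in glob.glob("downloads/*.zip"):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall("dataset/")
# The label CSV files should then be placed in the parent folder of "dataset/".
```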
The labels are provided in two CSV files, one for training/tuning font group recognition methods, and the second one for evaluation purposes. Where several pages come from the same book, special care has been taken to place all of them in the same subset.
The paper presenting this dataset in detail is "Dataset of Pages from Early Printed Books with Multiple Font Groups", accepted at the 5th International Workshop on Historical Document Imaging and Processing, Sydney, Australia.
We would like to thank the British Library (London), Bayerische Staatsbibliothek München, Staatsbibliothek zu Berlin, Universitätsbibliothek Erlangen, Universitätsbibliothek Heidelberg, Staats- und Universitätsbibliothek Göttingen, Stadt- und Universitätsbibliothek Köln, Württembergische Landesbibliothek Stuttgart and Herzog August Bibliothek Wolfenbüttel for the data they sent us and kindly allowed us to use for this public dataset.
How frequently a word occurs in a language is an important piece of information for natural language processing and for linguists. In natural language processing, very frequent words tend to be less informative than less frequent ones and are often removed during preprocessing. Human language users are also sensitive to word frequency. How often a word is used affects language processing in humans. For example, very frequent words are read and understood more quickly and can be understood more easily in background noise.
This dataset contains the counts of the 333,333 most commonly-used single words on the English language web, as derived from the Google Web Trillion Word Corpus.
Data files were derived from the Google Web Trillion Word Corpus (as described by Thorsten Brants and Alex Franz, and distributed by the Linguistic Data Consortium) by Peter Norvig. You can find more information on these files and the code used to generate them here.
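As a quick usage sketch, the snippet below loads the counts and derives relative frequencies, for example to build a stopword-style list of the most frequent words. The file and column names are assumptions about how the counts are stored.

```python
import pandas as pd

# Placeholder file/column names: assumed one row per word with its corpus count.
freq = pd.read_csv("unigram_freq.csv")  # columns assumed to be "word" and "count"

freq["rel_freq"] = freq["count"] / freq["count"].sum()
stopword_candidates = set(freq.nlargest(100, "count")["word"])  # 100 most frequent words

print(freq.head())
print(sorted(stopword_candidates)[:10])
```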
The code used to generate this dataset is distributed under the MIT License.
This ongoing dataset contains monthly precipitation measurements from a network of standard can rain gauges at the Jornada Experimental Range in Dona Ana County, New Mexico, USA. Precipitation physically collects within gauges during the month and is manually measured with a graduated cylinder at the end of each month. This network is maintained by USDA Agricultural Research Service personnel. This dataset includes 39 different locations but only 29 of them are current. Other precipitation data exist for this area, including event-based tipping bucket data with timestamps, but do not go as far back in time as this dataset. Resources in this dataset: Resource Title: Website Pointer to html file. File Name: Web Page. URL: https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-jrn&identifier=210380001 (webpage with information and links to data files for download).
http://www.gnu.org/licenses/gpl-3.0.en.html
In order to improve the capacity of storage, exploration and processing of sensor data, a spatial DBMS was used and the Aquopts system was implemented.
In field surveys using different sensors on the aquatic environment, the existence of spatial attributes in the dataset is common, motivating the adoption of PostgreSQL and its spatial extension PostGIS. To enable the insertion of new data sets as well as new devices and sensing equipment, the database was modeled to support updates and provide structures for storing all the data collected in the field campaigns in conjunction with other possible future data sources. The database model provides resources to manage spatial and temporal data and allows flexibility to select and filter the dataset.
The data model ensures the storage integrity of the information related to the samplings performed during the field survey in an architecture that benefits the organization and management of the data. However, in addition to the storage specified on the data model, there are several procedures that need to be applied to the data to prepare it for analysis. Some validations are important to identify spurious data that may represent important sources of information about data quality. Other corrections are essential to tweak the data and eliminate undesirable effects. Some equations can be used to produce other factors that can be obtained from the combination of attributes. In general, the processing steps comprise a cycle of important operations that are directly related to the characteristics of the data set. Considering the data of the sensors stored in the database, an interactive prototype system, named Aquopts, was developed to perform the necessary standardization and basic corrections and produce useful data for analysis, according to the correction methods known in the literature.
The system provides resources for the analyst to automate the processes of reading, inserting, integrating, interpolating, and correcting data, as well as other calculations that are always repeated after exporting field campaign data and producing new data sets. All operations and processing required for data integration and correction have been implemented in PHP and Python and are available through a Web interface, which can be accessed from any computer connected to the internet. The data can be accessed online (http://sertie.fct.unesp.br/aquopts), but the resources are restricted by registration and permissions for each user. After users identify themselves, the system evaluates their access permissions and makes the options for inserting new datasets available.
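To illustrate the kind of spatial filtering that PostgreSQL/PostGIS enables here, a hypothetical query sketch; the connection settings, table, and column names are placeholders, not the actual Aquopts schema.

```python
import psycopg2  # pip install psycopg2-binary

# Placeholder connection settings and schema; the real Aquopts database differs.
conn = psycopg2.connect(dbname="aquopts", user="reader", password="changeme", host="localhost")

sql = """
SELECT s.sample_id, s.collected_at, s.chlorophyll
FROM samples AS s
WHERE ST_Within(
        s.geom,
        ST_MakeEnvelope(%s, %s, %s, %s, 4326)  -- lon/lat bounding box, WGS 84
      )
  AND s.collected_at BETWEEN %s AND %s;
"""
with conn.cursor() as cur:
    # Placeholder bounding box and date range for a field campaign of interest.
    cur.execute(sql, (-51.5, -21.0, -51.0, -20.5, "2019-01-01", "2019-12-31"))
    for row in cur.fetchall():
        print(row)
conn.close()
```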
The source code of the entire Aquopts system is available at: https://github.com/carmoafc/aquopts
The system and additional results were described in the official paper (under review).
2023 Updates to the National Incident Feature Service and Event Geodatabase

For 2023, there are no schema updates and no major changes to GeoOps or the GISS Workflow! This is a conscious choice and is intended to provide a needed break for both users and administrators. Over the last 5 years, nearly every aspect of the GISS position has seen a major overhaul, and while the advancements have been overwhelmingly positive, many of us are experiencing change fatigue. This is not to say there is no room for improvement. Many great suggestions were received throughout the season and in the GISS Survey, and they will be considered for inclusion in 2024. That there are no critical updates necessary also indicates that we have reached a level of maturity with the current state, and that is good news for everyone. Please continue to submit your ideas; they are appreciated and valuable insight, even if the change is not implemented. For information on 2023 AGOL updates please see the Create and Share Web Maps | NWCG page. There are three smaller changes worth noting this year:

- Standard Symbology is now the default on the NIFS. For most workflows, the update will be seamless. All the Event Standard symbols are now supported in Field Maps and Map Viewer. Most users will now see the same symbols in all print and digital products. However, in AGOL some web apps do not support the complex line symbols. The simplified lines will still be present in the official Editing Apps (Operations, SITL, and GISS), and any custom apps built with the Web App Builder (WAB) interface. Experience Builder can be used for any new app creation. If you must use WAB or another app that cannot display the complex line symbology in the NIFS, please contact wildfireresponse@firenet.gov for guidance.

- Event Line now has Preconfigured Labels. Labels on Event Line have historically been uncommon, but to speed their implementation when necessary, color-coded label classes have been added to the NIFS and the lyrx files provided in the GIS Folder Structure. They can be disabled or modified as needed, should they interfere with any of your workflows.

- "Restricted" Folder added to the GeoOps Folder Structure. At the base level within the 2023_Template, a 'restricted' folder is now included. This folder should be used for all data and products that contain sensitive, restricted, or controlled-unclassified information. This will aid the DOCL and any future FOIA liaisons in protecting this information. When using OneDrive, this folder can optionally be password protected. Reminder: Sensitive Data is not allowed to be hosted within the NIFC Org.
https://spdx.org/licenses/CC0-1.0.html
A major goal of community ecology is understanding the processes responsible for generating biodiversity patterns along spatial and environmental gradients. In stream ecosystems, system specific conceptual frameworks have dominated research describing biodiversity change along longitudinal gradients of river networks. However, support for these conceptual frameworks has been mixed, mainly applicable to specific stream ecosystems and biomes, and these frameworks have placed less emphasis on general mechanisms driving biodiversity patterns. Rethinking biodiversity patterns and processes in stream ecosystems with a focus on the overarching mechanisms common across ecosystems will provide a more holistic understanding of why biodiversity patterns vary along river networks. In this study, we apply the Theory of Ecological Communities (TEC) conceptual framework to stream ecosystems to focus explicitly on the core ecological processes structuring communities: dispersal, speciation, niche selection, and ecological drift. Using a unique case study from high elevation networks of connected lakes and streams, we sampled stream invertebrate communities in the Sierra Nevada, CA to test established stream ecology frameworks and compared them to the TEC framework. Local diversity increased and β-diversity decreased moving downstream from the headwaters, consistent with the river continuum concept and the small but mighty framework of mountain stream biodiversity. Local diversity was also structured by distance below upstream lakes, where diversity increased with distance below upstream lakes, in support of the serial discontinuity concept. Despite some support for the biodiversity patterns predicted from the stream ecology frameworks, no single framework was fully supported, suggesting “context dependence”. By framing our results under the TEC, we found species diversity was structured by niche selection, where local diversity was highest in environmentally favorable sites. Local diversity was also highest in sites with small community sizes, countering predicted effects of ecological drift. Moreover, higher β-diversity in the headwaters was influenced by dispersal and niche selection, where environmentally harsh and spatially isolated sites exhibit higher community variation. Taken together our results suggest that combining system specific ecological frameworks with the TEC provides a powerful approach for inferring the mechanisms driving biodiversity patterns and provides a path toward generalization of biodiversity research across ecosystems.

Methods

Study Area

The study area was located in the Sierra Nevada Mountains of eastern California (USA) and encompasses portions of Inyo National Forest and Sequoia-Kings Canyon National Park. Over the ice-free seasons (June-September), we sampled five distinct lake-stream networks, where each network was within a spatially distinct catchment and was treated as an independent replicate system (Fig. 3). The Kern (n=24) and Bubbs (n=26) networks were sampled in 2011, the Evolution (n=21) and Cascades (n=11) networks in 2018, and Rock Creek (n=36) in 2019. For each lake-stream network, streams were sampled throughout the network along a spatial gradient from headwaters downstream as well as along a spatial gradient downstream from lakes.
Because the spatial distances of the river networks and the distances separating lakes naturally vary among networks, as do backcountry sampling constraints, the number of sites sampled along the distance-from-headwaters gradient varied (n=11 to n=36) and the downstream lake gradient varied (n=1 to n=9). This field system and the data collected naturally provide spatial gradients relevant to test stream ecology theories. In addition, this data is ideal for testing TEC processes because of the naturally varying gradients of community size, connectivity, and environmental heterogeneity present in our sampling design.

Field Methods

At each sampling location, we established transects in riffle sections of streams. At five equally spaced points along transects we measured stream depth and current velocity at mid-depth using a portable flow meter (Marsh-McBirney Flow Mate 2000). We then calculated stream discharge as the sum of the product of average depth x current velocity x width/5 over all transect points (Gordon et al. 2010; Herbst et al. 2018); this calculation is restated as a formula at the end of this methods description. A calibrated YSI multiparameter device was placed above transects to measure temperature, dissolved oxygen, conductivity, and pH. Benthic chlorophyll data was collected by scrubbing the entire surface area of three randomly selected cobble sized rocks (64-255 mm) of benthic algae (periphyton) with a toothbrush for 60 seconds (Herbst and Cooper 2010). Chlorophyll measurements were taken using a handheld fluorometer (Turner Designs Aquafluor), which measures raw fluorescence units. Fluorescence measurements were calibrated to chlorophyll concentration using a known concentration of Rhodamine. We standardized chlorophyll measurements by accounting for both the surface area of rocks and volume of water used to remove algae. Eight to twelve macroinvertebrate samples at each site were collected using a D-frame kick net (250 mm mesh, 30cm opening, 0.09m2 sample area) in riffle habitats, depending on the density of macroinvertebrate samples collected. We took samples by placing the net on the streambed, then turning and brushing all substrate by hand in the sampling area (30cm x 30cm) immediately above the net, with dislodged invertebrates being carried by currents into the net. All macroinvertebrate samples were preserved in 75% ethanol within 48 hours of sampling. Samples were sorted, identified, and counted in the laboratory. Taxa were identified to the finest taxonomic level possible, usually to genus or species for insects (excluding Chironomidae) and order or class for non-insects (Merritt, Cummins, and Berg 2019). The replicate samples taken at each site were pooled together and divided by the number of replicates and the area sampled to determine the density of invertebrate communities.

Spatial Data

Stream distance measurements were taken using the R package “riverdist”, which utilizes data from the USGS National Hydrological Dataset Flowline in order to determine pairwise distances from sampling sites along the river network (Tyers 2020). We determined distance below upstream lakes, with the closest upstream lake location being the outlet of the lake determined by the USGS Watershed Boundary Dataset. For sites where multiple upstream lakes were draining into streams, we defined the upstream lake as the closest upstream lake to sites that was also along the mainstem of the flowline.
We determined distance from headwaters as the streamwise distance from sites to the uppermost portion (headwaters) of the mainstem of streams, where the headwaters of streams was determined by the endpoint (beginning) of the flowline in the USGS NHD Flowline Dataset (U.S. Geological Survey 2016). In cases where multiple headwater stream reaches corresponded to downstream sites, we defined the headwaters as the particular reach that accounted for the most discharge which was determined using USGS Flowline Dataset. Upstream lake area and perimeter measurements were determined using the USGS Watershed Boundary Dataset. Land-cover proportions were computed using the 2016 USGS National Land Cover Database (Jin et al. 2019).
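A compact restatement of the discharge calculation described in the field methods above, assuming the five equally spaced transect points are indexed by i, with depth d_i, mid-depth current velocity v_i, and stream width w:

\[ Q \;=\; \sum_{i=1}^{5} d_i \, v_i \, \frac{w}{5} \]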
Where exactly was that elementary school again that's closest to your home and that your children can easily reach without having to cross many streets? Can you reach your workplace entirely via bike paths? Will you have to wait at the construction site again next Sunday on the way to the sports field? There is a lot of data on the internet that can answer these and similar questions – but finding it is not always easy. OpenData.HRO is a web application that serves as a catalog for many useful datasets. The application is operated by the Hanseatic and University City of Rostock, which is also the owner and publisher of the data. You can use the application to search for, view, and download data for yourself and/or others. Depending on the type of dataset, OpenData.HRO also offers it as database content, providing you with some useful statistical and/or visualization tools. The present web application is based on the powerful open-source software CKAN, maintained and further developed by the Open Knowledge Foundation. Each dataset in CKAN consists of a description of the contained data as well as the data itself. The description includes important information such as the type of file formats in which the data is offered, the license under which it is provided, and the categories and subject areas to which it is assigned. The data and their descriptions can be updated or supplemented, with CKAN always recording all changes by means of automatic versioning. CKAN is used by a large number of data catalogs on the internet. The Data Hub, for example, is a publicly editable data catalog in the Wikipedia style. The British government uses CKAN to operate data.gov.uk – currently with approximately 8,000 government datasets. The official public data of most European countries are listed in the CKAN catalog on europeandataportal.eu. You can find a complete list of catalogs like this on dataportals.org, a page that is also operated with CKAN. Unless otherwise stated, the data on OpenData.HRO are subject to a free license. This means that you can freely use and exploit the data in compliance with the conditions set out in the terms of use (and they are anything but restrictive). Perhaps you would like to use the data on art in public spaces to build a smartphone app that helps to make a tour of Rostock culturally sophisticated? Go for it! Open Data promotes entrepreneurship, collaborative science, and transparent administration. You can learn more about Open Data in the Open Data Handbook. The Open Knowledge Foundation is a non-profit organization for the promotion of open knowledge: developing and improving CKAN is one of the ways to achieve this. If you would like to contribute to CKAN with design or code, you can join the developer mailing lists or visit the OKFN pages to learn more about CKAN and other projects. (Translated from German.)
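Since OpenData.HRO is a standard CKAN instance, its catalog can be queried over the usual CKAN Action API. A minimal sketch, assuming the portal exposes the API at the common /api/3/action/ path; the base URL and search term here are assumptions.

```python
import requests

# Assumed base URL of the OpenData.HRO CKAN instance.
BASE = "https://www.opendata-hro.de"

# package_search is a standard CKAN Action API endpoint.
resp = requests.get(f"{BASE}/api/3/action/package_search", params={"q": "schule", "rows": 5})
resp.raise_for_status()

for pkg in resp.json()["result"]["results"]:
    # Each package describes a dataset, including its resources (formats, URLs) and licence.
    print(pkg["title"], "-", pkg.get("license_title"))
```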
https://www.icpsr.umich.edu/web/ICPSR/studies/8379/terms
This dataset consists of cartographic data in digital line graph (DLG) form for the northeastern states (Connecticut, Maine, Massachusetts, New Hampshire, New York, Rhode Island and Vermont). Information is presented on two planimetric base categories, political boundaries and administrative boundaries, each available in two formats: the topologically structured format and a simpler format optimized for graphic display. These DLG data can be used to plot base maps and for various kinds of spatial analysis. They may also be combined with other geographically referenced data to facilitate analysis, for example the Geographic Names Information System.
The U.S. Geological Survey (USGS), in cooperation with the Missouri Department of Natural Resources (MDNR), collects data pertaining to the surface-water resources of Missouri. These data are collected as part of the Missouri Ambient Water-Quality Monitoring Network (AWQMN) and are stored and maintained by the USGS National Water Information System (NWIS) database. These data constitute a valuable source of reliable, impartial, and timely information for developing an improved understanding of the water resources of the State. Water-quality data collected between water years 1993 and 2017 were analyzed for long-term trends, and the network was investigated to identify data gaps or redundant data to assist MDNR on how to optimize the network in the future. This is a companion data release product to the Scientific Investigations Report: Richards, J.M., and Barr, M.N., 2021, General water-quality conditions, long-term trends, and network analysis at selected sites within the Ambient Water-Quality Monitoring Network in Missouri, water years 1993–2017: U.S. Geological Survey Scientific Investigations Report 2021–5079, 75 p., https://doi.org/10.3133/sir20215079.

The following selected tables are included in this data release in compressed (.zip) format:
- AWQMN_EGRET_data.xlsx -- Data retrieved from the USGS National Water Information System database that was quality assured and conditioned for network analysis of the Missouri Ambient Water-Quality Monitoring Network
- AWQMN_R-QWTREND_data.xlsx -- Data retrieved from the USGS National Water Information System database that was quality assured and conditioned for analysis of flow-weighted trends for selected sites in the Missouri Ambient Water-Quality Monitoring Network
- AWQMN_R-QWTREND_outliers.xlsx -- Data flagged as outliers during analysis of flow-weighted trends for selected sites in the Missouri Ambient Water-Quality Monitoring Network
- AWQMN_R-QWTREND_outliers_quarterly.xlsx -- Data flagged as outliers during analysis of flow-weighted trends using a simulated quarterly sampling frequency dataset for selected sites in the Missouri Ambient Water-Quality Monitoring Network
- AWQMN_descriptive_statistics_WY1993-2017.xlsx -- Descriptive statistics for selected water-quality parameters at selected sites in the Missouri Ambient Water-Quality Monitoring Network

The following selected graphics are included in this data release in .pdf format. Also included are web pages accessible for people with disabilities, provided in compressed .zip format; the web pages present the same information as the .pdf files. All graphics are provided to support the interpretations in the Scientific Investigations Report:
- Annual and seasonal discharge trends.pdf -- Graphics of discharge trends produced from the EGRET software for selected sites in the Missouri Ambient Water-Quality Monitoring Network
- Annual_and_seasonal_discharge_trends_htm.zip -- Compressed web page presenting graphics of discharge trends produced from the EGRET software for selected sites in the Missouri Ambient Water-Quality Monitoring Network
- Graphics of simulated quarterly sampling frequency trends.pdf -- Graphics of results of simulated quarterly sampling frequency trends produced by the R-QWTREND software at selected sites in the Missouri Ambient Water-Quality Monitoring Network
- Graphics_of_simulated_quarterly_sampling_frequency_trends_htm.zip -- Compressed web page presenting graphics of results of simulated quarterly sampling frequency trends produced by the R-QWTREND software at selected sites in the Missouri Ambient Water-Quality Monitoring Network
- Graphics of median parameter values.pdf -- Graphics of median values for selected parameters at selected sites in the Missouri Ambient Water-Quality Monitoring Network
- Graphics_of_median_parameter_values_htm.zip -- Compressed web page presenting graphics of median values for selected parameters at selected sites in the Missouri Ambient Water-Quality Monitoring Network
- Parameter value versus time.pdf -- Scatter plots of the value of selected parameters versus time at selected sites in the Missouri Ambient Water-Quality Monitoring Network
- Parameter_value_versus_time_htm.zip -- Compressed web page presenting scatter plots of the value of selected parameters versus time at selected sites in the Missouri Ambient Water-Quality Monitoring Network
- Parameter value versus discharge.pdf -- Scatter plots of the value of selected parameters versus discharge at selected sites in the Missouri Ambient Water-Quality Monitoring Network
- Parameter_value_versus_discharge_htm.zip -- Compressed web page presenting scatter plots of the value of selected parameters versus discharge at selected sites in the Missouri Ambient Water-Quality Monitoring Network
- Boxplot of parameter value distribution by season.pdf -- Seasonal boxplots of selected parameters from selected sites in the Missouri Ambient Water-Quality Monitoring Network. Seasons defined as Winter (December, January, and February), Spring (March, April, and May), Summer (June, July, and August), and Fall (September, October, and November)
- Boxplot_of_parameter_value_distribution_by_season_htm.zip -- Compressed web page presenting seasonal boxplots of selected parameters from selected sites in the Missouri Ambient Water-Quality Monitoring Network. Seasons defined as Winter (December, January, and February), Spring (March, April, and May), Summer (June, July, and August), and Fall (September, October, and November)
- Boxplot of sampled discharge compared with mean daily discharge.pdf -- Boxplots of the distribution of discharge collected at the time of sampling of selected parameters compared with the period of record discharge distribution from selected sites in the Missouri Ambient Water-Quality Monitoring Network
- Boxplot_of_sampled_discharge_compared_with_mean_daily_discharge_htm.zip -- Compressed web page presenting boxplots of the distribution of discharge collected at the time of sampling of selected parameters compared with the period of record discharge distribution from selected sites in the Missouri Ambient Water-Quality Monitoring Network
- Boxplot of parameter value distribution by month.pdf -- Monthly boxplots of selected parameters from selected sites in the Missouri Ambient Water-Quality Monitoring Network
- Boxplot_of_parameter_value_distribution_by_month_htm.zip -- Compressed web page presenting monthly boxplots of selected parameters from selected sites in the Missouri Ambient Water-Quality Monitoring Network