TagX Web Browsing Clickstream Data: Unveiling Digital Behavior Across North America and the EU

Unique Insights into Online User Behavior

TagX Web Browsing Clickstream Data offers an unparalleled window into the digital lives of 1 million users across North America and the European Union. This comprehensive dataset stands out in the market for its breadth, depth, and stringent compliance with data protection regulations.

What Makes Our Data Unique?
- Extensive Geographic Coverage: Spanning two major markets, our data provides a holistic view of web browsing patterns in developed economies.
- Large User Base: With 300K active users, our dataset offers statistically significant insights across various demographics and user segments.
- GDPR and CCPA Compliance: We prioritize user privacy and data protection, ensuring that our data collection and processing methods adhere to the strictest regulatory standards.
- Real-time Updates: Our clickstream data is continuously refreshed, providing up-to-the-minute insights into evolving online trends and user behaviors.
- Granular Data Points: We capture a wide array of metrics, including time spent on websites, click patterns, search queries, and user journey flows.
Data Sourcing: Ethical and Transparent

Our web browsing clickstream data is sourced through a network of partnered websites and applications. Users explicitly opt in to data collection, ensuring transparency and consent. We employ advanced anonymization techniques to protect individual privacy while maintaining the integrity and value of the aggregated data. Key aspects of our data sourcing process include:
- Voluntary user participation through clear opt-in mechanisms
- Regular audits of data collection methods to ensure ongoing compliance
- Collaboration with privacy experts to implement best practices in data anonymization
- Continuous monitoring of regulatory landscapes to adapt our processes as needed
Primary Use Cases and Verticals

TagX Web Browsing Clickstream Data serves a multitude of industries and use cases, including but not limited to:
Digital Marketing and Advertising:
- Audience segmentation and targeting
- Campaign performance optimization
- Competitor analysis and benchmarking

E-commerce and Retail:
- Customer journey mapping
- Product recommendation enhancements
- Cart abandonment analysis

Media and Entertainment:
- Content consumption trends
- Audience engagement metrics
- Cross-platform user behavior analysis

Financial Services:
- Risk assessment based on online behavior
- Fraud detection through anomaly identification
- Investment trend analysis

Technology and Software:
- User experience optimization
- Feature adoption tracking
- Competitive intelligence

Market Research and Consulting:
- Consumer behavior studies
- Industry trend analysis
- Digital transformation strategies
Integration with Broader Data Offering

TagX Web Browsing Clickstream Data is a cornerstone of our comprehensive digital intelligence suite. It seamlessly integrates with our other data products to provide a 360-degree view of online user behavior:
- Social Media Engagement Data: Combine clickstream insights with social media interactions for a holistic understanding of digital footprints.
- Mobile App Usage Data: Cross-reference web browsing patterns with mobile app usage to map the complete digital journey.
- Purchase Intent Signals: Enrich clickstream data with purchase intent indicators to power predictive analytics and targeted marketing efforts.
- Demographic Overlays: Enhance web browsing data with demographic information for more precise audience segmentation and targeting.
By leveraging these complementary datasets, businesses can unlock deeper insights and drive more impactful strategies across their digital initiatives.

Data Quality and Scale

We pride ourselves on delivering high-quality, reliable data at scale:
- Rigorous Data Cleaning: Advanced algorithms filter out bot traffic, VPNs, and other non-human interactions.
- Regular Quality Checks: Our data science team conducts ongoing audits to ensure data accuracy and consistency.
- Scalable Infrastructure: Our robust data processing pipeline can handle billions of daily events, ensuring comprehensive coverage.
- Historical Data Availability: Access up to 24 months of historical data for trend analysis and longitudinal studies.
- Customizable Data Feeds: Tailor the data delivery to your specific needs, from raw clickstream events to aggregated insights.
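One common bot-filtering technique is a simple click-rate heuristic: sessions whose events arrive faster than any human could click are flagged. The sketch below is illustrative only; the event shape, the `ts` field name, and the 3-clicks-per-second threshold are assumptions, not TagX's actual pipeline.

```python
from datetime import datetime

def looks_automated(events, max_clicks_per_second=3.0):
    """Return True if a session's average click rate exceeds the threshold."""
    if len(events) < 2:
        return False
    times = sorted(datetime.fromisoformat(e["ts"]) for e in events)
    span = (times[-1] - times[0]).total_seconds()
    if span == 0:
        return True  # many events at the same instant: almost certainly scripted
    return len(events) / span > max_clicks_per_second

# A 4-click session spread over 31 seconds vs. 5 clicks within half a millisecond.
human = [{"ts": f"2024-01-01T00:00:{s:02d}"} for s in (0, 7, 15, 31)]
bot = [{"ts": f"2024-01-01T00:00:00.{us:06d}"} for us in range(100, 600, 100)]
print(looks_automated(human))  # False
print(looks_automated(bot))    # True
```

Real pipelines combine many such signals (user-agent checks, IP reputation, navigation patterns); a rate threshold alone is only a first pass.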
Empowering Data-Driven Decision Making

In today's digital-first world, understanding online user behavior is crucial for businesses across all sectors. TagX Web Browsing Clickstream Data empowers organizations to make informed decisions, optimize their digital strategies, and stay ahead of the competition. Whether you're a marketer looking to refine your targeting, a product manager seeking to enhance user experience, or a researcher exploring digital trends, our clickstream data provides the insights you need.
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset collects job offers gathered by web scraping and filtered according to specific keywords, locations, and times. This data gives users rich and precise search capabilities to uncover the best working solution for them. With the information collected, users can explore options that match their personal situation, skillset, and preferences in terms of location and schedule. The columns provide detailed information on job titles, employer names, locations, and time frames, as well as other necessary parameters, so you can make a smart choice for your next career opportunity.
This dataset is a great resource for those looking to find an optimal work solution based on keywords, location and time parameters. With this information, users can quickly and easily search through job offers that best fit their needs. Here are some tips on how to use this dataset to its fullest potential:
Start by identifying what type of job offer you want to find. The keyword column will help you narrow down your search by allowing you to search for job postings that contain the word or phrase you are looking for.
Next, consider where the job is located – the Location column tells you where in the world each posting is from so make sure it’s somewhere that suits your needs!
Finally, consider when the position is available. The Time frame column indicates when each posting was made and whether it is a full-time, part-time, or casual/temporary role, so make sure it meets your requirements before applying!
Additionally, if details such as hours per week or further schedule information are important criteria, there is also information in the Horari and Temps_Oferta columns. Once all three criteria have been ticked off (keywords, location, and time frame), take a look at the Empresa (company name) and Nom_Oferta (offer name) columns to get an idea of who would be employing you should you land the gig!
All these pieces of data put together should give any motivated individual what they need to seek out an optimal work solution. Keep hunting, and good luck!
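The keyword/location/schedule workflow above can be sketched in a few lines of Python with the standard csv module. The two sample rows below are invented placeholders; only the column names (Nom_Oferta, Empresa, Ubicació, Temps_Oferta, Horari) come from the dataset's data dictionary.

```python
import csv
import io

# Invented sample data standing in for web_scraping_information_offers.csv.
sample = io.StringIO(
    "Nom_Oferta,Empresa,Ubicació,Temps_Oferta,Horari\n"
    "Python Developer,Acme,Barcelona,Full-time,9-17\n"
    "Data Analyst,Widgets SL,Madrid,Part-time,15-19\n"
)

def find_offers(f, keyword="", location="", schedule=""):
    """Return offers whose title, location, and time frame match the filters."""
    return [
        row for row in csv.DictReader(f)
        if keyword.lower() in row["Nom_Oferta"].lower()
        and location.lower() in row["Ubicació"].lower()
        and schedule.lower() in row["Temps_Oferta"].lower()
    ]

for offer in find_offers(sample, keyword="python", location="barcelona"):
    print(offer["Nom_Oferta"], "at", offer["Empresa"])  # Python Developer at Acme
```

The same function covers all three tips: pass only `keyword` for step 1, add `location` for step 2, and `schedule` for step 3.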
- Machine learning can be used to group job offers, making it easier to identify similarities and differences between them. This could allow users to target their search for a work solution more specifically.
- The data can be used to compare job offerings across different areas or types of jobs, enabling users to make better informed decisions in terms of their career options and goals.
- It may also provide insight into the local job market, enabling companies and employers to identify where there is potential for new opportunities, or trends that may previously have gone unnoticed.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: web_scraping_information_offers.csv

| Column name  | Description                         |
|:-------------|:------------------------------------|
| Nom_Oferta   | Name of the job offer. (String)     |
| Empresa      | Company offering the job. (String)  |
| Ubicació     | Location of the job offer. (String) |
| Temps_Oferta | Time of the job offer. (String)     |
| Horari       | Schedule of the job offer. (String) |
If you use this dataset in your research, please credit the original authors.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
About MCH Data Connect

The MCH Data Connect provides public health professionals, researchers, practitioners, policy makers, and students with a comprehensive catalog of maternal and child health data resources. Users can access a variety of databases, data sets, interactive tools, and maps related to their area of interest.

Maternal and Child Health

The MCH Data Connect uses a broad definition of maternal and child health, including the influence of access to health care, health, health behaviors, education, violence, environmental conditions, demographics, and policy on the health of women, men, children, youth, families, and communities.

Topics

Topics included in the MCH Data Connect: health care policy, experience of health care, family planning, sexual and reproductive health, economics, politics, social services, violence, and health behaviors, among others.

Data Resources

Data resources described in this catalog include data sets, statistics, interactive tables, interactive maps, and databases. Many of the data sources are available for public consumption, though specific databases may require the user to purchase or apply for the dataset.

Basic Search

Locate the "Search Studies" highlighted box above the list of resources on the MCH Data Connect homepage. Leave "Cataloging Information" as the default basic search command. To search, enter the keyword, topic, or area of interest in the field box (next to "Cataloging Information") to obtain a list of resources that apply to your search.

Access Resource

Once the search is completed, a list of resources will appear. The first line provides a brief summary. To get more information about a specific resource (including producer, background, user functionality, and data sources), click on the underlined/blue hyperlinked title. Once the resource description is opened, click on the link that says "Click here to access data from site" to go directly to the resource's web page.
Advanced Search

Click on the "Advanced Search" link located in the "Search Studies" highlighted box above the list of resources on the MCH Data Connect homepage. From the Search Scope drop-down lists, select either Keyword or Abstract (these are the most detailed fields used by the MCH Data Connect). Enter multiple search terms to use the "and" searching criterion. For example, to search for resources related to diabetes and exercise, the user would select "Keyword" from the drop-down list, then "contains", and then enter "diabetes" in the field box. The user would repeat the first two steps to enter "exercise" in the next field box.

Collections

The Topic Folders section provides a list of broad categories that include many resources found in the MCH Data Connect. The files of the Topic Folders are on the left side of the homepage. Clicking on a file folder will produce a list of the resources related to that topic. The Topic Folders offer a starting place for your search. You can narrow your search further by performing either of the previous two searching techniques within a collection.

Questions or Comments?

For questions, comments, or if you think we missed a useful information tool, please contact us via email at mchdataconnect@gmail.com.

Glossary

Some terms you will see on this website are unique to the cataloging service, Dataverse; the MCH Data Connect uses them differently. Please see below for a glossary of terms you will find at MCH Data Connect.
Please note that interactive tools, datasets, and reports are referred to as "resources."

Terms
- Dataverse: the program used to develop the MCH Data Connect
- Study: resource containing relevant public health data and/or information
- Collection: broad categories into which resources have been classified
- How to Cite: used as the resource title by MCH Data Connect
- Study Global ID: unique code given to each resource
- Producer: the agency or entity that produces and maintains the resource
- Deposit Date: date when the resource was added to the MCH Data Connect
- Provenance: will always be MCH Data Connect
- Abstract and Scope: contains resource summary and geographic unit information
- Abstract: summary of the resource
- Background: information about the purpose and development of the resource
- User Functionality: what users can do with the data (i.e. download, customize charts)
- Data Notes: information about data sources, years, and samples (if applicable)
- Abstract Date: month and year that the resource was added to MCH Data Connect
- Keyword: specific variables, topics, or words that the resource addresses/encompasses
- Geographic Unit: level at which data is available
- Title: name of the specific resource
- Keyword Vocabulary: "link": clicking on "link" will take the user to an external website related to the keyword term

The following terms are not used by the MCH Data Connect Dataverse: Topic Classification; Topic Classification Vocabulary; Other ID; Author; Distributor; Funding Agency; Production Date; Distribution Date; Time Period Covered Start; Time...
Author: Víctor Yeste, Universitat Politècnica de València.

The object of this study is the design of a cybermetric methodology whose objectives are to measure the success of the content published in online media and the possible prediction of the selected success variables. In this case, due to the need to integrate data from two separate areas (web publishing and the analysis of shares and related topics on Twitter), programmatic access was chosen to both the Google Analytics v4 Reporting API and the Twitter Standard API, always respecting the limits of each.

The website analyzed is hellofriki.com. It is an online media outlet whose primary intention is to meet the need for information on certain topics, publishing daily a large volume of content in the form of news, as well as analyses, reports, interviews, and many other formats. All these contents fall under the sections of cinema, series, video games, literature, and comics.

This dataset contributed to the elaboration of the PhD thesis: Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009

Data have been obtained from each last-minute news article published online, according to the indicators described in the doctoral thesis.
All related data are stored in a database, divided into the following tables:

tesis_followers: user ID list of the media account's followers.

tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.
- status_id: tweet ID
- created_at: date of publication
- text: content of the tweet
- path: URL extracted after processing the shortened URL in text
- post_shared: article ID in WordPress that is being shared
- retweet_count: number of retweets
- favorite_count: number of favorites

tesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web (other typologies: automatic Facebook shares, custom tweets without a link to an article, etc.). Same fields as tesis_hometimeline.

tesis_posts: data of articles published by the web and processed for analysis.
- stats_id: analysis ID
- post_id: article ID in WordPress
- post_date: article publication date in WordPress
- post_title: title of the article
- path: URL of the article on the media website
- tags: IDs of the WordPress tags related to the article
- uniquepageviews: unique page views
- entrancerate: entrance rate
- avgtimeonpage: average visit time
- exitrate: exit rate
- pageviewspersession: page views per session
- adsense_adunitsviewed: number of ads viewed by users
- adsense_viewableimpressionpercent: ad display ratio
- adsense_ctr: ad click ratio
- adsense_ecpm: estimated ad revenue per 1000 page views

tesis_stats: data from a particular analysis, performed for each published breaking news item.
Fields with statistical values can be computed from the data in the other tables, but total and average calculations are saved for faster and easier further processing.
- id: ID of the analysis
- phase: phase of the thesis in which the analysis was carried out (currently all are 1)
- time: "0" if at the time of publication, "1" if 14 days later
- start_date: date and time of measurement on the day of publication
- end_date: date and time when the measurement is made 14 days later
- main_post_id: ID of the published article to be analysed
- main_post_theme: main section of the published article
- superheroes_theme: "1" if about superheroes, "0" if not
- trailer_theme: "1" if a trailer, "0" if not
- name: empty field; a custom name can be added manually
- notes: empty field; personalized notes can be added manually (e.g. that some tag was removed manually for being considered too generic, despite the editor having added it)
- num_articles: number of articles analysed
- num_articles_with_traffic: number of articles analysed with traffic (which will be taken into account for traffic analysis)
- num_articles_with_tw_data: number of articles with data from when they were shared on the media's Twitter account
- num_terms: number of terms analysed
- uniquepageviews_total: total page views
- uniquepageviews_mean: average page views
- entrancerate_mean: average entrance rate
- avgtimeonpage_mean: average duration of visits
- exitrate_mean: average exit rate
- pageviewspersession_mean: average page views per session
- total: total ads viewed
- adsense_adunitsviewed_mean: average ads viewed
- adsense_viewableimpressionpercent_mean: average ad display ratio
- adsense_ctr_mean: average ad click ratio
- adsense_ecpm_mean: estimated ad revenue per 1000 page views
- Total: total income
- retweet_count_mean: average retweets
- favorite_count_total: total favorites
- favorite_count_mean: average favorites
- terms_ini_num_tweets: total tweets on the terms on the day of publication
- terms_ini_retweet_count_total: total retweets on the terms on the day of publication
- terms_ini_retweet_count_mean: average retweets on the terms on the day of publication
- terms_ini_favorite_count_total: total favorites on the terms on the day of publication
- terms_ini_favorite_count_mean: average favorites on the terms on the day of publication
- terms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publication
- terms_ini_user_num_followers_mean: average followers of users who have spoken about the terms on the day of publication
- terms_ini_user_num_tweets_mean: average number of tweets published by users who spoke about the terms on the day of publication
- terms_ini_user_age_mean: average age in days of users who have spoken about the terms on the day of publication
- terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about the terms on the day of publication
- terms_end_num_tweets: total tweets on the terms 14 days after publication
- terms_end_retweet_count_total: total retweets on the terms 14 days after publication
- terms_end_retweet_count_mean: average retweets on the terms 14 days after publication
- terms_end_favorite_count_total: total favorites on the terms 14 days after publication
- terms_end_favorite_count_mean: average favorites on the terms 14 days after publication
- terms_end_followers_talking_rate: ratio of media Twitter account followers who have recently posted a tweet talking about the terms 14 days after publication
- terms_end_user_num_followers_mean: average followers of users who have spoken about the terms 14 days after publication
- terms_end_user_num_tweets_mean: average number of tweets published by users who have spoken about the terms 14 days after publication
- terms_end_user_age_mean: average age in days of users who have spoken about the terms 14 days after publication
- terms_end_ur_inclusion_rate: URL inclusion ratio of tweets talking about the terms 14 days after publication

tesis_terms: data of the terms (tags) related to the processed articles.
- stats_id: analysis ID
- time: "0" if at the time of publication, "1" if 14 days later
- term_id: term (tag) ID in WordPress
- name: name of the term
- slug: URL slug of the term
- num_tweets: number of tweets
- retweet_count_total: total retweets
- retweet_count_mean: average retweets
- favorite_count_total: total favorites
- favorite_count_mean: average favorites
- followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the term
- user_num_followers_mean: average followers of users who were talking about the term
- user_num_tweets_mean: average number of tweets published by users who were talking about the term
- user_age_mean: average age in days of users who were talking about the term
- url_inclusion_rate: URL inclusion ratio
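As noted above, the saved `*_total` and `*_mean` fields in tesis_stats can be recomputed from raw rows such as those in tesis_hometimeline. A minimal sketch with invented sample tweets (only the field names come from the data dictionary):

```python
from statistics import mean

# Invented sample rows shaped like tesis_hometimeline records.
tweets = [
    {"status_id": 1, "retweet_count": 4, "favorite_count": 10},
    {"status_id": 2, "retweet_count": 0, "favorite_count": 3},
    {"status_id": 3, "retweet_count": 8, "favorite_count": 5},
]

# Pre-computing aggregates once, as tesis_stats does, avoids re-scanning
# the raw tables for every later analysis.
stats = {
    "retweet_count_total": sum(t["retweet_count"] for t in tweets),
    "retweet_count_mean": mean(t["retweet_count"] for t in tweets),
    "favorite_count_total": sum(t["favorite_count"] for t in tweets),
    "favorite_count_mean": mean(t["favorite_count"] for t in tweets),
}
print(stats)  # retweet total 12, mean 4; favorite total 18, mean 6
```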
This dataset captures how many voter registration applications each agency has distributed, how many applications agency staff sent to the Board of Elections, how many staff each agency trained to distribute voter registration applications, whether or not the agency hosts a link to voting.nyc on its website and if so, how many clicks that link received during the reporting period.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo.
- Click here to view this dashboard: Dashboard Link
- Click here to view this dashboard's features in a Domo blog post: Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard
This dataset offers one of the most robust resources you will find for discovering key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. This data, sourced from KenPom, goes as far back as 2002 and is updated with the latest 2025 data. The dataset is meticulously structured to provide every piece of information that I could pull from the site as an open resource for March Madness analysis.
Key features of the dataset include:
- Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints on KenPom's website. Please note that the Height/Experience data only goes as far back as 2007, but every other source contains data from 2002 onward.
- Data Granularity: This dataset features an individual line item for every NCAA Division 1 men's basketball team in every season, containing every KenPom metric you can possibly think of. It can serve as a single source of truth for your March Madness analysis and provides the granularity necessary to perform any type of analysis.
- 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. Please note that I will continually update this dataset with the seed and region information for previous tournaments as I continue to work on it.
These datasets were created by downloading the raw CSV files for each season for the various sections on KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of these raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, the column headers for each of the previous seasons are standardized to the current 2025 naming structure so all of the historical data can be viewed under the exact same field names. All of these cleaned datasets are then appended together, and some additional clean-up takes place before ultimately creating the intermediate (INT) datasets that are uploaded to this Kaggle dataset.

Once all of the INT datasets were created, I joined all of the tables together on team name and season so these different metrics can be viewed in one single view. From there, I joined an NCAAM Conference & ESPN Team Name Mapping table to add a conference field, in its full length and the respective acronyms it is known by, as well as the team name that ESPN currently uses. Please note that this reference table is an aggregated view of all of the different conferences a team has been a part of since 2002 and the different team names that KenPom has used historically, so this mapping table is necessary to map all of the teams properly and differentiate historical conferences from current conferences.

From there, I join a reference table that includes all of the current NCAAM coaches and their active coaching lengths, because a coach's current tenure typically correlates with a team's success in the March Madness tournament. I also join another reference table to include the historical post-season tournament teams in the March Madness, NIT, CBI, and CIT tournaments, and a further reference table to flag the teams that were ranked in the top 12 of the AP Top 25 during week 6 of the respective NCAA season.
After some additional data clean-up, all of this cleaned data exports into the "DEV _ March Madness" file that contains the consolidated view of all of this data.
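The header-standardization and append steps above can be sketched in plain Python (the Domo Magic ETL dataflow itself is not reproducible here). The rename mapping and the sample rows are invented for illustration; KenPom's actual historical headers differ.

```python
# Per-season rename maps to the current (2025) naming convention.
# These names are illustrative assumptions, not KenPom's real headers.
RENAMES = {
    2019: {"AdjO": "adj_off_eff", "AdjD": "adj_def_eff"},
    2025: {},  # 2025 already uses the current names
}

def standardize(rows, season):
    """Rename each row's columns to the current convention and tag the season."""
    mapping = RENAMES.get(season, {})
    return [
        {mapping.get(col, col): val for col, val in row.items()} | {"season": season}
        for row in rows
    ]

# Append two seasons: after renaming, both expose the same field names.
combined = standardize([{"Team": "Duke", "AdjO": 118.2}], 2019) \
         + standardize([{"Team": "Duke", "adj_off_eff": 121.5}], 2025)
print([row["adj_off_eff"] for row in combined])  # [118.2, 121.5]
```

Standardizing headers before appending is what makes the historical data queryable under one set of field names, exactly as the ETL description above requires.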
This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.
https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
This dataset contains a collection of questions and answers that have been contextualized to reveal subtle implications and insights. It is focused on helping researchers gain a deeper understanding of how semantics, context, and other factors affect how people interpret and respond to conversations about different topics. By exploring this dataset, researchers will be able to uncover the underlying principles governing conversation styles, which can then be applied to better understand attitudes among different groups. With its comprehensive coverage of questions from a variety of sources around the web, this dataset offers an invaluable resource for those looking to analyze discourse in terms of sentiment analysis or opinion mining.
How to Use This Dataset
This dataset contains a collection of contextualized questions and answers extracted from various sources around the web, which can be useful for exploring implications and insights. To get started with the dataset:
- Read through the headings on each column in order to understand the data that has been collected - this will help you identify which pieces of information are relevant for your research project.
- Explore each column and view what types of responses have been given in response to particular questions or topics - this will give you an idea as to how people interpret specific topics differently when presented with different contexts or circumstances.
- Next, analyze the responses, looking for patterns or correlations between responses on different topics or contexts; this can help reveal implications and insights about a particular subject matter that were previously unknown to you. You can also use data visualization tools such as Tableau or Power BI to gain a deeper understanding of the results and trends within your dataset!
- Finally, use these findings to better inform your project by tailoring future questions around any patterns discovered within your analysis!
- To understand the nature of public debates and how people express their opinions in different contexts.
- To better comprehend the implicit attitudes and assumptions inherent in language use, providing insight into discourse norms on a range of issues.
- To gain insight into the use of rhetorical devices, such as exaggeration and deceptive tactics, used to influence public opinion on important topics.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv

| Column name | Description |
|:------------|:------------------------------------------------------------------------------|
| context     | The context in which the question was asked and the answer was given. (Text)   |
If you use this dataset in your research, please credit the original authors and Huggingface Hub.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The dataset provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store that sells Google-branded merchandise, in BigQuery. It's a great way to analyze business data and learn the benefits of using BigQuery to analyze Analytics 360 data.

What the data includes

The data is typical of what an ecommerce website would see and includes the following information:
- Traffic source data: information about where website visitors originate, including data about organic traffic, paid search traffic, and display traffic
- Content data: information about the behavior of users on the site, such as the URLs of pages that visitors look at, how they interact with content, etc.
- Transactional data: information about the transactions on the Google Merchandise Store website.

Limitations

All users have view access to the dataset. This means you can query the dataset and generate reports, but you cannot complete administrative tasks. Data for some fields is obfuscated (such as fullVisitorId) or removed (such as clientId, adWordsClickInfo, and geoNetwork). "Not available in demo dataset" will be returned for STRING values and "null" will be returned for INTEGER values when querying fields containing no data.

This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/mo of free tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
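Because obfuscated STRING fields come back as the literal sentinel "Not available in demo dataset", it can be convenient to normalize query rows before analysis. The helper below is an illustrative sketch, not part of any Google library; the row contents are invented.

```python
# The demo dataset substitutes this literal string for obfuscated STRING fields.
SENTINEL = "Not available in demo dataset"

def normalize_row(row):
    """Map the demo dataset's sentinel values to None so downstream code
    treats obfuscated fields the same as genuinely missing ones."""
    return {k: (None if v == SENTINEL else v) for k, v in row.items()}

# Invented example row shaped like a BigQuery query result.
row = {"fullVisitorId": SENTINEL, "visits": 1, "clientId": None}
print(normalize_row(row))  # {'fullVisitorId': None, 'visits': 1, 'clientId': None}
```

Applying such a pass keeps aggregations honest: counting distinct `fullVisitorId` values, for instance, would otherwise treat the sentinel as a real visitor ID.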
This file contains the data collected for the doctoral thesis of Sanne Elling: 'Evaluating website quality: Five studies on user-focused evaluation methods'.

Summary: The benefits of evaluating websites among potential users are widely acknowledged. There are several methods that can be used to evaluate a website's quality from a user's perspective. In current practice, many evaluations are executed with inadequate methods that lack research-based validation. This thesis aims to gain more insight into evaluation methodology and to contribute to a higher standard of website evaluation in practice. A first way to evaluate website quality is measuring the users' opinions. This is often done with questionnaires, which gather opinions in a cheap, fast, and easy way. However, many questionnaires seem to miss a solid statistical basis and a justification of the choice of quality dimensions and questions. We therefore developed the 'Website Evaluation Questionnaire' (WEQ), which was specifically designed for the evaluation of governmental websites. In a study in online and laboratory settings, the WEQ proved to be a valid and reliable instrument. A way to gather more specific user opinions is inviting participants to review website pages. Participants provide their comments by clicking on a feedback button, marking a problematic segment, and formulating their feedback. There has been debate about the extent to which users are able to provide relevant feedback. The results of our studies showed that participants were able to provide useful feedback. They signalled many relevant problems that indeed were experienced by users who needed to find information on the website. Website quality can also be measured during participants' task performance. A frequently used method is the concurrent think-aloud method (CTA), which involves participants who verbalize their thoughts while performing tasks.
There have been doubts on the usefulness and exhaustiveness of participants’ verbalizations. Therefore, we have combined CTA and eye tracking in order to examine the cognitive processes that participants do and do not verbalize. The results showed that the participants’ verbalizations provided substantial information in addition to the directly observable user problems. There was also a rather high percentage of silences (27%) during which interesting observations could be made about the users’ processes and obstacles. A thorough evaluation should therefore combine verbalizations and (eye tracking) observations. In a retrospective think-aloud (RTA) evaluation participants verbalize their thoughts afterwards while watching a recording of their performance. A problem with RTA is that participants not always remember the thoughts they had during their task performance. We therefore complemented the dynamic screen replay of their actions (pages visited and mouse movements) with a dynamic gaze replay of the participants’ eye movements.Contrary to our expectations, no differences were found between the two conditions. It is not possible to draw conclusions on the single best method. The value of a specific method is strongly influenced by the goals and context of an evaluation. Also, the outcomes of the evaluation not only depend on the method, but also on other choices during the evaluation, such as participant selection, tasks, and the subsequent analysis.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
From 1 July 2024, the dataset will no longer publicly distinguish between relevant qualifications or training courses or approved qualifications or training courses.
From 1 March 2024, the dataset will be updated to include 5 new fields and 1 existing field will also be updated (see help file for details).
From 24 August 2023, the dataset will be updated to include 1 new field, ABLE_TO_PROVIDE_TFAS, (see help file for details).
We have replaced the .xlsx file resources for all our datasets. This was required due to the API and web page search functionality no longer being supported for .xlsx files on the Data.Gov platform.
From 10 January 2022, the field ADV_FASEA_APPROVED_QUAL will be renamed to ADV_APPROVED_QUAL.
From 21 November 2019, the dataset will be updated to include 7 new fields (see help file for details).
These fields are included in conjunction with the professional standards reforms for financial advisers. More information can be found on the ASIC website https://asic.gov.au/regulatory-resources/financial-services/professional-standards-for-financial-advisers-reforms/.
Note: For most advisers the new fields will be unpopulated on 21 November 2019. As advisers provide this data to ASIC it will appear in the dataset.
ASIC is Australia’s corporate, markets and financial services regulator. ASIC contributes to Australia’s economic reputation and wellbeing by ensuring that Australia’s financial markets are fair and transparent, supported by confident and informed investors and consumers.
Australian Financial Services Licensees are required to keep the details of their financial advisers up to date on ASIC's Financial Advisers Register. Information contained in the register is made available to the public to search via ASIC's Moneysmart website.
Select data from the Financial Advisers Register will be uploaded each week to www.data.gov.au. The data made available will be a snapshot of the register at a point in time. Legislation prescribes the type of information ASIC is allowed to disclose to the public.
The information included in the downloadable dataset is:
Additional information about financial advisers can be found via ASIC's website. Accessing some information may attract a fee.
More information about searching ASIC's registers is available on the ASIC website.
Dataset Summary
The Persian data of this dataset is a collection of 400k blog posts (RohanAiLab/persian_blog). These posts have been gathered from more than 10 websites. This dataset can be used in different NLP tasks such as language modeling, tokenizer creation, and text generation.
The English data of this dataset is merged from the english-wiki-corpus dataset. Note: The data for both Persian… See the full description on the dataset page: https://huggingface.co/datasets/ali619/corpus-dataset-normalized-for-persian-and-english.
Each drainage area is considered a Hydrologic Unit (HU) and is given a Hydrologic Unit Code (HUC) which serves as the unique identifier for the area. HUC 2s, 4s, 6s, 8s, 10s, and 12s define the drainage Regions, Subregions, Basins, Subbasins, Watersheds and Subwatersheds, respectively, across the United States. Their boundaries are defined by hydrologic and topographic criteria that delineate an area of land upstream from a specific point on a river, and are determined solely on science-based hydrologic principles, not favoring any administrative boundaries, special projects, or a particular program or agency. The Watershed Boundary Dataset is delineated and georeferenced to the USGS 1:24,000-scale topographic basemap.
Hydrologic Units are delineated to nest in a multi-level, hierarchical drainage system with corresponding HUCs, so that as you move from small scale to large scale the number of HUC digits increases in increments of two. For example, the very largest HUCs have 2 digits, and thus are referred to as HUC 2s, and the very smallest HUCs have 12 digits, and thus are referred to as HUC 12s.
Dataset Summary
Phenomenon Mapped: Watersheds in the United States, as delineated by the Watershed Boundary Dataset (WBD)
Geographic Extent: Contiguous United States, Alaska, Hawaii, Puerto Rico, Guam, US Virgin Islands, Northern Mariana Islands and American Samoa
Projection: Web Mercator
Update Frequency: Annual
Visible Scale: Visible at all scales; however, USGS recommends this dataset not be used for scales of 1:24,000 or larger.
Source: United States Geological Survey (WBD)
Data Vintage: January 7, 2025
What can you do with this layer?
This layer is suitable for both visualization and analysis across the ArcGIS system. This layer can be combined with your data and other layers from the ArcGIS Living Atlas of the World in ArcGIS Online and ArcGIS Pro to create powerful web maps that can be used alone or in a story map or other application.
Because this layer is part of the ArcGIS Living Atlas of the World it is easy to add to your map:
In ArcGIS Online, you can add this layer to a map by selecting Add then Browse Living Atlas Layers. A window will open. Type "Watershed Boundary Dataset" in the search box and browse to the layer. Select the layer then click Add to Map.
In ArcGIS Pro, open a map and select Add Data from the Map Tab. Select Data at the top of the drop down menu. The Add Data dialog box will open; on the left side of the box, expand Portal if necessary, then select Living Atlas. Type "Watershed Boundary Dataset" in the search box, browse to the layer then click OK.
Questions? Please leave a comment below if you have a question about this layer, and we will get back to you as soon as possible.
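The two-digit nesting rule above means that a unit's ancestors can be read directly off its code by truncation (the standard hierarchy also includes 4-digit Subregions). A minimal sketch in Python; the 12-digit code below is hypothetical, for illustration only:

```python
def huc_ancestors(huc12: str) -> dict:
    """Truncate a 12-digit HUC to the code of each enclosing hydrologic unit."""
    levels = {2: "Region", 4: "Subregion", 6: "Basin",
              8: "Subbasin", 10: "Watershed", 12: "Subwatershed"}
    assert len(huc12) == 12 and huc12.isdigit()
    # Each level's code is simply the first N digits of the HUC 12 code.
    return {name: huc12[:digits] for digits, name in levels.items()}

# Hypothetical HUC 12 code, for illustration only:
print(huc_ancestors("180902030402"))
# {'Region': '18', 'Subregion': '1809', 'Basin': '180902',
#  'Subbasin': '18090203', 'Watershed': '1809020304',
#  'Subwatershed': '180902030402'}
```

This is why the WBD layers nest cleanly: every HUC 12 polygon lies inside exactly one HUC 10, HUC 8, and so on up to its HUC 2 Region.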
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Graph Convolutional Neural Network-based Method for Predicting Computational Intensity of Geocomputation
This is the implementation for the paper "A Graph Convolutional Neural Network-based Method for Predicting Computational Intensity of Geocomputation". The framework is the Learning-based Computing Framework for Geospatial data (LCF-G). This paper includes three case studies, each corresponding to a folder. Each folder contains four subfolders: data, CIPrediction, ParallelComputation and SampleGeneration.
The data folder contains geospatial data.
The CIPrediction folder contains model training code.
The ParallelComputation folder contains geographic computation code.
The SampleGeneration folder contains code for sample generation.

Case 1: Generation of DEM from point cloud data
Step 1: Data download. Dataset 1 has been uploaded to the directory 1point2dem/data. The other two datasets, Dataset 2 and Dataset 3, can be downloaded from the OpenTopography website.
Dataset 2: Visit https://portal.opentopography.org/lidarDataset?opentopoID=OTLAS.112018.2193.1. In section "1. Coordinates & Classification", select "Manually enter selection coordinates" and set Xmin = 1372495.692761, Ymin = 5076006.86821, Xmax = 1378779.529766, Ymax = 5085586.39531. In section "2. Point Cloud Data Download", choose "Point cloud data in LAS format", then click "SUBMIT" to initiate the download.
Dataset 3: Visit https://portal.opentopography.org/lidarDataset?opentopoID=OTLAS.052016.26912.1. In section "1. Coordinates & Classification", select "Manually enter selection coordinates" and set Xmin = 470047.153826, Ymin = 4963418.512121, Xmax = 479547.16556, Ymax = 4972078.92768. In section "2. Point Cloud Data Download", choose "Point cloud data in LAS format", then click "SUBMIT" to initiate the download.
Step 2: Sample generation. This step involves data preparation, and samples can be generated using the provided code. Since the samples have already been uploaded to 1point2dem/SampleGeneration/data, this step is optional.
cd 1point2dem/SampleGeneration
g++ PointCloud2DEMSampleGeneration.cpp -o PointCloud2DEMSampleGeneration
mpiexec -n {number_processes} ./PointCloud2DEMSampleGeneration ../data/pcd path/to/output
Step 3: Model training. This step trains three models (GCN, ChebNet, GATNet). The model results are saved in 1point2dem/SampleGeneration/result, and the results for Table 3 in the paper are derived from this output.
cd 1point2dem/CIPrediction
python -u point_prediction.py --model [GCN|ChebNet|GATNet]
Step 4: Parallel computation. This step uses the trained models to optimize parallel computation. The results for Figures 11-13 in the paper are generated from the output of this command.
cd 1point2dem/ParallelComputation
g++ ParallelPointCloud2DEM.cpp -o ParallelPointCloud2DEM
mpiexec -n {number_processes} ./ParallelPointCloud2DEM ../data/pcd

Case 2: Spatial intersection of vector data
Step 1: Data download. Some data from the paper has been uploaded to 2intersection/data. The remaining OSM data can be downloaded from GeoFabrik: GeoFabrik - Czech Republic OSM Data.
Step 2: Sample generation. This step involves data preparation, and samples can be generated using the provided code. Since the samples have already been uploaded to 2intersection/SampleGeneration/data, this step is optional.
cd 2intersection/SampleGeneration
g++ ParallelIntersection.cpp -o ParallelIntersection
mpiexec -n {number_processes} ./ParallelIntersection ../data/shpfile ../data/shpfile
Step 3: Model training. This step trains three models (GCN, ChebNet, GATNet). The model results are saved in 2intersection/SampleGeneration/result, and the results for Table 5 in the paper are derived from this output.
cd 2intersection/CIPrediction
python -u vector_prediction.py --model [GCN|ChebNet|GATNet]
Step 4: Parallel computation. This step uses the trained models to optimize parallel computation. The results for Figures 14-16 in the paper are generated from the output of this command.
cd 2intersection/ParallelComputation
g++ ParallelIntersection.cpp -o ParallelIntersection
mpiexec -n {number_processes} ./ParallelIntersection ../data/shpfile1 ../data/shpfile2

Case 3: WOfS analysis using raster data
Step 1: Data download. Some data from the paper has been uploaded to 3wofs/data. The remaining data can be downloaded from http://openge.org.cn/advancedRetrieval?type=dataset with the following query parameters:
Product selection: LC08_L1TP and LC08_L1GT
Longitude: 112.5 (minimum) to 115.5 (maximum); Latitude: 29.5 (minimum) to 31.5 (maximum)
Time range: 2013-01-01 to 2018-12-31
Other parameters: default
Step 2: Sample generation. This step involves data preparation, and samples can be generated using the provided code. Since the samples have already been uploaded to 3wofs/SampleGeneration/data, this step is optional.
cd 3wofs/SampleGeneration
sbt package
spark-submit --master {host1,host2,host3} --class whu.edu.cn.core.cube.raster.WOfSSampleGeneration path/to/package.jar
Step 3: Model training. This step trains three models (GCN, ChebNet, GATNet). The model results are saved in 3wofs/SampleGeneration/result, and the results for Table 6 in the paper are derived from this output.
cd 3wofs/CIPrediction
python -u raster_prediction.py --model [GCN|ChebNet|GATNet]
Step 4: Parallel computation. This step uses the trained models to optimize parallel computation. The results for Figures 18 and 19 in the paper are generated from the output of this command.
cd 3wofs/ParallelComputation
sbt package
spark-submit --master {host1,host2,host3} --class whu.edu.cn.core.cube.raster.WOfSOptimizedByDL path/to/package.jar path/to/output

Statement about Case 3
The experiment in Case 3 was conducted with improvements made on the GeoCube platform.
Code name: GeoCube
Code link: GeoCube Source Code
License information: The GeoCube project is openly available under the CC BY 4.0 license (Creative Commons Attribution 4.0 International), allowing anyone to freely share, modify, and distribute the platform's code.
Citation: Gao, Fan (2022). A multi-source spatio-temporal data cube for large-scale geospatial analysis. figshare. Software. https://doi.org/10.6084/m9.figshare.15032847.v1
Clarification statement: The authors of this code are not affiliated with this manuscript. The innovations and steps in Case 3, including data download, sample generation, and parallel computation optimization, were independently developed and are not dependent on GeoCube's code.

Requirements
The code uses the following dependencies with Python 3.8:
torch==2.0.0
torch_geometric==2.5.3
networkx==2.6.3
pyshp==2.3.1
tensorrt==8.6.1
matplotlib==3.7.2
scipy==1.10.1
scikit-learn==1.3.0
geopandas==0.13.2
The data on the use of the datasets on the OGD portal BL (data.bl.ch) are collected and published by the specialist and coordination office OGD BL.
date: The day the usage was measured.
dataset_title: The title of the dataset.
dataset_id: The technical ID of the dataset.
visitors: The number of daily visitors to the dataset. Visitors are recorded by counting the unique IP addresses that recorded access on the day of the survey. The IP address represents the network address of the device from which the portal was accessed.
interactions: All interactions with any dataset on data.bl.ch. A visitor can trigger multiple interactions. Interactions include clicks on the website (searching datasets, filters, etc.) as well as API calls (downloading a dataset as a JSON file, etc.).
Remarks
Only calls to publicly available datasets are shown.
IP addresses and interactions of users with a login of the Canton of Basel-Landschaft (in particular, employees of the specialist and coordination office OGD) are removed from the dataset before publication and therefore not shown.
Calls from actors that are clearly identifiable as bots by the User-Agent header are also not shown.
Combinations of dataset and date for which no use occurred (visitors == 0 and interactions == 0) are not shown.
Due to synchronization problems, data may occasionally be missing for individual days.
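The omission rule in the remarks can be sketched as a small preprocessing step (a sketch only: column names follow the field descriptions above, and the rows are toy values, not real portal data):

```python
# Sketch: reproduce the published filtering rule that dataset/date
# combinations without any use (visitors == 0 and interactions == 0)
# are omitted before publication. Toy rows for illustration only.
rows = [
    {"dataset_id": "10010", "date": "2024-05-01", "visitors": 4, "interactions": 9},
    {"dataset_id": "10010", "date": "2024-05-02", "visitors": 0, "interactions": 0},
    {"dataset_id": "10020", "date": "2024-05-01", "visitors": 0, "interactions": 2},
]

# Keep a row if it recorded any visitor or any interaction.
published = [r for r in rows if r["visitors"] > 0 or r["interactions"] > 0]
print(len(published))  # 2: the all-zero row is dropped
```

Conversely, a consumer of the dataset should treat a missing dataset/date combination as zero use (or as a synchronization gap), not as missing fields.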
The NSF Public Access Repository contains an initial collection of journal publications, comprising the final accepted version of the peer-reviewed manuscript or the version of record. To do this, NSF draws upon services provided by the publisher community, including the Clearinghouse of Open Research for the United States, CrossRef, and the International Standard Serial Number. When clicking on a Digital Object Identifier number, you will be taken to an external site maintained by the publisher. Some full-text articles may not be available without a charge during the embargo, or administrative interval. Some links on this page may take you to non-federal websites, whose policies may differ from this website's.
https://www.icpsr.umich.edu/web/ICPSR/studies/38908/terms
The Child Care and Development Fund (CCDF) provides federal money to states and territories to help low-income families obtain quality child care so they can work, attend training, or receive education. Within the broad federal parameters, states and territories set the detailed policies. Those details determine whether a particular family will or will not be eligible for subsidies, how much the family will have to pay for the care, how families apply for and retain subsidies, the maximum amounts that child care providers will be reimbursed, and the administrative procedures that providers must follow. Thus, while CCDF is a single program from the perspective of federal law, it is in practice a different program in every state and territory. The CCDF Policies Database project is a comprehensive, up-to-date database of CCDF policy information that supports the needs of a variety of audiences through (1) analytic data files, (2) a project website and search tool, and (3) an annual report (Book of Tables). These resources are made available to researchers, administrators, and policymakers with the goal of addressing important questions concerning the effects of child care subsidy policies and practices on the children and families served. A description of the data files, project website and search tool, and Book of Tables is provided below:
1. Detailed, longitudinal analytic data files provide CCDF policy information for all 50 states, the District of Columbia, and the United States territories and outlying areas. They capture the policies actually in effect at a point in time, rather than proposals or legislation, and record changes throughout each year, allowing users to access the policies in place at any point in time between October 2009 and the most recent data release. The data are organized into 32 categories, with each category of variables separated into its own dataset.
The categories span five general areas of policy:
Eligibility Requirements for Families and Children (Datasets 1-5)
Family Application, Terms of Authorization, and Redetermination (Datasets 6-13)
Family Payments (Datasets 14-18)
Policies for Providers, Including Maximum Reimbursement Rates (Datasets 19-27)
Overall Administrative and Quality Information Plans (Datasets 28-32)
The information in the data files is based primarily on the documents that caseworkers use as they work with families and providers (often termed "caseworker manuals"). The caseworker manuals generally provide much more detailed information on eligibility, family payments, and provider-related policies than the CCDF Plans submitted by states and territories to the federal government. The caseworker manuals also provide ongoing detail for periods in between CCDF Plan dates. Each dataset contains a series of variables designed to capture the intricacies of the rules covered in the category. The variables include a mix of categorical, numeric, and text variables. Most variables have a corresponding notes field to capture additional details related to that particular variable. In addition, each category has an additional notes field to capture any information regarding the rules that is not already outlined in the category's variables. Beginning with the 2020 files, the analytic data files are supplemented by four additional data files containing select policy information featured in the annual reports (prior to 2020, the full detail of the annual reports was reproduced as data files). The supplemental data files are available as 4 datasets (Datasets 33-36) and present key aspects of the differences in CCDF-funded programs across all states and territories as of October 1 of each year (2009-2022).
The files include variables that are calculated using several variables from the analytic data files (Datasets 1-32), such as copayment amounts for example family situations, and information that is part of the annual project reports (the annual Book of Tables) but not stored in the full database, such as summary market rate survey information from the CCDF plans.
2. The project website and search tool provide access to a point-and-click user interface. Users can select from the full set of public data to create custom tables. The website also provides access to the full range of reports and products released under the CCDF Policies Database project.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of around 2,000 HTML pages: these web pages contain the search results returned for queries on different products, searched by a set of synthetic users surfing Google Shopping (US version) from different locations in July 2016. Each file in the collection has a name indicating the location from which the search was done, the user ID, and the searched product: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html. The locations are the Philippines (PHI), the United States (US), and India (IN). The user IDs are 26 to 30 for users searching from the Philippines, 1 to 5 from the US, and 11 to 15 from India. Products were chosen from 130 keywords (e.g., MP3 player, MP4 watch, personal organizer, television, etc.). In the following, we describe how the search results were collected. Each user has a fresh profile. The creation of a new profile corresponds to launching a new, isolated web browser client instance and opening the Google Shopping US web page. To mimic real users, the synthetic users can browse, scroll pages, stay on a page, and click on links. A fully-fledged web browser is used to get the correct desktop version of the website under investigation, because websites can be designed to behave according to user agents, as witnessed by the differences between the mobile and desktop versions of the same website. The prices are the retail ones displayed by Google Shopping in US dollars (thus excluding shipping fees). Several frameworks have been proposed for interacting with web browsers and analysing results from search engines; this research adopts OpenWPM. OpenWPM is automated with Selenium to efficiently create and manage different users with isolated Firefox and Chrome client instances, each with their own associated cookies. The experiments run, on average, 24 hours.
In each of them, the software runs on our local server, but the browser's traffic is redirected to the designated remote servers (i.e., to India) via tunneling in SOCKS proxies. This way, all commands are simultaneously distributed over all proxies. The experiments adopt the Mozilla Firefox browser (version 45.0) for the web browsing tasks and run under Ubuntu 14.04. For each query, we consider the first page of results, counting 40 products; among them, the focus of the experiments is mostly on the top 10 and top 3 results. Due to connection errors, one of the Philippine profiles has no associated results. Also, for the Philippines, a few keywords did not lead to any results: videocassette recorders, totes, umbrellas. Similarly, for the US, no results were returned for totes and umbrellas. The search results have been analyzed to check for evidence of price steering based on users' location. One term of usage applies: in any research product whose findings are based on this dataset, please cite:
@inproceedings{DBLP:conf/ircdl/CozzaHPN19,
  author    = {Vittoria Cozza and Van Tien Hoang and Marinella Petrocchi and Rocco {De Nicola}},
  title     = {Transparency in Keyword Faceted Search: An Investigation on Google Shopping},
  booktitle = {Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, {IRCDL} 2019, Pisa, Italy, January 31 - February 1, 2019, Proceedings},
  pages     = {29--43},
  year      = {2019},
  crossref  = {DBLP:conf/ircdl/2019},
  url       = {https://doi.org/10.1007/978-3-030-11226-4_3},
  doi       = {10.1007/978-3-030-11226-4_3},
  timestamp = {Fri, 18 Jan 2019 23:22:50 +0100},
  biburl    = {https://dblp.org/rec/bib/conf/ircdl/CozzaHPN19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
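The stated file-naming convention can be unpacked programmatically. A sketch (the example file name below is hypothetical, constructed to follow the pattern; the regex mirrors no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html):

```python
import re

# Parse a collection file name into location, user ID, product, and run number.
NAME_RE = re.compile(
    r"no_email_(?P<location>[A-Z]+)_(?P<user>\d+)\."
    r"(?P<product>.+)\.shopping_testing\.(?P<run>\d+)\.html$"
)

def parse_name(filename: str) -> dict:
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    return m.groupdict()

# Hypothetical file name following the stated convention:
print(parse_name("no_email_IN_11.MP3 player.shopping_testing.1.html"))
```

Grouping the parsed names by location and user ID is then enough to reconstruct, per synthetic user, which of the 130 product keywords produced result pages.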
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PATCIT: A Comprehensive Dataset of Patent Citations [Website, Newsletter, GitHub]
Patents are at the crossroads of many innovation nodes: science, industry, products, competition, etc. Such interactions can be identified through citations in a broad sense.
It is now common to use front-page patent citations to study some aspects of the innovation system. However, there is much more buried in the Non Patent Literature (NPL) citations and in the patent text itself.
Good news: Natural Language Processing (NLP) tools now enable social scientists to excavate and structure this long-hidden information. That's the purpose of this project.
IN PRACTICE
A detailed presentation of the current state of the project is available in our March 2020 presentation.
So far, we have:
parsed and consolidated the 27 million NPL citations classified as bibliographical references.
extracted, parsed and consolidated in-text bibliographical references and patent citations from the body of all time USPTO patents.
The latest version of the dataset is the v0.15. It is made of the v0.1 of the US contextual citations dataset and v0.2 of the front-page NPL citations dataset.
Give it a try! The dataset is publicly available on Google Cloud BigQuery, just click here.
FEATURES
Open
Comprehensive
Highest standards
Digital Marketing and Advertising:
Audience segmentation and targeting Campaign performance optimization Competitor analysis and benchmarking
E-commerce and Retail:
Customer journey mapping Product recommendation enhancements Cart abandonment analysis
Media and Entertainment:
Content consumption trends Audience engagement metrics Cross-platform user behavior analysis
Financial Services:
Risk assessment based on online behavior Fraud detection through anomaly identification Investment trend analysis
Technology and Software:
User experience optimization Feature adoption tracking Competitive intelligence
Market Research and Consulting:
Consumer behavior studies Industry trend analysis Digital transformation strategies
Integration with Broader Data Offering TagX Web Browsing Clickstream Data is a cornerstone of our comprehensive digital intelligence suite. It seamlessly integrates with our other data products to provide a 360-degree view of online user behavior:
Social Media Engagement Data: Combine clickstream insights with social media interactions for a holistic understanding of digital footprints. Mobile App Usage Data: Cross-reference web browsing patterns with mobile app usage to map the complete digital journey. Purchase Intent Signals: Enrich clickstream data with purchase intent indicators to power predictive analytics and targeted marketing efforts. Demographic Overlays: Enhance web browsing data with demographic information for more precise audience segmentation and targeting.
By leveraging these complementary datasets, businesses can unlock deeper insights and drive more impactful strategies across their digital initiatives. Data Quality and Scale We pride ourselves on delivering high-quality, reliable data at scale:
Rigorous Data Cleaning: Advanced algorithms filter out bot traffic, VPNs, and other non-human interactions. Regular Quality Checks: Our data science team conducts ongoing audits to ensure data accuracy and consistency. Scalable Infrastructure: Our robust data processing pipeline can handle billions of daily events, ensuring comprehensive coverage. Historical Data Availability: Access up to 24 months of historical data for trend analysis and longitudinal studies. Customizable Data Feeds: Tailor the data delivery to your specific needs, from raw clickstream events to aggregated insights.
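A bot filter of the kind the cleaning step describes can be sketched as a simple rate heuristic. This is an illustration only: the event schema (`ts` epoch-second timestamps) and the threshold are assumptions for the example, not TagX's actual cleaning rules, which would combine many richer signals.

```python
def looks_like_bot(events: list, max_rate_per_sec: float = 5.0) -> bool:
    """Flag a session whose click rate is implausibly high for a human.
    Each event is a dict with a 'ts' key holding an epoch-seconds
    timestamp; the 5 clicks/second default is an illustrative cutoff."""
    if len(events) < 2:
        return False
    timestamps = sorted(e["ts"] for e in events)
    duration = timestamps[-1] - timestamps[0]
    if duration <= 0:
        return True  # multiple events in the same instant
    return len(events) / duration > max_rate_per_sec

# Usage: drop flagged sessions before computing aggregated insights
# clean_sessions = [s for s in sessions if not looks_like_bot(s["events"])]
```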
Empowering Data-Driven Decision Making In today's digital-first world, understanding online user behavior is crucial for businesses across all sectors. TagX Web Browsing Clickstream Data empowers organizations to make informed decisions, optimize their digital strategies, and stay ahead of the competition. Whether you're a marketer looking to refine your targeting, a product manager seeking to enhance user experience, or a researcher exploring digital trends, our cli...