https://webtechsurvey.com/terms
A complete list of live websites using the Same But Different technology, compiled through global website indexing conducted by WebTechSurvey.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset supplements the publication "Multilingual Scraper of Privacy Policies and Terms of Service", presented at ACM CSLAW’25, March 25–27, 2025, München, Germany. It includes the first 12 months of scraped policies and terms from about 800k websites; see the concrete numbers below.
The following table lists the number of websites visited per month:
Month | Number of websites |
---|---|
2024-01 | 551'148 |
2024-02 | 792'921 |
2024-03 | 844'537 |
2024-04 | 802'169 |
2024-05 | 805'878 |
2024-06 | 809'518 |
2024-07 | 811'418 |
2024-08 | 813'534 |
2024-09 | 814'321 |
2024-10 | 817'586 |
2024-11 | 828'662 |
2024-12 | 827'101 |
The number of websites visited should always be higher than the number of jobs (Table 1 of the paper), since a website may redirect (resulting in two websites being scraped) or may have to be retried.
To simplify access, we release the data as large CSV files: one file for policies and one for terms per month. All of these files contain all metadata usable for the analysis. If your favourite CSV parser reports the same numbers as above, the dataset has been parsed correctly. We use ‘,’ as the separator, the first row is the heading, and strings are quoted.
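As a quick check that the files parse as described, here is a minimal loading sketch in Python/pandas; the file name below is a placeholder, since the actual monthly file names are listed in the release itself:

```python
import pandas as pd

# Placeholder file name -- substitute one of the monthly policy/terms CSVs from the release.
df = pd.read_csv(
    "policies-2024-01.csv",
    sep=",",        # ',' is the separator
    quotechar='"',  # strings are quoted
    header=0,       # the first row is the heading
)

print(len(df))              # row count to compare against the table above
print(df.columns.tolist())  # should match the metadata columns described below
```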
Our scraper sometimes collects documents other than policies and terms (for how often this happens, see the evaluation in Sec. 4 of the publication), and these may contain personal data such as addresses of website authors that they maintain only for a selected audience. We therefore decided to reduce the risks for websites by anonymizing the data using Presidio, which substitutes personal data with tokens. If your personal data has not been effectively anonymized from the database and you wish for it to be deleted, please contact us.
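For readers unfamiliar with Presidio, the sketch below shows the kind of token substitution it performs. This is generic example code with made-up text, not the configuration used to produce this dataset:

```python
# pip install presidio-analyzer presidio-anonymizer
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "Contact Jane Doe at jane.doe@example.com for access."  # made-up example text

# Detect spans that look like personal data (names, e-mail addresses, ...).
results = AnalyzerEngine().analyze(text=text, language="en")

# Replace each detected span with a placeholder token such as <PERSON> or <EMAIL_ADDRESS>.
anonymized = AnonymizerEngine().anonymize(text=text, analyzer_results=results)
print(anonymized.text)
```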
The uncompressed dataset is about 125 GB, so you will need sufficient storage. This also means that you likely cannot process all the data in memory at once, which is why we split the data by month and into separate files for policies and terms.
The files have the following names:
Both files contain the following metadata columns:
- website_month_id - identification of the crawled website
- job_id - one website can have multiple jobs in case of redirects (but most commonly has only one)
- website_index_status - network state of loading the index page, resolved by the Chrome DevTools Protocol. Possible values:
  - DNS_ERROR - domain cannot be resolved
  - OK - all fine
  - REDIRECT - domain redirects somewhere else
  - TIMEOUT - the request timed out
  - BAD_CONTENT_TYPE - 415 Unsupported Media Type
  - HTTP_ERROR - 404 error
  - TCP_ERROR - error in the network connection
  - UNKNOWN_ERROR - unknown error
- website_lang - language of the index page, detected with the langdetect library
- website_url - the URL of the website sampled from the CrUX list (may contain subdomains, etc.). Use this as a unique identifier for connecting data between months.
- job_domain_status - status of loading the index page. Possible values:
  - OK - all works well (at the moment, should be all entries)
  - BLACKLISTED - URL is on our list of blocked URLs
  - UNSAFE - website is not safe according to Google's Safe Browsing API
  - LOCATION_BLOCKED - country is in the list of blocked countries
- job_started_at - when the visit of the website started
- job_ended_at - when the visit of the website ended
- job_crux_popularity - JSON with all popularity ranks of the website this month
- job_index_redirect - when we detect that the domain redirects us, we stop the crawl and create a new job with the target URL. This saves time if many websites redirect to one target, as it is then crawled only once. The index_redirect is the job_id corresponding to the redirect target.
- job_num_starts - number of crawlers that started this job (counts restarts in case of an unsuccessful crawl, max is 3)
- job_from_static - whether this job was included in the static selection (see Sec. 3.3 of the paper)
- job_from_dynamic - whether this job was included in the dynamic selection (see Sec. 3.3 of the paper); this is not exclusive with from_static, as both can be true when the lists overlap
- job_crawl_name - our name of the crawl; contains year and month (e.g., 'regular-2024-12' for the regular crawl in Dec 2024)
- policy_url_id - ID of the URL this policy has
- policy_keyword_score - score (higher is better) according to the crawler's keyword list that the given document is a policy
- policy_ml_probability - probability assigned by the BERT model that the given document is a policy
- policy_consideration_basis - on which basis we decided that this URL is a policy; the crawler tries three options in a fixed order
- policy_url - full URL to the policy
- policy_content_hash - used as an identifier; if the document remained the same between crawls, it does not create a new entry
- policy_content - the text of the policy, extracted to Markdown using Mozilla's readability library
- policy_lang - language of the content, detected by fasttext

The terms columns are analogous to the policy columns; just substitute policy with terms.
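Since website_url is stable across months and policy_content_hash only changes when the document changes, a simple way to spot policy updates between two months is to join on the URL and compare hashes. A minimal sketch (the file names are placeholders):

```python
import pandas as pd

# Placeholder file names -- substitute the actual monthly policy CSVs from the release.
jan = pd.read_csv("policies-2024-01.csv")
feb = pd.read_csv("policies-2024-02.csv")

# website_url is the stable identifier for connecting data between months.
merged = jan.merge(feb, on="website_url", suffixes=("_jan", "_feb"))

# A differing content hash indicates the policy changed between the two crawls.
changed = merged[merged["policy_content_hash_jan"] != merged["policy_content_hash_feb"]]
print(f"{len(changed)} policies changed between January and February")
```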
Check this Google Doc for an updated version of this README.md.
Saving you time and space on HDD!
"-cl" = clear (no models / other languages)
My personal selection of portable apps and AIs! I personally repacked and reduced the size of the archives! Support me: Boosty or Donationalerts
File authors… See the full description on the dataset page: https://huggingface.co/datasets/Derur/all-portable-apps-and-ai-in-one-url.
This API provides information on press releases issued by authorized institutions, as well as similar press releases issued by the HKMA in the past, regarding fraudulent bank websites, phishing e-mails, and similar scams.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data sets used for experimental evaluation in the related publication: Evaluating Web Table Annotation Methods: From Entity Lookups to Entity Embeddings.

The data sets are contained in archive folders corresponding to the three gold standard data sets used in the related publication. Each is presented in both .csv and .json formats. The gold standard data sets are collections of web tables:
- T2D consists of a schema-level gold standard of 1,748 Web tables, manually annotated with class- and property-mappings, as well as an entity-level gold standard of 233 Web tables.
- Limaye consists of 400 manually annotated Web tables with entity-, class-, and property-level correspondences, where single cells (not rows) are mapped to entities. The corrected version of this gold standard is adapted to annotate rows with entities, based on the annotations of the label column cells.
- WikipediaGS is an instance-level gold standard developed from 485K Wikipedia tables, in which links in the label column are used to infer the annotation of a row to a DBpedia entity.

Data format

CSV: The .csv files are formatted as double-quoted ('"') fields separated by commas (','). In the tables files, each file corresponds to one table, each field represents a column, and each line represents a different row. In the entities files, there are only three fields: "DBpedia uri","cell string","row number", representing the correct annotation, the string of the label column cell, and the row (starting from 0) in which this mapping is found, respectively. Tables and entities files that correspond to the same table have the same filename. The same formatting and naming convention is used in the T2D gold standard (http://webdatacommons.org/webtables/goldstandard.html).

JSON: Each line in a .json file corresponds to a table, written as a JSONObject. T2D and Limaye tables files contain only one line (table) per file, while the Wikipedia gold standard contains multiple lines (tables) per .json file. In T2D and Limaye, the entity mappings of a table can be found in the entities file with the same filename, while in Wikipedia, the entity mappings of each table can be found in the line of the entities file whose "tableId" field matches that of the corresponding table. The contents of a table in .json are given as a two-dimensional array (a JSONArray of JSONArrays) called "contents". Each JSONArray in the contents represents a table row. Each element of this array is a JSONObject representing one cell of the row. The field "data" of each cell contains the cell string contents, and there may also be a field "isHeader" denoting whether the current cell is in a header row. In the Wikipedia gold standard there may also be a "wikiPageId" field, denoting the existing hyperlink of this cell to a Wikipedia page. It only contains the suffix of a Wikipedia URL, skipping the first part "https://en.wikipedia.org/wiki/". The entity mappings files are in the same format as in csv: ["DBpedia uri","cell string",row number] inside the "mappings" field of a json file.

Note on license: please refer to the README.txt. Data is derived from Wikipedia and other sources that may have different licenses.
Wikipedia contents can be shared under the terms of Creative Commons Attribution-ShareAlike License as outlined on Wikipedia: https://en.wikipedia.org/wiki/Wikipedia:Reusing_Wikipedia_content
The correspondences of the T2D gold standard are provided under the terms of the Apache license. The Web tables are provided according to the same terms of use, disclaimer of warranties, and limitation of liabilities that apply to the Common Crawl corpus. The DBpedia subset is licensed under the terms of the Creative Commons Attribution-ShareAlike License and the GNU Free Documentation License that apply to DBpedia. The Limaye gold standard was downloaded from http://websail-fe.cs.northwestern.edu/TabEL/ (download date: August 25, 2016). Please refer to the original website and the following paper for more details and citation information: G. Limaye, S. Sarawagi, and S. Chakrabarti. Annotating and Searching Web Tables Using Entities, Types and Relationships. PVLDB, 3(1):1338–1347, 2010. Also: THIS DATA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
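As a quick illustration of the JSON table format described above, the following sketch reads one table file and its matching entities file and prints header cells and entity mappings. The file names are placeholders:

```python
import json

# Placeholder file names -- T2D and Limaye store one table per file,
# with an entities file of the same filename holding the mappings.
with open("some_table.json") as f:
    table = json.loads(f.readline())

# "contents" is a JSONArray of rows; each row is a JSONArray of cell objects.
for row in table["contents"]:
    cells = [cell["data"] for cell in row]
    if any(cell.get("isHeader") for cell in row):
        print("header:", cells)

with open("some_table_entities.json") as f:
    entities = json.loads(f.readline())

# Each mapping is ["DBpedia uri", "cell string", row number] inside "mappings".
for uri, cell_string, row_number in entities["mappings"]:
    print(row_number, cell_string, "->", uri)
```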
Data dictionary:
Page_Title: Title of the webpage, used for pages of the website www.cityofrochester.gov.
Pageviews: Total number of pages viewed over the course of the calendar year listed in the Year column. Repeated views of a single page are counted.
Unique_Pageviews: The number of sessions during which a specified page was viewed at least once. A unique pageview is counted for each URL and page title combination.
Avg_Time: Average amount of time users spent looking at a specified page or screen.
Entrances: The number of times visitors entered the website through a specified page.
Bounce_Rate: A bounce is a single-page session on your site. In Google Analytics, a bounce is calculated specifically as a session that triggers only a single request to the Google Analytics server, such as when a user opens a single page on your site and then exits without triggering any other requests to the Google Analytics server during that session. Bounce rate is single-page sessions on a page divided by all sessions that started with that page, or the percentage of all sessions on your site in which users viewed only a single page and triggered only a single request to the Google Analytics server. These single-page sessions have a session duration of 0 seconds, since there are no subsequent hits after the first one that would let Google Analytics calculate the length of the session.
Exit_Rate: The number of exits from a page divided by the number of pageviews for the page. This is inclusive of sessions that started on different pages, as well as "bounce" sessions that start and end on the same page. For all pageviews to the page, Exit Rate is the percentage that were the last in the session.
Year: Calendar year over which the data was collected. Data reflects the counts for each metric from January 1st through December 31st.
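To make the two rate definitions concrete, here is a tiny worked sketch using made-up counts (the released data already contains the precomputed rates):

```python
# Illustration of the Bounce_Rate and Exit_Rate definitions above, with made-up counts.

single_page_sessions = 120   # sessions that viewed only this page (bounces)
sessions_started_here = 400  # sessions whose first page was this page
exits = 250                  # sessions whose last page was this page
pageviews = 1000             # all views of this page

bounce_rate = single_page_sessions / sessions_started_here  # 0.30 -> 30%
exit_rate = exits / pageviews                               # 0.25 -> 25%

print(f"Bounce rate: {bounce_rate:.0%}, Exit rate: {exit_rate:.0%}")
```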
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Terms of Use Score from Open Data Watch (ODW). Openness element 5 measures whether data are available under open terms of use. Generally, terms of use (TOU) apply to an entire website or data portal (unless otherwise specified); in these cases, all data found on the same website and/or portal receive the same score. If a portal is located on the same domain as the NSO website, the terms of use on the NSO site apply. If the data are located on a portal or website on a different domain, a separate terms of use needs to be present. For a policy/license to be accepted as a terms of use, it must clearly refer to the data found on the website. Terms of use that refer to non-data content (such as pictures, logos, etc.) of the website are not considered. A copyright symbol at the bottom of the page is not sufficient. A sentence indicating a recommended citation format is not sufficient. Terms of use are classified in the following ways: (1) Not Available, (2) Restrictive, (3) Semi-Restrictive, and (4) Open. If the TOU contains one or more restrictive clauses, it receives 0 points and is classified as “restrictive.”
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of around 2,000 HTML pages: these web pages contain the search results returned for queries about different products, searched by a set of synthetic users surfing Google Shopping (US version) from different locations in July 2016.
Each file in the collection has a name indicating the location from which the search was made, the user ID, and the searched product: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html
The locations are Philippines (PHI), United States (US), and India (IN). The user IDs are 26 to 30 for users searching from the Philippines, 1 to 5 from the US, and 11 to 15 from India.
Products were chosen from 130 keywords (e.g., MP3 player, MP4 watch, personal organizer, television, etc.).
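A minimal sketch for pulling the location, user ID, and product out of a file name, assuming the exact pattern shown above (the example file name below is hypothetical):

```python
import re

# Pattern assumed from the naming convention no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html
NAME_RE = re.compile(
    r"no_email_(?P<location>[A-Z]+)_(?P<user_id>\d+)\.(?P<product>.+?)\.shopping_testing\.\d+\.html$"
)

name = "no_email_PHI_26.MP3 player.shopping_testing.1.html"  # hypothetical example
match = NAME_RE.match(name)
if match:
    print(match.group("location"), match.group("user_id"), match.group("product"))
```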
In the following, we describe how the search results have been collected.
Each user has a fresh profile. Creating a new profile corresponds to launching a new, isolated web browser client instance and opening the Google Shopping US web page.
To mimic real users, the synthetic users can browse, scroll pages, stay on a page, and click on links.
A fully-fledged web browser is used to get the correct desktop version of the website under investigation. This is because websites could be designed to behave according to user agents, as witnessed by the differences between the mobile and desktop versions of the same website.
The prices are the retail ones displayed by Google Shopping in US dollars (thus, excluding shipping fees).
Several frameworks have been proposed for interacting with web browsers and analysing results from search engines. This research adopts OpenWPM. OpenWPM is automated with Selenium to efficiently create and manage different users with isolated Firefox and Chrome client instances, each with its own associated cookies.
The experiments run for 24 hours on average. In each of them, the software runs on our local server, but the browser's traffic is redirected to the designated remote servers (e.g., to India) via tunneling through SOCKS proxies. This way, all commands are simultaneously distributed over all proxies. The experiments adopt the Mozilla Firefox browser (version 45.0) for the web browsing tasks and run under Ubuntu 14.04. Also, for each query, we consider the first page of results, counting 40 products. Among them, the focus of the experiments is mostly on the top 10 and top 3 results.
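The original crawling setup is not part of this dataset, but the proxy redirection can be approximated with current Selenium roughly as follows. This is a generic sketch, not the authors' OpenWPM configuration (which targeted Firefox 45); the proxy host/port and entry URL are placeholders:

```python
from selenium import webdriver

# Route all browser traffic through a SOCKS proxy (placeholder host/port for the
# tunnel endpoint of the desired country).
options = webdriver.FirefoxOptions()
options.set_preference("network.proxy.type", 1)            # manual proxy configuration
options.set_preference("network.proxy.socks", "127.0.0.1")
options.set_preference("network.proxy.socks_port", 9050)
options.set_preference("network.proxy.socks_remote_dns", True)

driver = webdriver.Firefox(options=options)                # fresh, isolated profile
driver.get("https://shopping.google.com/?gl=us")           # hypothetical entry URL
# ... browse, scroll, and save page sources here ...
driver.quit()
```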
Due to connection errors, one of the Philippine profiles has no associated results. Also, for the Philippines, a few keywords did not return any results: videocassette recorders, totes, umbrellas. Similarly, for the US, no results were returned for totes and umbrellas.
The search results were analyzed to check whether there was evidence of price steering based on users' location.
One term of use applies:
In any research product whose findings are based on this dataset, please cite
@inproceedings{DBLP:conf/ircdl/CozzaHPN19,
  author    = {Vittoria Cozza and Van Tien Hoang and Marinella Petrocchi and Rocco {De Nicola}},
  title     = {Transparency in Keyword Faceted Search: An Investigation on Google Shopping},
  booktitle = {Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, {IRCDL} 2019, Pisa, Italy, January 31 - February 1, 2019, Proceedings},
  pages     = {29--43},
  year      = {2019},
  crossref  = {DBLP:conf/ircdl/2019},
  url       = {https://doi.org/10.1007/978-3-030-11226-4_3},
  doi       = {10.1007/978-3-030-11226-4_3},
  timestamp = {Fri, 18 Jan 2019 23:22:50 +0100},
  biburl    = {https://dblp.org/rec/bib/conf/ircdl/CozzaHPN19},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
https://dataintelo.com/privacy-and-policy
As of 2023, the global market size for Web Based Construction Management Software is valued at approximately USD 5.2 billion, with a projected reach of USD 12.4 billion by 2032, growing at a CAGR of 10% during the forecast period. This growth is primarily driven by the increasing need for real-time project tracking, efficient resource management, and the rising adoption of digital solutions in the construction industry.
The growth factors fueling the Web Based Construction Management Software market are multifaceted. Firstly, technological advancements and the integration of artificial intelligence (AI) and machine learning (ML) in construction management software have significantly enhanced predictive analytics and automation capabilities. These technologies enable construction managers to make informed decisions, optimize resource allocation, and predict potential project delays. Consequently, the demand for advanced software solutions that can streamline complex construction processes is on the rise.
Additionally, the growing emphasis on sustainable construction practices is propelling the adoption of web-based construction management software. The software aids in better planning, tracking, and reporting, ensuring that construction projects meet environmental regulations and sustainability goals. With governments and organizations globally pushing for greener construction methods, the market for such software is anticipated to expand rapidly. The ability to monitor and reduce waste, manage energy consumption, and ensure compliance with sustainability standards is becoming increasingly crucial.
Moreover, the rising trend of remote work and decentralized project teams has accelerated the need for web-based solutions that provide seamless access to project data from any location. This shift has been further amplified by the COVID-19 pandemic, which highlighted the necessity for digital collaboration tools. Web-based construction management software facilitates real-time collaboration among stakeholders, ensuring that all team members are on the same page despite geographical barriers. The scalability and flexibility offered by cloud-based solutions make them particularly attractive for construction companies of all sizes.
Construction Scheduling Software plays a pivotal role in enhancing the efficiency of web-based construction management systems. By providing tools for detailed project timelines, resource allocation, and task prioritization, this software ensures that all project phases are meticulously planned and executed. The integration of scheduling software with other construction management tools allows for seamless updates and adjustments, accommodating any unforeseen changes in project scope or timelines. This adaptability is crucial in the fast-paced construction environment, where delays can lead to significant cost overruns. As the industry moves towards more complex and larger-scale projects, the demand for robust scheduling solutions is expected to grow, reinforcing the importance of integrating such software into existing management systems.
From a regional perspective, North America is expected to maintain a significant share of the market due to the early adoption of technology and the presence of key market players. However, the Asia Pacific region is projected to witness the highest growth rate, driven by rapid urbanization, infrastructural development, and increasing investments in smart city projects. Developing economies in this region are recognizing the benefits of adopting digital construction management solutions to enhance efficiency and competitiveness in the construction sector.
The Web Based Construction Management Software market can be segmented by components into Software and Services. The software segment includes various applications and tools designed to manage different aspects of construction projects. This segment is expected to dominate the market due to the increasing demand for comprehensive project management solutions. These software solutions offer functionalities such as scheduling, budgeting, resource allocation, and document management, which are essential for efficient project execution.
On the other hand, the services segment comprises consulting, training, support, and maintenance services provided by vendors to ensure the effective implementation and use of constru
In March 2024, close to 4.4 billion unique global visitors visited Wikipedia.org, slightly down from the 4.4 billion visitors recorded in August of the same year. Wikipedia is a free online encyclopedia with articles generated by volunteers worldwide. The platform is hosted by the Wikimedia Foundation.
https://www.kappasignal.com/p/legal-disclaimer.html
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.
Historical daily stock prices (open, high, low, close, volume)
Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)
Technical indicators (e.g., moving averages, RSI, MACD, average directional index, Aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution (A/D) line, parabolic SAR, Bollinger Bands, Fibonacci retracements, Williams %R, commodity channel index); see the short computation sketch after this list
Feature engineering based on financial data and technical indicators
Sentiment analysis data from social media and news articles
Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)
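As referenced in the technical-indicators item above, here is a minimal sketch of two of the simpler indicators (simple moving average and RSI) computed with pandas. The column names and CSV file are assumptions, not part of this dataset's schema:

```python
import pandas as pd

def sma(close: pd.Series, window: int = 20) -> pd.Series:
    """Simple moving average over `window` periods."""
    return close.rolling(window).mean()

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """Relative Strength Index (Wilder's smoothing approximated by a simple rolling mean)."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)

# Usage with a hypothetical daily-price CSV (file and column names are placeholders):
# prices = pd.read_csv("daily_prices.csv", parse_dates=["date"], index_col="date")
# prices["sma_20"] = sma(prices["close"], 20)
# prices["rsi_14"] = rsi(prices["close"], 14)
```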
Stock price prediction
Portfolio optimization
Algorithmic trading
Market sentiment analysis
Risk management
Researchers investigating the effectiveness of machine learning in stock market prediction
Analysts developing quantitative trading Buy/Sell strategies
Individuals interested in building their own stock market prediction models
Students learning about machine learning and financial applications
The dataset may include different levels of granularity (e.g., daily, hourly)
Data cleaning and preprocessing are essential before model training
Regular updates are recommended to maintain the accuracy and relevance of the data
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: This study aimed to investigate the quality and readability of online English health information about dental sensitivity and how patients evaluate and utilize this web-based information.
Methods: Health information was obtained from three search engines and assessed for credibility and readability. We conducted searches in "incognito" mode to reduce the possibility of bias. Quality assessment utilized JAMA benchmarks, the DISCERN tool, and HONcode. Readability was analyzed using the SMOG, FRE, and FKGL indices.
Results: Out of 600 websites, 90 were included, with 62.2% affiliated with dental or medical centers; among these websites, 80% related exclusively to dental implant treatments. Regarding JAMA benchmarks, currency was the most commonly achieved criterion, and 87.8% of websites fell into the "moderate quality" category. Word and sentence counts ranged widely, with means of 815.7 (±435.4) and 60.2 (±33.3), respectively. FKGL averaged 8.6 (±1.6), SMOG scores averaged 7.6 (±1.1), and the FRE scale showed a mean of 58.28 (±9.1), with "fairly difficult" being the most common category.
Conclusion: The overall evaluation using DISCERN indicated a moderate quality level, with a notable absence of referencing. JAMA benchmarks revealed general non-adherence, as none of the websites met all four criteria. Only one website was HONcode certified, suggesting a lack of reliable sources for web-based health information. Readability assessments showed varying results, with the majority being "fairly difficult". Although readability did not differ significantly across affiliations, a wide range in word and sentence counts was observed between them.
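For reference, the three readability indices mentioned above are simple closed-form formulas over word, sentence, and syllable counts. The sketch below uses the standard published definitions with made-up counts; it is not the study's exact tooling:

```python
import math

def fre(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def smog(polysyllables: int, sentences: int) -> float:
    """SMOG grade, based on words of three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

# Example with made-up counts for a short health page:
print(round(fre(800, 60, 1200), 1), round(fkgl(800, 60, 1200), 1), round(smog(90, 60), 1))
```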