12 datasets found
  1. Websites using Same But Different

    • webtechsurvey.com
    csv
    Cite
    WebTechSurvey, Websites using Same But Different [Dataset]. https://webtechsurvey.com/technology/same-but-different
    Explore at:
    Available download formats: csv
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Same But Different technology, compiled through global website indexing conducted by WebTechSurvey.

  2. Multilingual Scraper of Privacy Policies and Terms of Service

    • zenodo.org
    bin, zip
    Updated Apr 24, 2025
    Cite
    David Bernhard; Luka Nenadic; Stefan Bechtold; Karel Kubicek (2025). Multilingual Scraper of Privacy Policies and Terms of Service [Dataset]. http://doi.org/10.5281/zenodo.14562039
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Bernhard; Luka Nenadic; Stefan Bechtold; Karel Kubicek
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Multilingual Scraper of Privacy Policies and Terms of Service: Scraped Documents of 2024

    This dataset supplements the publication "Multilingual Scraper of Privacy Policies and Terms of Service" at ACM CSLAW'25, March 25–27, 2025, Munich, Germany. It includes the first 12 months of scraped policies and terms from about 800k websites; see concrete numbers below.

    The following table lists the number of websites visited per month:

    Month      Number of websites
    2024-01    551'148
    2024-02    792'921
    2024-03    844'537
    2024-04    802'169
    2024-05    805'878
    2024-06    809'518
    2024-07    811'418
    2024-08    813'534
    2024-09    814'321
    2024-10    817'586
    2024-11    828'662
    2024-12    827'101

    The number of websites visited should always be higher than the number of jobs (Table 1 of the paper), since a website may redirect, resulting in two websites being scraped, or may have to be retried.

    To simplify access, we release the data as large CSVs: one file for policies and another for terms per month. All of these files contain all metadata usable for the analysis. If your favourite CSV parser reports the same numbers as in the table above, our dataset is parsed correctly. We use ',' as the separator, the first row is the header, and strings are in quotes.
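
    A minimal parsing sketch under the format just described (',' separator, header row, quoted strings); the file name follows the pattern listed further below, and the row count can be checked against the table above:

    ```python
    import csv

    # Count data rows in one policies file; the file name is from the Files section below.
    with open("2024_policy.csv", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=",", quotechar='"')
        rows = sum(1 for _ in reader)
    print(rows)  # compare against the per-month counts in the table above
    ```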

    Our scraper sometimes collects documents other than policies and terms (for how often this happens, see the evaluation in Sec. 4 of the publication), and such documents might contain personal data, e.g., addresses of website authors that are maintained only for a selected audience. We therefore decided to reduce the risk for websites by anonymizing the data using Presidio, which substitutes personal data with tokens. If your personal data has not been effectively anonymized from the database and you wish for it to be deleted, please contact us.
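
    A minimal sketch of the token substitution Presidio performs; this shows the library's basic analyze/anonymize flow, not the authors' exact pipeline:

    ```python
    from presidio_analyzer import AnalyzerEngine
    from presidio_anonymizer import AnonymizerEngine

    text = "Contact Jane Doe at jane.doe@example.com"  # illustrative input
    findings = AnalyzerEngine().analyze(text=text, language="en")
    result = AnonymizerEngine().anonymize(text=text, analyzer_results=findings)
    print(result.text)  # personal data replaced with tokens, e.g. <PERSON>, <EMAIL_ADDRESS>
    ```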

    Preliminaries

    The uncompressed dataset is about 125 GB in size, so you will need sufficient storage. This also means that you likely cannot process all the data at once in memory, which is why we split the data by month and into separate files for policies and terms.

    Files and structure

    The files have the following names:

    • 2024_policy.csv for policies
    • 2024_terms.csv for terms

    Shared metadata

    Both files contain the following metadata columns:

    • website_month_id - identification of the crawled website
    • job_id - one website can have multiple jobs in case of redirects (but most commonly has only one)
    • website_index_status - network state of loading the index page, as resolved by the Chrome DevTools Protocol. Possible values:
      • DNS_ERROR - domain cannot be resolved
      • OK - all fine
      • REDIRECT - domain redirects somewhere else
      • TIMEOUT - the request timed out
      • BAD_CONTENT_TYPE - 415 Unsupported Media Type
      • HTTP_ERROR - 404 error
      • TCP_ERROR - error in the network connection
      • UNKNOWN_ERROR - unknown error
    • website_lang - language of the index page, detected with the langdetect library
    • website_url - the URL of the website, sampled from the CrUX list (may contain subdomains, etc.). Use this as the unique identifier for connecting data between months (see the sketch after this list).
    • job_domain_status - indicates the status of loading the index page. Can be:
      • OK - all works well (at the moment, should be all entries)
      • BLACKLISTED - URL is on our list of blocked URLs
      • UNSAFE - website is not safe according to Google's Safe Browsing API
      • LOCATION_BLOCKED - country is in the list of blocked countries
    • job_started_at - when the visit of the website was started
    • job_ended_at - when the visit of the website was ended
    • job_crux_popularity - JSON with all popularity ranks of the website this month
    • job_index_redirect - when we detect that the domain redirects us, we stop the crawl and create a new job with the target URL. This saves time if many websites redirect to one target, as it will be crawled only once. The index_redirect is then the job.id corresponding to the redirect target.
    • job_num_starts - number of crawlers that started this job (counts restarts in case of an unsuccessful crawl; max is 3)
    • job_from_static - whether this job was included in the static selection (see Sec. 3.3 of the paper)
    • job_from_dynamic - whether this job was included in the dynamic selection (see Sec. 3.3 of the paper) - this is not exclusive with from_static - both can be true when the lists overlap.
    • job_crawl_name - our name of the crawl, contains year and month (e.g., 'regular-2024-12' for regular crawls, in Dec 2024)
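
    As referenced in the website_url item above, a hedged sketch of connecting records across months; the per-month file names are assumptions about how the CSVs are organized, the column names are as listed:

    ```python
    import pandas as pd

    jan = pd.read_csv("2024-01_policy.csv")  # assumed monthly file names
    feb = pd.read_csv("2024-02_policy.csv")

    # Keep successfully loaded index pages, then align the two months on website_url.
    jan_ok = jan[jan["website_index_status"] == "OK"]
    both = jan_ok.merge(feb, on="website_url", suffixes=("_jan", "_feb"))
    print(len(both), "websites loadable in January and present in February")
    ```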

    Policy data

    • policy_url_id - ID of the URL this policy has
    • policy_keyword_score - score (higher is better), based on the crawler's keyword list, that the given document is a policy
    • policy_ml_probability - probability assigned by the BERT model that the given document is a policy
    • policy_consideration_basis - the basis on which we decided that this URL is a policy. The following three options are tried by the crawler in this order:
      1. 'keyword matching' - this policy was found using the crawler navigation (which is based on keywords)
      2. 'search' - this policy was found using a search engine
      3. 'path guessing' - this policy was found by using well-known URLs like example.com/policy
    • policy_url - full URL to the policy
    • policy_content_hash - used as an identifier: if the document remained the same between crawls, it won't create a new entry (see the sketch after this list)
    • policy_content - the document text, extracted to Markdown using Mozilla's Readability library
    • policy_lang - language of the content, detected with fastText
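
    A short sketch of using policy_content_hash as described above to collapse repeated crawls of identical documents (file and column names as above):

    ```python
    import pandas as pd

    # Rows sharing a policy_content_hash carry the same document text.
    policies = pd.read_csv("2024_policy.csv")
    distinct = policies.drop_duplicates(subset="policy_content_hash")
    print(len(policies), "rows,", len(distinct), "distinct policy texts")
    ```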

    Terms data

    Analogous to the policy data; just substitute policy with terms in the column names.

    Updates

    Check the linked Google Doc for an updated version of this README.

  3. all-portable-apps-and-ai-in-one-url

    • huggingface.co
    Updated Feb 12, 2025
    Cite
    Derur (2025). all-portable-apps-and-ai-in-one-url [Dataset]. https://huggingface.co/datasets/Derur/all-portable-apps-and-ai-in-one-url
    Explore at:
    Dataset updated
    Feb 12, 2025
    Authors
    Derur
    Description

    Saving you time and disk space! "-cl" = cleaned (no models / extra languages). My personal selection of portable apps and AIs! I personally repacked and reduced the size of the archives! Support me: Boosty or Donationalerts.

    File authors… See the full description on the dataset page: https://huggingface.co/datasets/Derur/all-portable-apps-and-ai-in-one-url.

  4. Fraudulent Bank Websites, Phishing E-mails and Similar Scams | DATA.GOV.HK

    • data.gov.hk
    Updated Oct 26, 2018
    Cite
    data.gov.hk (2018). Fraudulent Bank Websites, Phishing E-mails and Similar Scams | DATA.GOV.HK [Dataset]. https://data.gov.hk/en-data/dataset/hk-hkma-banksvf-fraudulent-bank-scams
    Explore at:
    Dataset updated
    Oct 26, 2018
    Dataset provided by
    data.gov.hk
    Description

    This API provides information on press releases issued by authorized institutions, together with similar past press releases issued by the HKMA, regarding fraudulent bank websites, phishing e-mails and similar scams.

  5. Evaluating Web Table Annotation Methods: From Entity Lookups to Entity Embeddings

    • springernature.figshare.com
    application/gzip
    Updated May 30, 2023
    Cite
    Vasilis Efthymiou; Oktie Hassanzadeh; Mariano Rodríguez-Muro; Vassilis Christophides (2023). Evaluating Web Table Annotation Methods: From Entity Lookups to Entity Embeddings [Dataset]. http://doi.org/10.6084/m9.figshare.5229847.v1
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Vasilis Efthymiou; Oktie Hassanzadeh; Mariano Rodríguez-Muro; Vassilis Christophides
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data sets used for the experimental evaluation in the related publication: "Evaluating Web Table Annotation Methods: From Entity Lookups to Entity Embeddings". The data sets are contained within archive folders corresponding to the three gold standards used in the related publication. Each is provided in both .csv and .json formats.

    The gold standards are collections of web tables:

    • T2D consists of a schema-level gold standard of 1,748 web tables, manually annotated with class- and property-mappings, as well as an entity-level gold standard of 233 web tables.
    • Limaye consists of 400 manually annotated web tables with entity-, class-, and property-level correspondences, where single cells (not rows) are mapped to entities. The corrected version of this gold standard is adapted to annotate rows with entities, from the annotations of the label-column cells.
    • WikipediaGS is an instance-level gold standard developed from 485K Wikipedia tables, in which links in the label column are used to infer the annotation of a row to a DBpedia entity.

    Data format

    CSV: The .csv files are formatted as double-quoted ('"') fields separated by commas (','). In the tables files, each file corresponds to one table, each field represents a column, and each line represents a row. In the entities files, there are only three fields: "DBpedia uri","cell string","row number" - representing the correct annotation, the string of the label-column cell, and the row (starting from 0) in which this mapping is found, respectively. Tables and entities files that correspond to the same table have the same filename. The same formatting and naming convention is used in the T2D gold standard (http://webdatacommons.org/webtables/goldstandard.html).

    JSON: Each line in a .json file corresponds to a table, written as a JSONObject. T2D and Limaye tables files contain only one line (table) per file, while the Wikipedia gold standard contains multiple lines (tables) per .json file. In T2D and Limaye, the entity mappings of a table can be found in the entities file with the same filename, while in Wikipedia, the entity mappings of each table are in the line of the entities file whose "tableId" field matches that of the corresponding table. The contents of a table in .json are given as a two-dimensional array (a JSONArray of JSONArrays) called "contents". Each JSONArray in the contents represents a table row, and each element of this array is a JSONObject representing one cell of the row. The field "data" of each cell contains the cell's string contents, and there may also be a field "isHeader" denoting whether the current cell is in a header row. In the Wikipedia gold standard there may also be a "wikiPageId" field, denoting the existing hyperlink of this cell to a Wikipedia page; it contains only the suffix of the Wikipedia URL, skipping the prefix "https://en.wikipedia.org/wiki/". The entity mapping files are in the same format as in CSV: ["DBpedia uri","cell string",row number] inside the "mappings" field of a .json file.

    Note on license: please refer to the README.txt. Data is derived from Wikipedia and other sources that may have different licenses.
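
    A minimal reading sketch under the JSON format described above; the file names are illustrative, while the field names ("contents", "data", "isHeader", "mappings") come from the description:

    ```python
    import json

    with open("tables/web_table.json") as f:
        table = json.loads(f.readline())       # T2D/Limaye: one table per file

    for row in table["contents"]:              # "contents": JSONArray of rows
        cells = [cell.get("data", "") for cell in row]
        tag = "[header] " if any(cell.get("isHeader") for cell in row) else ""
        print(tag + " | ".join(cells))

    with open("entities/web_table.json") as f: # same filename as the table
        mappings = json.loads(f.readline())["mappings"]
    for dbpedia_uri, cell_string, row_number in mappings:
        print(row_number, cell_string, "->", dbpedia_uri)
    ```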

    Wikipedia contents can be shared under the terms of Creative Commons Attribution-ShareAlike License as outlined on Wikipedia: https://en.wikipedia.org/wiki/Wikipedia:Reusing_Wikipedia_content

    The correspondences of the T2D gold standard are provided under the terms of the Apache license. The web tables are provided according to the same terms of use, disclaimer of warranties and limitation of liabilities that apply to the Common Crawl corpus. The DBpedia subset is licensed under the terms of the Creative Commons Attribution-ShareAlike License and the GNU Free Documentation License that apply to DBpedia. The Limaye gold standard was downloaded from http://websail-fe.cs.northwestern.edu/TabEL/ (download date: August 25, 2016); please refer to the original website and the following paper for more details and citation information: G. Limaye, S. Sarawagi, and S. Chakrabarti. Annotating and Searching Web Tables Using Entities, Types and Relationships. PVLDB, 3(1):1338–1347, 2010. Also: THIS DATA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

  6. Google Analytics www cityofrochester gov

    • data.cityofrochester.gov
    Updated Dec 11, 2021
    Cite
    Open_Data_Admin (2021). Google Analytics www cityofrochester gov [Dataset]. https://data.cityofrochester.gov/datasets/google-analytics-www-cityofrochester-gov/about
    Explore at:
    Dataset updated
    Dec 11, 2021
    Dataset authored and provided by
    Open_Data_Admin
    Description

    Data dictionary:

    • Page_Title: Title of the webpage, used for pages of the website www.cityofrochester.gov.
    • Pageviews: Total number of pages viewed over the course of the calendar year listed in the year column. Repeated views of a single page are counted.
    • Unique_Pageviews: The number of sessions during which a specified page was viewed at least once. A unique pageview is counted for each URL and page title combination.
    • Avg_Time: Average amount of time users spent looking at a specified page or screen.
    • Entrances: The number of times visitors entered the website through a specified page.
    • Bounce_Rate: A bounce is a single-page session on your site. In Google Analytics, a bounce is calculated specifically as a session that triggers only a single request to the Google Analytics server, such as when a user opens a single page on your site and then exits without triggering any other requests during that session. Bounce rate is single-page sessions on a page divided by all sessions that started with that page, i.e., the percentage of all sessions on your site in which users viewed only a single page and triggered only a single request to the Google Analytics server. These single-page sessions have a session duration of 0 seconds, since there are no subsequent hits after the first one that would let Google Analytics calculate the length of the session.
    • Exit_Rate: The number of exits from a page divided by the number of pageviews for the page. This is inclusive of sessions that started on different pages, as well as "bounce" sessions that start and end on the same page. For all pageviews to the page, Exit Rate is the percentage that were the last in the session.
    • Year: Calendar year over which the data was collected. Data reflects the counts for each metric from January 1st through December 31st.
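
    The two rate definitions above reduce to simple ratios; a small sketch with illustrative names and numbers:

    ```python
    def bounce_rate(single_page_sessions: int, sessions_starting_here: int) -> float:
        # Single-page sessions on a page / all sessions that started with that page.
        return single_page_sessions / sessions_starting_here

    def exit_rate(exits: int, pageviews: int) -> float:
        # Exits from a page / pageviews of that page.
        return exits / pageviews

    print(f"{bounce_rate(120, 400):.0%}")  # 30%
    print(f"{exit_rate(90, 600):.0%}")     # 15%
    ```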

  7. Development Economics Data Group - Terms of Use Score | gimi9.com

    • gimi9.com
    Cite
    Development Economics Data Group - Terms of Use Score | gimi9.com [Dataset]. https://gimi9.com/dataset/worldbank_wb_spi_d2_2_terms_of_use/
    Explore at:
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Terms of Use Score from Open Data Watch (ODW). Openness element 5 measures whether data are available under open terms of use. Generally, terms of use (TOU) apply to an entire website or data portal (unless otherwise specified); in these cases, all data found on the same website and/or portal receive the same score. If a portal is located on the same domain as the NSO website, the terms of use on the NSO site apply. If the data are located on a portal or website on a different domain, a separate terms of use must be present. For a policy/license to be accepted as a terms of use, it must clearly refer to the data found on the website. Terms of use that refer to non-data content (such as pictures, logos, etc.) of the website are not considered. A copyright symbol at the bottom of the page is not sufficient, nor is a sentence indicating a recommended citation format. Terms of use are classified in the following ways: (1) Not Available, (2) Restrictive, (3) Semi-Restrictive, and (4) Open. If the TOU contains one or more restrictive clauses, it receives 0 points and is classified as "restrictive."
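
    A hypothetical sketch of the classification rules quoted above; the full ODW rubric has more criteria, and this encodes only what the description states:

    ```python
    def classify_tou(tou_found: bool, refers_to_data: bool,
                     restrictive_clauses: int, semi_restrictive: bool) -> str:
        # Copyright symbols or citation lines alone do not count as a TOU.
        if not tou_found or not refers_to_data:
            return "Not Available"
        if restrictive_clauses >= 1:
            return "Restrictive"        # receives 0 points
        if semi_restrictive:
            return "Semi-Restrictive"
        return "Open"

    print(classify_tou(True, True, 0, False))  # Open
    ```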

  8. Transparency in Keyword Faceted Search: a dataset of Google Shopping html pages

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    De Nicola Rocco (2020). Transparency in Keyword Faceted Search: a dataset of Google Shopping html pages [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1491556
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Cozza Vittoria
    Petrocchi Marinella
    Hoang Van Tien
    De Nicola Rocco
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a collection of around 2,000 HTML pages: these web pages contain the search results returned for queries for different products, searched by a set of synthetic users surfing Google Shopping (US version) from different locations in July 2016.

    Each file in the collection has a name indicating the location from which the search was done, the user ID, and the searched product: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html

    The locations are the Philippines (PHI), the United States (US), and India (IN). The user IDs are 26 to 30 for users searching from the Philippines, 1 to 5 from the US, and 11 to 15 from India.
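
    A small sketch parsing the naming convention above (assuming the product part contains no further dots):

    ```python
    import re

    # no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html
    NAME_RE = re.compile(
        r"no_email_(?P<location>[A-Z]+)_(?P<user_id>\d+)"
        r"\.(?P<product>[^.]+)\.shopping_testing\.(?P<run>\d+)\.html"
    )

    m = NAME_RE.match("no_email_PHI_27.mp3 player.shopping_testing.1.html")
    if m:
        print(m["location"], m["user_id"], m["product"])  # PHI 27 mp3 player
    ```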

    Products were chosen following 130 keywords (e.g., MP3 player, MP4 Watch, Personal organizer, Television, etc.).

    In the following, we describe how the search results have been collected.

    Each user has a fresh profile. Creating a new profile corresponds to launching a new, isolated web browser client instance and opening the Google Shopping US web page.

    To mimic real users, the synthetic users can browse, scroll pages, stay on a page, and click on links.

    A fully-fledged web browser is used to get the correct desktop version of the website under investigation, because websites may be designed to behave differently depending on the user agent, as witnessed by the differences between the mobile and desktop versions of the same website.

    The prices are the retail ones displayed by Google Shopping in US dollars (thus, excluding shipping fees).

    Several frameworks have been proposed for interacting with web browsers and analysing results from search engines. This research adopts OpenWPM, automated with Selenium to efficiently create and manage different users with isolated Firefox and Chrome client instances, each with its own associated cookies.

    The experiments run, on average, for 24 hours. In each of them, the software runs on our local server, but the browser's traffic is redirected to the designated remote servers (i.e., to India) via tunneling through SOCKS proxies. This way, all commands are simultaneously distributed over all proxies. The experiments adopt the Mozilla Firefox browser (version 45.0) for the web browsing tasks and run under Ubuntu 14.04. Also, for each query, we consider the first page of results, counting 40 products. Among them, the focus of the experiments is mostly on the top 10 and top 3 results.

    Due to connection errors, one of the Philippine profiles has no associated results. Also, for the Philippines, a few keywords did not lead to any results: videocassette recorders, totes, umbrellas. Similarly, for the US, there were no results for totes and umbrellas.

    The search results have been analyzed in order to check if there were evidence of price steering, based on users' location.

    One term of use applies:

    In any research product whose findings are based on this dataset, please cite

    @inproceedings{DBLP:conf/ircdl/CozzaHPN19,
      author    = {Vittoria Cozza and Van Tien Hoang and Marinella Petrocchi and Rocco {De Nicola}},
      title     = {Transparency in Keyword Faceted Search: An Investigation on Google Shopping},
      booktitle = {Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, {IRCDL} 2019, Pisa, Italy, January 31 - February 1, 2019, Proceedings},
      pages     = {29--43},
      year      = {2019},
      crossref  = {DBLP:conf/ircdl/2019},
      url       = {https://doi.org/10.1007/978-3-030-11226-4_3},
      doi       = {10.1007/978-3-030-11226-4_3},
      timestamp = {Fri, 18 Jan 2019 23:22:50 +0100},
      biburl    = {https://dblp.org/rec/bib/conf/ircdl/CozzaHPN19},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }

  9. Web Based Construction Management Software Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Web Based Construction Management Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-web-based-construction-management-software-market
    Explore at:
    Available download formats: pdf, csv, pptx
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Web Based Construction Management Software Market Outlook



    As of 2023, the global market size for Web Based Construction Management Software is valued at approximately USD 5.2 billion, with a projected reach of USD 12.4 billion by 2032, growing at a CAGR of 10% during the forecast period. This growth is primarily driven by the increasing need for real-time project tracking, efficient resource management, and the rising adoption of digital solutions in the construction industry.
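
    A quick arithmetic check of the stated figures (assuming nine compounding years, 2023 to 2032):

    ```python
    base_usd_bn, cagr, years = 5.2, 0.10, 9
    projected = base_usd_bn * (1 + cagr) ** years
    print(round(projected, 1))  # ~12.3, consistent with the stated USD 12.4 billion
    ```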



    The growth factors fueling the Web Based Construction Management Software market are multifaceted. Firstly, technological advancements and the integration of artificial intelligence (AI) and machine learning (ML) in construction management software have significantly enhanced predictive analytics and automation capabilities. These technologies enable construction managers to make informed decisions, optimize resource allocation, and predict potential project delays. Consequently, the demand for advanced software solutions that can streamline complex construction processes is on the rise.



    Additionally, the growing emphasis on sustainable construction practices is propelling the adoption of web-based construction management software. The software aids in better planning, tracking, and reporting, ensuring that construction projects meet environmental regulations and sustainability goals. With governments and organizations globally pushing for greener construction methods, the market for such software is anticipated to expand rapidly. The ability to monitor and reduce waste, manage energy consumption, and ensure compliance with sustainability standards is becoming increasingly crucial.



    Moreover, the rising trend of remote work and decentralized project teams has accelerated the need for web-based solutions that provide seamless access to project data from any location. This shift has been further amplified by the COVID-19 pandemic, which highlighted the necessity for digital collaboration tools. Web-based construction management software facilitates real-time collaboration among stakeholders, ensuring that all team members are on the same page despite geographical barriers. The scalability and flexibility offered by cloud-based solutions make them particularly attractive for construction companies of all sizes.



    Construction Scheduling Software plays a pivotal role in enhancing the efficiency of web-based construction management systems. By providing tools for detailed project timelines, resource allocation, and task prioritization, this software ensures that all project phases are meticulously planned and executed. The integration of scheduling software with other construction management tools allows for seamless updates and adjustments, accommodating any unforeseen changes in project scope or timelines. This adaptability is crucial in the fast-paced construction environment, where delays can lead to significant cost overruns. As the industry moves towards more complex and larger-scale projects, the demand for robust scheduling solutions is expected to grow, reinforcing the importance of integrating such software into existing management systems.



    From a regional perspective, North America is expected to maintain a significant share of the market due to the early adoption of technology and the presence of key market players. However, the Asia Pacific region is projected to witness the highest growth rate, driven by rapid urbanization, infrastructural development, and increasing investments in smart city projects. Developing economies in this region are recognizing the benefits of adopting digital construction management solutions to enhance efficiency and competitiveness in the construction sector.



    Component Analysis



    The Web Based Construction Management Software market can be segmented by components into Software and Services. The software segment includes various applications and tools designed to manage different aspects of construction projects. This segment is expected to dominate the market due to the increasing demand for comprehensive project management solutions. These software solutions offer functionalities such as scheduling, budgeting, resource allocation, and document management, which are essential for efficient project execution.



    On the other hand, the services segment comprises consulting, training, support, and maintenance services provided by vendors to ensure the effective implementation and use of construction management software.

  10. Total global visitor traffic to Wikipedia.org 2024

    • statista.com
    • ai-chatbox.pro
    Updated Nov 11, 2024
    Cite
    Statista (2024). Total global visitor traffic to Wikipedia.org 2024 [Dataset]. https://www.statista.com/statistics/1259907/wikipedia-website-traffic/
    Explore at:
    Dataset updated
    Nov 11, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2023 - Mar 2024
    Area covered
    Worldwide
    Description

    In March 2024, close to 4.4 billion unique global visitors visited Wikipedia.org, slightly down from the roughly 4.4 billion visitors recorded in August of the previous year. Wikipedia is a free online encyclopedia with articles generated by volunteers worldwide. The platform is hosted by the Wikimedia Foundation.

  11. Squarespace (SQSP): Building a Web Presence, One Website at a Time (Forecast)

    • kappasignal.com
    Updated Sep 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    KappaSignal (2024). Squarespace (SQSP): Building a Web Presence, One Website at a Time (Forecast) [Dataset]. https://www.kappasignal.com/2024/09/squarespace-sqsp-building-web-presence.html
    Explore at:
    Dataset updated
    Sep 4, 2024
    Dataset authored and provided by
    KappaSignal
    License

    https://www.kappasignal.com/p/legal-disclaimer.html

    Description

    This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

    Squarespace (SQSP): Building a Web Presence, One Website at a Time

    Financial data:

    • Historical daily stock prices (open, high, low, close, volume)

    • Fundamental data (e.g., market capitalization, price-to-earnings (P/E) ratio, dividend yield, earnings per share (EPS), price-to-earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price-to-sales ratio, credit rating)

    • Technical indicators (e.g., moving averages, RSI, MACD, average directional index, Aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution (A/D) line, parabolic SAR, Bollinger Bands, Fibonacci retracement, Williams %R, commodity channel index); see the sketch after this list
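
    For illustration, a sketch of two of the listed indicators, a simple moving average and one common RSI variant, computed with pandas (the column handling is an assumption about the data layout):

    ```python
    import pandas as pd

    def sma(close: pd.Series, window: int = 20) -> pd.Series:
        # Simple moving average over the closing price.
        return close.rolling(window).mean()

    def rsi(close: pd.Series, period: int = 14) -> pd.Series:
        # RSI from average gains vs. average losses (simple-average variant).
        delta = close.diff()
        gain = delta.clip(lower=0).rolling(period).mean()
        loss = (-delta.clip(upper=0)).rolling(period).mean()
        return 100 - 100 / (1 + gain / loss)
    ```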

    Machine learning features:

    • Feature engineering based on financial data and technical indicators

    • Sentiment analysis data from social media and news articles

    • Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

    Potential Applications:

    • Stock price prediction

    • Portfolio optimization

    • Algorithmic trading

    • Market sentiment analysis

    • Risk management

    Use Cases:

    • Researchers investigating the effectiveness of machine learning in stock market prediction

    • Analysts developing quantitative trading Buy/Sell strategies

    • Individuals interested in building their own stock market prediction models

    • Students learning about machine learning and financial applications

    Additional Notes:

    • The dataset may include different levels of granularity (e.g., daily, hourly)

    • Data cleaning and preprocessing are essential before model training

    • Regular updates are recommended to maintain the accuracy and relevance of the data

  12. Data from: S1 Dataset

    • plos.figshare.com
    • figshare.com
    xlsx
    Updated Jan 24, 2025
    Cite
    Muath Saad Alassaf; Ayman Bakkari; Jehad Saleh; Abdulsamad Habeeb; Bashaer Fahad Aljuhani; Ahmad A. Qazali; Ahmed Yaseen Alqutaibi (2025). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0312832.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Muath Saad Alassaf; Ayman Bakkari; Jehad Saleh; Abdulsamad Habeeb; Bashaer Fahad Aljuhani; Ahmad A. Qazali; Ahmed Yaseen Alqutaibi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: This study aimed to investigate the quality and readability of online English health information about dental sensitivity and how patients evaluate and utilize this web-based information.

    Methods: The credibility and readability of health information were assessed for results obtained from three search engines. We conducted searches in "incognito" mode to reduce the possibility of biases. Quality assessment utilized the JAMA benchmarks, the DISCERN tool, and HONcode. Readability was analyzed using the SMOG, FRE, and FKGL indices.

    Results: Out of 600 websites, 90 were included, with 62.2% affiliated with dental or medical centers; 80% of these websites related exclusively to dental implant treatments. Regarding the JAMA benchmarks, currency was the most commonly achieved criterion, and 87.8% of websites fell into the "moderate quality" category. Word and sentence counts ranged widely, with means of 815.7 (±435.4) and 60.2 (±33.3), respectively. FKGL averaged 8.6 (±1.6), SMOG scores averaged 7.6 (±1.1), and the FRE scale showed a mean of 58.28 (±9.1), with "fair difficult" being the most common category.

    Conclusion: The overall evaluation using DISCERN indicated a moderate quality level, with a notable absence of referencing. The JAMA benchmarks revealed general non-adherence, as none of the websites met all four criteria. Only one website was HONcode certified, suggesting a lack of reliable sources for web-based health information accuracy. Readability assessments showed varying results, with the majority being "fair difficult". Although readability did not differ significantly across affiliations, a wide range of word and sentence counts was observed between them.
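
    For reference, the published definitions of the three readability indices used above (variable names, and the syllable count in the example, are illustrative):

    ```python
    import math

    def fre(words: int, sentences: int, syllables: int) -> float:
        # Flesch Reading Ease; scores of 50-60 are the "fairly difficult" band.
        return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

    def fkgl(words: int, sentences: int, syllables: int) -> float:
        # Flesch-Kincaid Grade Level.
        return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

    def smog(polysyllables: int, sentences: int) -> float:
        # SMOG grade from words of 3+ syllables, normalized to 30 sentences.
        return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

    print(round(fre(815, 60, 1300), 1))  # ~58.1, near the reported mean FRE of 58.28
    ```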
