63 datasets found

Media Web Reputation Ranking - SCImago
kaggle.com
Updated Apr 9, 2025
Cite
Ali Jalaali (2025). Media Web Reputation Ranking - SCImago [Dataset]. https://www.kaggle.com/datasets/alijalali4ai/media-web-reputation-ranking-scimago
Dataset updated
Apr 9, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ali Jalaali
Description

Using four metrics—**Authority Score, Referring Domains, Citation Flow, and Trust Flow**—with an equal weight of 25%, SCImago constructs an overall indicator that reflects media websites’ digital reputation. The results define their relative position in the ranking and permit a comparison of digital development and leadership.
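
As an illustration of the equal weighting described above, here is a minimal sketch. It assumes each metric has already been rescaled to a common 0-100 range; the normalization SCImago actually applies is not described here, and the key names are hypothetical.

```python
# Equal weights of 25% for each of the four metrics named in the description.
WEIGHTS = {
    "authority_score": 0.25,
    "referring_domains": 0.25,
    "citation_flow": 0.25,
    "trust_flow": 0.25,
}


def overall_indicator(metrics):
    """Weighted sum of metrics that are assumed to share one 0-100 scale."""
    return sum(weight * metrics[name] for name, weight in WEIGHTS.items())


# Made-up values for a single media website (not taken from the dataset).
example_site = {
    "authority_score": 80,
    "referring_domains": 60,
    "citation_flow": 40,
    "trust_flow": 20,
}
print(overall_indicator(example_site))
```

With equal weights the result is simply the arithmetic mean of the four rescaled metrics.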

☢️❓The entire dataset is obtained from public and open-access data of SCImago Media Rankings
Data from: Journal Ranking Dataset
kaggle.com
Updated Aug 15, 2023
Cite
Abir (2023). Journal Ranking Dataset [Dataset]. https://www.kaggle.com/datasets/xabirhasan/journal-ranking-dataset
Dataset updated
Aug 15, 2023
Dataset provided by
Kaggle
Authors
Abir
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Journals & Ranking

An academic journal or research journal is a periodical publication in which research articles relating to a particular academic discipline are published, according to Wikipedia. Currently, there are more than 25,000 peer-reviewed journals indexed in citation index databases such as Scopus and Web of Science. These journals are ranked on the basis of various metrics such as CiteScore, H-index, etc. The metrics are calculated from the journal's yearly citation data. A lot of effort goes into designing a metric that reflects a journal's quality.

Journal Ranking Dataset

This is a comprehensive dataset on academic journals, covering their metadata as well as citation, metrics, and ranking information. Detailed data on their subject area is also given in this dataset. The dataset is collected from the following indexing databases:

- Scimago Journal Ranking
- Scopus
- Web of Science Master Journal List

The data was collected by scraping and then cleaned; details can be found HERE.

Key Features

Rank: Overall rank of journal (derived from sorted SJR index).

Title: Name or title of journal.

OA: Open Access or not.

Country: Country of origin.

SJR-index: A citation index calculated by Scimago.

CiteScore: A citation index calculated by Scopus.

H-index: Hirsch index, the largest number h such that at least h articles in that journal were cited at least h times each.

Best Quartile: Top Q-index or quartile a journal has in any subject area.

Best Categories: Subject areas with top quartile.

Best Subject Area: Highest ranking subject area.

Best Subject Rank: Rank of the highest ranking subject area.

Total Docs.: Total number of documents of the journal.

Total Docs. 3y: Total number of documents in the past 3 years.

Total Refs.: Total number of references of the journal.

Total Cites 3y: Total number of citations in the past 3 years.

Citable Docs. 3y: Total number of citable documents in the past 3 years.

Cites/Doc. 2y: Total number of citations divided by the total number of documents in the past 2 years.

Refs./Doc.: Total number of references divided by the total number of documents.

Publisher: Name of the publisher company of the journal.

Core Collection: Web of Science core collection name.

Coverage: Starting year of coverage.

Active: Active or inactive.

In-Press: Articles in press or not.

ISO Language Code: Three-letter ISO 639 code for language.

ASJC Codes: All Science Journal Classification codes for the journal.

Rest of the features provide further details on the journal's subject area or category:

- Life Sciences: Top level subject area.
- Social Sciences: Top level subject area.
- Physical Sciences: Top level subject area.
- Health Sciences: Top level subject area.
- 1000 General: ASJC main category.
- 1100 Agricultural and Biological Sciences: ASJC main category.
- 1200 Arts and Humanities: ASJC main category.
- 1300 Biochemistry, Genetics and Molecular Biology: ASJC main category.
- 1400 Business, Management and Accounting: ASJC main category.
- 1500 Chemical Engineering: ASJC main category.
- 1600 Chemistry: ASJC main category.
- 1700 Computer Science: ASJC main category.
- 1800 Decision Sciences: ASJC main category.
- 1900 Earth and Planetary Sciences: ASJC main category.
- 2000 Economics, Econometrics and Finance: ASJC main category.
- 2100 Energy: ASJC main category.
- 2200 Engineering: ASJC main category.
- 2300 Environmental Science: ASJC main category.
- 2400 Immunology and Microbiology: ASJC main category.
- 2500 Materials Science: ASJC main category.
- 2600 Mathematics: ASJC main category.
- 2700 Medicine: ASJC main category.
- 2800 Neuroscience: ASJC main category.
- 2900 Nursing: ASJC main category.
- 3000 Pharmacology, Toxicology and Pharmaceutics: ASJC main category.
- 3100 Physics and Astronomy: ASJC main category.
- 3200 Psychology: ASJC main category.
- 3300 Social Sciences: ASJC main category.
- 3400 Veterinary: ASJC main category.
- 3500 Dentistry: ASJC main category.
- 3600 Health Professions: ASJC main category.
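
The H-index definition given in the key features above can be sketched in a few lines. This is only an illustration of the definition; the citation counts below are made up and are not taken from the dataset.

```python
def h_index(citations):
    """Largest h such that at least h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position  # the top `position` articles all have >= position citations
        else:
            break
    return h


# Made-up citation counts for the articles of one journal.
print(h_index([10, 8, 5, 4, 3]))
```

Sorting in descending order means the loop can stop at the first article whose citation count falls below its position.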
Esports Performance Rankings and Results
kaggle.com
Updated Dec 12, 2022
Cite
The Devastator (2022). Esports Performance Rankings and Results [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-collegiate-esports-performance-with-bu/suggestions
Dataset updated
Dec 12, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Esports Performance Rankings and Results

Performance Rankings and Results from Multiple Esports Platforms

By [source]

About this dataset

This dataset provides a detailed look into the world of competitive video gaming in universities. It covers a wide range of topics, from performance rankings and results across multiple esports platforms to the individual team and university rankings within each tournament. With an incredible wealth of data, fans can discover statistics on their favorite teams or explore the challenges placed upon university gamers as they battle it out to be the best. Dive into the information provided and get an inside view into the world of collegiate esports tournaments as you assess all things from Match ID, Team 1, University affiliations, Points earned or lost in each match and special Seeds or UniSeeds for exceptional teams. Of course don't forget about exploring all the great Team Names along with their corresponding websites for further details on stats across tournaments!

How to use the dataset

Download Files: First, make sure you have downloaded the CS_week1, CS_week2, CS_week3 and seeds datasets from Kaggle. You will also need to download the currentRankings file for each week of competition. Save all files under their originally assigned names so that your analysis tools can read them properly (e.g. CS_week1.csv).

Understand File Structure: Once all the data has been collected and organized into separate files on your device, familiarize yourself with the information in each file. The main folder contains the main data files: week1-3 and seedings. The week1-3 files list teams matched against one another, with the university, the points scored from match results, the team name and the website URL associated with each university entry. The seedings file ranks the university entries and includes team names, website URLs and similar details. An additional file contains currentRankings scores for each player or team for a given period of competition (e.g. the first week).

Analyzing Data: Now that everything is set up, it's time to explore. You can look into trends among universities or individual players with respect to specific match performances or overall standings across the weeks of competition. You can also build graphs from the data compiled in the BUECTracker dataset. For example, to compare two universities, say Harvard University and Cornell University, since the beginning of the event, extract the respective points column, the dates column (found under the result tab), the regions (e.g. North America vs Europe) and general stats such as maps played. Any other custom ideas that come up when working with similar datasets are worth trying too.
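
The university comparison described above could be started with a sketch like the following. It assumes a weekly file with a `University` column (listed in the column table) and a `Points` column; the `Points` name, the team names and the sample rows are hypothetical and stand in for the real CSV content.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for a weekly file such as CS_week1.csv.
SAMPLE = """Match ID,Team 1,University,Points
1,Team A,Harvard University,3
2,Team B,Cornell University,1
3,Team A,Harvard University,0
"""


def points_by_university(csv_text):
    """Sum the (assumed) Points column per university."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["University"]] += int(row["Points"])
    return dict(totals)


print(points_by_university(SAMPLE))
```

To run it against a downloaded file, the text of that file can be passed in place of `SAMPLE`, after adjusting the column names to the real headers.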

Research Ideas

Analyze the performance of teams and identify areas for improvement for better performance in future competitions.

Assess which esports platforms are the most popular among gamers.

Gain a better understanding of player rankings across different regions, based on the ranking system, to create targeted strategies that could boost individual players' scoring potential or a team's overall success in competitive gaming events.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: CS_week1.csv

| Column name | Description |
|:---------------|:----------------------------------------------|
| Match ID | Unique identifier for each match. (Integer) |
| Team 1 | Name of the first team in the match. (String) |
| University | University associated with the team. (String) |

File: CS_week1_currentRankings.csv | Column name | Description | |:--------------|:-----------------------------------------------------------|...
Most visited websites by hierachycal categories
kaggle.com
Updated Sep 18, 2020
Cite
Natanael de Souza Figueiredo (2020). Most visited websites by hierachycal categories [Dataset]. https://www.kaggle.com/natanael127/most-visited-websites-by-hierachycal-categories/code
Dataset updated
Sep 18, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Natanael de Souza Figueiredo
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Alexa Internet was founded in April 1996 by Brewster Kahle and Bruce Gilliat. The company's name was chosen in homage to the Library of Alexandria of Ptolemaic Egypt, drawing a parallel between the largest repository of knowledge in the ancient world and the potential of the Internet to become a similar store of knowledge. (from Wikipedia)

The categories list was due to be discontinued on September 17, 2020, so I wanted to save it. https://support.alexa.com/hc/en-us/articles/360051913314

This dataset was produced by this Python script (V2.0): https://github.com/natanael127/dump-alexa-ranking

Content

The sites are grouped into 17 macro categories, and the resulting tree has more than 360,000 nodes. Subjects are well organized and each has its own ranking of most accessed domains. So even the keys of a sub-dictionary may make a good small dataset.
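
To give a feel for working with such a tree, here is a minimal sketch that counts category nodes in a nested dictionary. The layout (category names as keys, sub-dictionaries for subcategories, lists of domains at the leaves) is an assumption made for illustration and may not match the real file; the sample entries are made up.

```python
def count_nodes(tree):
    """Count every category key in a nested dict; non-dict values count as no node."""
    if not isinstance(tree, dict):
        return 0
    return sum(1 + count_nodes(child) for child in tree.values())


# Tiny made-up tree in the assumed layout.
sample_tree = {
    "Arts": {
        "Music": ["a.example"],
        "Movies": {"Reviews": ["b.example"]},
    },
    "Science": ["c.example"],
}

print(count_nodes(sample_tree))
print(sorted(sample_tree["Arts"].keys()))  # keys of a sub-dictionary as a small dataset
```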

Acknowledgements

Thank you to my friend André (https://github.com/andrerclaudio) for helping me with tips on Google Colaboratory and with computational power to get the data before our deadline.

Inspiration

Alexa ranking was inspired by the Library of Alexandria. In the modern world, it may be a good start for AI to learn more about many, many subjects of the world.
Alexa Domains Dataset
paperswithcode.com
opendatalab.com
Updated Feb 1, 2001
Cite
Isaac Corley; Jonathan Lwowski; Justin Hoffman (2001). Alexa Domains Dataset [Dataset]. https://paperswithcode.com/dataset/gagan-bhatia
Dataset updated
Feb 1, 2001
Authors
Isaac Corley; Jonathan Lwowski; Justin Hoffman
Description
This dataset is composed of the URLs of the top 1 million websites. The domains are ranked using the Alexa traffic ranking, which is determined using a combination of the browsing behavior of users on the website, the number of unique visitors, and the number of pageviews. In more detail, unique visitors are the number of unique users who visit a website on a given day, and pageviews are the total number of user URL requests for the website. However, multiple requests for the same URL by the same user on the same day are counted as a single pageview. The website with the highest combination of unique visitors and pageviews is ranked the highest.
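
The two traffic quantities described above can be sketched as follows. This is not Alexa's actual computation: it assumes a hypothetical request log of (user, URL, day) tuples and assumes that repeated requests are collapsed per user and URL within a day.

```python
def daily_traffic(requests, day):
    """Return (unique visitors, pageviews) for one day of a request log."""
    visitors = set()
    views = set()
    for user, url, request_day in requests:
        if request_day != day:
            continue
        visitors.add(user)
        views.add((user, url))  # repeats of the same (user, URL) count once
    return len(visitors), len(views)


# Made-up log entries for a single website.
log = [
    ("u1", "/a", "2020-01-01"),
    ("u1", "/a", "2020-01-01"),  # repeat, not counted again
    ("u1", "/b", "2020-01-01"),
    ("u2", "/a", "2020-01-01"),
    ("u3", "/a", "2020-01-02"),
]
print(daily_traffic(log, "2020-01-01"))
```

How the two numbers are combined into a rank is not specified in the description, so the sketch stops at the counts.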
Data set of the article: Using Machine Learning for Web Page Classification...
explore.openaire.eu
Updated Jan 4, 2021
Cite
Goran Matošević; Jasminka Dobša; Dunja Mladenić (2021). Data set of the article: Using Machine Learning for Web Page Classification in Search Engine Optimization [Dataset]. http://doi.org/10.5281/zenodo.4416123
Unique identifier
https://doi.org/10.5281/zenodo.4416123
Dataset updated
Jan 4, 2021
Authors
Goran Matošević; Jasminka Dobša; Dunja Mladenić
Description
Data of investigation published in the article: "Using Machine Learning for Web Page Classification in Search Engine Optimization" Abstract of the article: This paper presents a novel approach of using machine learning algorithms based on experts’ knowledge to classify web pages into three predefined classes according to the degree of content adjustment to the search engine optimization (SEO) recommendations. In this study, classifiers were built and trained to classify an unknown sample (web page) into one of the three predefined classes and to identify important factors that affect the degree of page adjustment. The data in the training set are manually labeled by domain experts. The experimental results show that machine learning can be used for predicting the degree of adjustment of web pages to the SEO recommendations—classifier accuracy ranges from 54.59% to 69.67%, which is higher than the baseline accuracy of classification of samples in the majority class (48.83%). Practical significance of the proposed approach is in providing the core for building software agents and expert systems to automatically detect web pages, or parts of web pages, that need improvement to comply with the SEO guidelines and, therefore, potentially gain higher rankings by search engines. Also, the results of this study contribute to the field of detecting optimal values of ranking factors that search engines use to rank web pages. Experiments in this paper suggest that important factors to be taken into consideration when preparing a web page are page title, meta description, H1 tag (heading), and body text—which is aligned with the findings of previous research. Another result of this research is a new data set of manually labeled web pages that can be used in further research.
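
The abstract compares classifier accuracy against the accuracy of always predicting the majority class. A minimal sketch of that baseline is shown below; the class names and label counts are made up and do not reproduce the paper's data.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Made-up labels for three hypothetical classes of SEO adjustment.
labels = ["low"] * 5 + ["medium"] * 3 + ["high"] * 2
print(majority_baseline_accuracy(labels))
```

A classifier is only informative if its accuracy exceeds this value on the same label distribution.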
Empirical Analysis of Ranking Models for an Adaptable Dataset Search:...
figshare.com
zip
Updated Jun 2, 2023
Cite
Angelo Batista Neves Júnior; Luiz André Portes Paes Leme; Marco Antonio Casanova (2023). Empirical Analysis of Ranking Models for an Adaptable Dataset Search: complementary material [Dataset]. http://doi.org/10.6084/m9.figshare.5620651.v4
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.5620651.v4
Dataset updated
Jun 2, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Angelo Batista Neves Júnior; Luiz André Portes Paes Leme; Marco Antonio Casanova
License
https://www.gnu.org/licenses/gpl-3.0.html
Description
This repository contains performance measures of dataset ranking models. Usage: from Results/src, run Python results m1 m2 ..., where each mi can be omitted or be any element of the list of model labels ['bayesian-12C', 'bayesian-5L', 'bayesian-5L12C', 'cos-12C', 'cos-5L', 'cos-5L5C', 'j48-12C', 'j48-5L', 'j48-5L5C', 'jrip-12C', 'jrip-5L', 'jrip-5L5C', 'sn-12C', 'sn-5L', 'sn-5L12C']. Results of the selected models will be plotted in a 2D line plot. If no model is provided, all models will be listed.
Data set of the article: Ranking by relevance and citation counts, a...
zenodo.org
bin
Updated Jan 24, 2020
Cite
Cristòfol Rovira; Lluís Codina; Frederic Guerrero-Solé; Carlos Lopezosa (2020). Data set of the article: Ranking by relevance and citation counts, a comparative study: Google Scholar, Microsoft Academic, WoS and Scopus [Dataset]. http://doi.org/10.5281/zenodo.3381151
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.3381151
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Cristòfol Rovira; Lluís Codina; Frederic Guerrero-Solé; Carlos Lopezosa
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data of investigation published in the article "Ranking by relevance and citation counts, a comparative study: Google Scholar, Microsoft Academic, WoS and Scopus".

Abstract of the article:

Search engine optimization (SEO) constitutes the set of methods designed to increase the visibility of, and the number of visits to, a web page by means of its ranking on the search engine results pages. Recently, SEO has also been applied to academic databases and search engines, in a trend that is in constant growth. This new approach, known as academic SEO (ASEO), has generated a field of study with considerable future growth potential due to the impact of open science. The study reported here forms part of this new field of analysis. The ranking of results is a key aspect in any information system since it determines the way in which these results are presented to the user. The aim of this study is to analyse and compare the relevance ranking algorithms employed by various academic platforms to identify the importance of citations received in their algorithms. Specifically, we analyse two search engines and two bibliographic databases: Google Scholar and Microsoft Academic, on the one hand, and Web of Science and Scopus, on the other. A reverse engineering methodology is employed based on the statistical analysis of Spearman’s correlation coefficients. The results indicate that the ranking algorithms used by Google Scholar and Microsoft are the two that are most heavily influenced by citations received. Indeed, citation counts are clearly the main SEO factor in these academic search engines. An unexpected finding is that, at certain points in time, WoS used citations received as a key ranking factor, despite the fact that WoS support documents claim this factor does not intervene.
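
The analysis above relies on Spearman's correlation coefficient. The sketch below shows one standard way to compute it (Pearson correlation of average ranks); it is not the authors' code, and the result positions and citation counts are made up.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho; undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (spread_x * spread_y)


# Made-up example: position in the result list vs citations received.
positions = [1, 2, 3, 4, 5]
citations = [500, 300, 120, 40, 3]
print(spearman(positions, citations))
```

A coefficient close to -1 in this setup would mean that results with more citations tend to appear nearer the top of the list.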
Data from: A dataset on the evaluation of the accessibility of the home...
ieee-dataport.org
observatorio-cientifico.ua.es
+2more
Updated Aug 26, 2021
Cite
Milton Campoverde Molina (2021). A dataset on the evaluation of the accessibility of the home pages of the web portals of Ecuadorian higher education institutions ranked in Webometrics [Dataset]. https://ieee-dataport.org/documents/dataset-evaluation-accessibility-home-pages-web-portals-ecuadorian-higher-education
Dataset updated
Aug 26, 2021
Authors
Milton Campoverde Molina
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Ecuador
Description
This research aims to evaluate the accessibility of the home pages of the web portals of the Ecuadorian higher education institutions ranked in Webometrics, using the Web Content Accessibility Guidelines (WCAG) 2.1 of the World Wide Web Consortium.
Dataset covidgilance signals
zenodo.org
bin, csv +3
Updated Sep 25, 2020
Cite
Gaudinat Arnaud (2020). Dataset covidgilance signals [Dataset]. http://doi.org/10.5281/zenodo.4048460
Available download formats: csv, tsv, bin, text/x-python, txt
Unique identifier
https://doi.org/10.5281/zenodo.4048460
Dataset updated
Sep 25, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Gaudinat Arnaud
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Research datasets about top signals for covid 19 (coronavirus) for study into Google Trends (GT) and with SEO metrics

Website

The study is currently published on the https://covidgilance.org website (in French)

Datasets description

covid signals -> |selection| -> 4 dataset -> |serp.py| -> 4 serp datasets -> |aggregate_serp.pl| -> 4 aggregated dataset of serp -> |prepare datasets| -> 4 ranked top seo dataset

Original lists of signals (mainly covid symptoms) - dataset

Description: contains the original list of relevant signals for covid19 (here, a list of queries for which a relevant signal can be seen in GT during the covid 19 period)
Name: covid_signal_list.tsv

List of content:

- id: unique id for the topic
- topic-fr: name of the topic in French
- topic-en: name of the topic in English
- topic-id: GT topic id
- keyword fr: one or several keywords in French for GT
- keyword en: one or several keywords in English for GT
- fr-topic-url-12M: link to 12-months French query topic in GT in France
- en-topic-url-12M: link to 12-months English query topic in GT in US
- fr-url-12M: link to 12-months French queries in GT in France
- en-url-12M: link to 12-months English queries topic in GT in US
- fr-topic-url-5M: link to 5-months French query topic in GT in France
- en-topic-url-5M: link to 5-months English query topic in GT in US
- fr-url-5M: link to 5-months French queries in GT in France
- en-url-5M: link to 5-months English queries topic in GT in US

Tool to get SERP of covid signals - tool

Description: query google with a list of covid signals and obtain a list of serps in csv (tsv in fact) file format
Name: serper.py

python serper.py

SERP files - datasets

Description: SERP results for 4 datasets of queries
Names:
simple version of covid signals from google.ch in French: serp_signals_20_ch_fr.csv
simple version of covid signals from google.com in English: serp_signals_20_en.csv
amplified version of covid signals from google.ch in French: serp_signals_covid_20_ch_fr.csv
amplified version of covid signals from google.com in English: serp_signals_covid_20_en.csv

The amplified version means that for each query we create two queries, one with the keyword "covid" and one with "coronavirus".
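
The amplification step can be sketched as below. Appending the keyword after the original query is an assumption; the exact way the keywords are combined with each query is not stated, and the sample queries are made up.

```python
def amplify(queries, keywords=("covid", "coronavirus")):
    """Create one amplified query per (query, keyword) pair."""
    return [f"{query} {keyword}" for query in queries for keyword in keywords]


# Made-up signal queries.
print(amplify(["fever", "loss of smell"]))
```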

Tool to aggregate SERP results - tool

Description: loads the csv SERP data and aggregates it into a new csv file where each line is a website and each column is a query.
Name: aggregate_serp.pl

perl aggregate_serp.pl > aggregated_signals_20_en.csv

datasets of top website from the SERP results - dataset

Description: an aggregated version of the SERP where each line is a website and each column a query
Names:
aggregated_signals_20_ch_fr.csv
aggregated_signals_20_en.csv
aggregated_signals_covid_20_ch_fr.csv
aggregated_signals_covid_20_en.csv

List of content:

- domain: domain name of the website
- signal 1: Position of query 1 (signal 1) in the SERP, where 30 is an arbitrary value indicating that this website is not present in the SERP
- signal ...: Position of the query (signal) in the SERP, where 30 is an arbitrary value indicating that this website is not present in the SERP
- signal n: Position of query n (signal n) in the SERP, where 30 is an arbitrary value indicating that this website is not present in the SERP
- total: average position (sum of all positions divided by the number of queries)
- missing: Total number of missing results in the SERP for this website

datasets ranked top seo - dataset

Description: a version of the aggregated SERP ranked by weighted average position, where each line is a website and each column a query. The TOP 20 have additional information about the type and HONcode validity (as of the collection date: September 2020)

Names:
ranked_signals_20_ch_fr.csv
ranked_signals_20_en.csv
ranked_signals_covid_20_ch_fr.csv
ranked_signals_covid_20_en.csv

List of content:

- domain: domain name of the website
- signal 1: Position of query 1 (signal 1) in the SERP, where 30 is an arbitrary value indicating that this website is not present in the SERP
- signal ...: Position of the query (signal) in the SERP, where 30 is an arbitrary value indicating that this website is not present in the SERP
- signal n: Position of query n (signal n) in the SERP, where 30 is an arbitrary value indicating that this website is not present in the SERP
- avg position: average position (sum of all positions divided by the number of queries)
- nb missing: Total number of missing results in the SERP for this website
- % presence: % of presence
- weighted avg postion: combination of avg position and % of presence for final ranking
- honcode: status of the Honcode certificate for this website (none/valid/expired)
- type: type of the website (health, gov, edu or media)
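
The per-website columns listed above can be recomputed from the signal positions with a sketch like this one. It treats 30 as the "not present" marker, as described, and stops short of the weighted average position because its formula is not given. The function name, key names and sample positions are hypothetical.

```python
MISSING = 30  # arbitrary marker meaning the website is absent from the SERP


def summarize(positions):
    """Average position, number of missing results and % presence for one website."""
    missing = sum(1 for p in positions if p == MISSING)
    return {
        "avg_position": sum(positions) / len(positions),
        "nb_missing": missing,
        "pct_presence": 100 * (len(positions) - missing) / len(positions),
    }


# Made-up positions of one domain across four queries.
print(summarize([1, 5, 30, 30]))
```

Note that the missing marker is included in the average, which is what pushes rarely present websites down the list.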
Data from: Webis-Web-Archive-17
data.niaid.nih.gov
webis.de
+2more
Updated Jul 19, 2024
+ more versions
Cite
Stein, Benno (2024). Webis-Web-Archive-17 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1002203
Dataset updated
Jul 19, 2024
Dataset provided by
Kiesel, Johannes
Potthast, Martin
Hagen, Matthias
Kneist, Florian
Stein, Benno
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The Webis-Web-Archive-17 comprises a total of 10,000 web page archives from mid-2017 that were carefully sampled from the Common Crawl to involve a mixture of high-ranking and low-ranking web pages. The dataset contains the web archive files, HTML DOM, and screenshots of each web page, as well as per-page annotations of visual web archive quality. See this overview for all datasets that built upon this one. If you use this dataset in your research, please cite it using this paper.
Traces captured by visiting the top 1500 website
kaggle.com
zip
Updated Aug 25, 2021
Cite
DNS_dataset (2021). Traces captured by visiting the top 1500 website [Dataset]. https://www.kaggle.com/jacksontang16/traces-captured-by-visiting-the-top-1500-website
Available download formats: zip (5852806 bytes)
Dataset updated
Aug 25, 2021
Authors
DNS_dataset
Description
Dataset

This dataset was created by DNS_dataset

Contents
Best Virtual Data Rooms 2024 Dataset
dataroom-providers.org
Updated Sep 6, 2018
Cite
Dataroom Providers (2018). Best Virtual Data Rooms 2024 Dataset [Dataset]. https://dataroom-providers.org/
Dataset updated
Sep 6, 2018
Dataset authored and provided by
Dataroom Providers
Description
Best virtual data rooms 2024 dataset is created to provide the data room users and M&A specialists with detailed information on the best virtual data rooms. The dataset contains the descriptions of each dataroom solution and their ratings.
Yahoo-Learning-to-Rank-Challenge
huggingface.co
Updated Dec 15, 2024
Cite
Yahoo-Research (2024). Yahoo-Learning-to-Rank-Challenge [Dataset]. https://huggingface.co/datasets/YahooResearch/Yahoo-Learning-to-Rank-Challenge
Dataset updated
Dec 15, 2024
Dataset provided by
Yahoo! (https://tw.yahoo.com/)
Yahoo! Research
Authors
Yahoo-Research
License
https://choosealicense.com/licenses/other/
Description
Yahoo! Learning to Rank Challenge, version 1.0

Machine learning has been successfully applied to web search ranking, and the goal of this dataset is to benchmark such machine learning algorithms. The dataset consists of features extracted from (query,url) pairs along with relevance judgments. The queries, URLs and feature descriptions are not given; only the feature values are. There are two datasets in this distribution: a large one and a small one. Each dataset is divided in 3 sets:… See the full description on the dataset page: https://huggingface.co/datasets/YahooResearch/Yahoo-Learning-to-Rank-Challenge.
MSLR WEB30K Dataset
paperswithcode.com
Updated Apr 14, 2025
+ more versions
Cite
Tao Qin; Tie-Yan Liu (2025). MSLR WEB30K Dataset [Dataset]. https://paperswithcode.com/dataset/mslr-web30k
Dataset updated
Apr 14, 2025
Authors
Tao Qin; Tie-Yan Liu
Description
The datasets are machine learning data, in which queries and urls are represented by IDs. The datasets consist of feature vectors extracted from query-url pairs along with relevance judgment labels:

(1) The relevance judgments are obtained from a retired labeling set of a commercial web search engine (Microsoft Bing), which take 5 values from 0 (irrelevant) to 4 (perfectly relevant).

(2) The features are basically extracted by us, and are those widely used in the research community.

In the data files, each row corresponds to a query-url pair. The first column is the relevance label of the pair, the second column is the query id, and the following columns are features. The larger the relevance label, the more relevant the query-url pair. A query-url pair is represented by a 136-dimensional feature vector.
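
A row in the layout described above could be parsed with a sketch like the following. The token syntax (`qid:<id>` for the query and `index:value` for each feature, separated by whitespace) is an assumption based on the common LETOR-style text format and is not stated in this description; the sample line is made up and shortened.

```python
def parse_row(line):
    """Split one row into (relevance label, query id, {feature index: value})."""
    tokens = line.split()
    label = int(tokens[0])                 # first column: relevance label 0..4
    query_id = tokens[1].split(":", 1)[1]  # second column: assumed "qid:<id>"
    features = {}
    for token in tokens[2:]:               # remaining columns: assumed "index:value"
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, query_id, features


# Made-up row with only three of the 136 features shown.
sample_line = "2 qid:10 1:0.5 2:3 136:1"
print(parse_row(sample_line))
```

Grouping the parsed rows by query id is usually the next step, since ranking metrics are computed per query.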
Real Estate Data | Property Listing, Sold Properties, Rankings, Agent...
datarade.ai
Cite
Grepsr, Real Estate Data | Property Listing, Sold Properties, Rankings, Agent Datasets | Global Coverage | For Competitive Property Pricing and Investment [Dataset]. https://datarade.ai/data-products/real-estate-property-data-grepsr-grepsr
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset authored and provided by
Grepsr
Area covered
Malaysia, Kazakhstan, Congo (Democratic Republic of the), Iraq, Australia, Spain, South Sudan, Tonga, Holy See, Kuwait
Description
Extract detailed property data points — address, URL, prices, floor space, overview, parking, agents, and more — from any real estate listings. The Rankings data contains the ranking of properties as they come in the SERPs of different property listing sites. Furthermore, with our real estate agents' data, you can directly get in touch with the real estate agents/brokers via email or phone numbers.

A. Use cases and applications possible with the data:

Property pricing - accurate property data for real estate valuation. Gather information about properties and their valuations from Federal, State, or County level websites. Monitor the real estate market across the country and decide the best time to buy or sell based on data.

Secure your real estate investment - Monitor foreclosures and auctions to identify investment opportunities. Identify areas within special economic and opportunity zones such as QOZs - cross-map that with commercial or residential listings to identify leads. Ensure the safety of your investments, property, and personnel by analyzing crime data prior to investing.

Identify hot, emerging markets - Gather rent, demographic, and population data to expand retail and e-commerce businesses and to drive better investment decisions.

Profile a building’s retrofit history - A building permit is required before any construction activity on a building begins, such as changing the building structure, remodeling, or installing new equipment. Many large cities publish datasets of historical building permits, which can be used to profile a city’s building retrofit history.

Study market changes - New construction data helps measure and evaluate the size, composition, and changes occurring within the housing and construction sectors.

Finding leads - Property records can reveal a wealth of information, such as how long the current owner has lived in a home. US Census Bureau data and City-Data.com provide profiles of towns and city neighborhoods as well as demographic statistics. This data is available for free and can help agents increase their expertise in their communities and get a feel for the local market.

Searching for Targeted Leads - Focusing on small, niche areas of the real estate market can sometimes be the most efficient method of finding leads. For example, targeting high-end home sellers may take longer to develop a lead, but the payoff could be greater. Or, you may have a special interest or background in a certain type of home that would improve your chances of connecting with potential sellers. In these cases, focused data searches may help you find the best leads and develop relationships with future sellers.

How does it work?

Analyze sample data

Customize parameters to suit your needs

Add to your projects

Contact support for further customization
Dataset Search WebApp
figshare.com
zip
Updated May 31, 2023
Cite
Angelo Batista Neves Júnior; Luiz André Portes Paes Leme (2023). Dataset Search WebApp [Dataset]. http://doi.org/10.6084/m9.figshare.5217958.v2
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.5217958.v2
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Angelo Batista Neves Júnior; Luiz André Portes Paes Leme
License
https://www.gnu.org/copyleft/gpl.html
Description
Although extensive lists of open datasets are available in catalogues, most data publishers still connect their datasets to other popular datasets, such as DBpedia, Freebase and Geonames. Linking to popular datasets allows external resources to be explored, but it fails to cover highly specialized information. Catalogues of linked data describe the content of datasets in terms of update periodicity, authors, SPARQL endpoints, and linksets with other datasets, amongst others, as recommended by the W3C VoID Vocabulary. However, catalogues by themselves do not provide any explicit information to help the URI linkage process. Searching techniques can rank the available datasets S_i according to the probability that links can be defined between the URIs of S_i and a given dataset T to be published, so that most of the links, if not all, can be found by inspecting the most relevant datasets in the ranking. dataset-search is a tool for searching datasets for linkage.
‘QS World University Rankings 2017 - 2022’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 1, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘QS World University Rankings 2017 - 2022’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-qs-world-university-rankings-2017-2022-7fc4/d793e726/?iid=007-103&v=presentation
Explore at:
Dataset updated
Aug 1, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘QS World University Rankings 2017 - 2022’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/padhmam/qs-world-university-rankings-2017-2022 on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context

QS World University Rankings is an annual publication of global university rankings by Quacquarelli Symonds. The QS ranking receives approval from the International Ranking Expert Group (IREG), and is viewed as one of the three most widely read university rankings in the world. QS publishes its university rankings in partnership with Elsevier.

Content

This dataset contains university data from 2017 to 2022. It has a total of 15 features:
- university - name of the university
- year - year of ranking
- rank_display - rank given to the university
- score - score of the university based on the six key metrics mentioned above
- link - link to the university profile page on the QS website
- country - country in which the university is located
- city - city in which the university is located
- region - continent in which the university is located
- logo - link to the logo of the university
- type - type of university (public or private)
- research_output - quality of research at the university
- student_faculty_ratio - number of students per faculty member
- international_students - number of international students enrolled at the university
- size - size of the university in terms of area
- faculty_count - number of faculty or academic staff at the university

Acknowledgements

This dataset was acquired by scraping the QS World University Rankings website with Python and Selenium. Cover Image: Source

Inspiration

Some of the questions that can be answered with this dataset:
1. What makes a best-ranked university?
2. Does the location of a university play a role in its ranking?
3. What do the best universities have in common?
4. How important is academic research for a university?
5. Which country is preferred by international students?

--- Original source retains full ownership of the source dataset ---
Cross-language corpora of privacy policies
zenodo.org
explore.openaire.eu
+1 more
csv, zip
Updated Jun 17, 2023
Cite
Francesco Ciclosi; Silvia Vidor; Fabio Massacci (2023). Cross-language corpora of privacy policies [Dataset]. http://doi.org/10.5281/zenodo.7729546
Explore at:
Available download formats: csv, zip
Unique identifier
https://doi.org/10.5281/zenodo.7729546
Dataset updated
Jun 17, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Francesco Ciclosi; Silvia Vidor; Fabio Massacci
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset consists of three privacy policy corpora (in English and Italian) composed of 81 unique privacy policy texts spanning the period 2018-2021. It makes available an example of three corpora of privacy policies. The first corpus is the English-language corpus, the original used in the study by Tang et al. [2]. The other two are cross-language corpora built from the first: the source corpus, in English, and the replication corpus, in Italian, which is the language of a potential replication study.

The policies were collected from:

the Alexa top 10 Italy and U.S. websites rank;

the Play Store apps rank in the "most profitable games" category of the Play Store for Italy and the U.S.

We manually analyzed the Alexa top 10 Italy websites as of November 2021. Analogously, we analyzed selected apps that, in the same period, had ranked better in the "most profitable games" category of the Play Store for Italy.

All the privacy policies are ANSI-encoded text files and have been manually read and verified.
The dataset is helpful as a starting point for building comparable cross-language privacy policy corpora. The availability of such comparable corpora helps replicate studies in different languages.
Details on the methodology can be found in the accompanying paper.

The available files are as follows:

policies-texts.zip --> contains a directory of text files with the policy texts. File names are the SHA1 hashes of the policy text.

policy-metadata.csv --> Contains a CSV file with the metadata for each privacy policy.
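Because the file names are described as SHA1 hashes of the policy text, an extracted copy can be spot-checked by re-hashing each file. The sketch below is an assumption-laden illustration: it assumes the hash was taken over the raw bytes of each file and that the hash is the file name without its extension. The helper names `sha1_hex` and `name_matches_content` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_hex(data: bytes) -> str:
    """Return the SHA1 digest of the given bytes as a lowercase hex string."""
    return hashlib.sha1(data).hexdigest()


def name_matches_content(path: Path) -> bool:
    """Check whether a file's name (without extension) equals the SHA1 of its bytes.

    Assumption: the dataset hashed the raw file bytes, not a decoded or
    normalized version of the text.
    """
    return path.stem.lower() == sha1_hex(path.read_bytes())


# Standard SHA1 test vector for the three bytes "abc".
print(sha1_hex(b"abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```

A mismatch would not necessarily mean a corrupted file; it may only indicate that the hash was computed over a different representation of the text.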

This dataset is the original dataset used in the publication [1]. The original English U.S. corpus is described in the publication [2].

[1] F. Ciclosi, S. Vidor and F. Massacci. "Building cross-language corpora for human understanding of privacy policies." Workshop on Digital Sovereignty in Cyber Security: New Challenges in Future Vision. Communications in Computer and Information Science. Springer International Publishing, 2023, In press.

[2] J. Tang, H. Shoemaker, A. Lerner, and E. Birrell. Defining Privacy: How Users Interpret Technical Terms in Privacy Policies. Proceedings on Privacy Enhancing Technologies, 3:70–94, 2021.
Data articles in journals
zenodo.org
bin, csv, txt
Updated Sep 21, 2023
+ more versions
Cite
Carlota Balsa-Sanchez; Vanesa Loureiro (2023). Data articles in journals [Dataset]. http://doi.org/10.5281/zenodo.7458466
Explore at:
Available download formats: bin, txt, csv
Unique identifier
https://doi.org/10.5281/zenodo.7458466
Dataset updated
Sep 21, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Carlota Balsa-Sanchez; Vanesa Loureiro
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Last Version: 4

Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

Date of data collection: 2022/12/15

General description: Publishing datasets in accordance with the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
File list:

- data_articles_journal_list_v4.xlsx: full list of 140 academic journals in which data papers and/or software papers could be published
- data_articles_journal_list_v4.csv: full list of 140 academic journals in which data papers and/or software papers could be published

Relationship between files: both files have the same information. Two different formats are offered to improve reuse

Type of version of the dataset: final processed version

Versions of the files: 4th version
- Information updated: number of journals, URL, document types associated with a specific journal, publisher normalization, and simplification of document types
- Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR), Scopus and Web of Science (WOS), Journal Master List.

Version: 3

Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

Date of data collection: 2022/10/28

General description: Publishing datasets in accordance with the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
File list:

- data_articles_journal_list_v3.xlsx: full list of 124 academic journals in which data papers and/or software papers could be published
- data_articles_journal_list_3.csv: full list of 124 academic journals in which data papers and/or software papers could be published

Relationship between files: both files have the same information. Two different formats are offered to improve reuse

Type of version of the dataset: final processed version

Versions of the files: 3rd version
- Information updated: number of journals, URL, document types associated with a specific journal, publisher normalization, and simplification of document types
- Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR).

Erratum - Data articles in journals Version 3:

Botanical Studies -- ISSN 1999-3110 -- JCR (JIF) Q2
Data -- ISSN 2306-5729 -- JCR (JIF) n/a
Data in Brief -- ISSN 2352-3409 -- JCR (JIF) n/a

Version: 2

Author: Francisco Rubio, Universitat Politècnica de València.

Date of data collection: 2020/06/23

General description: Publishing datasets in accordance with the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
File list:

- data_articles_journal_list_v2.xlsx: full list of 56 academic journals in which data papers and/or software papers could be published
- data_articles_journal_list_v2.csv: full list of 56 academic journals in which data papers and/or software papers could be published

Relationship between files: both files have the same information. Two different formats are offered to improve reuse

Type of version of the dataset: final processed version

Versions of the files: 2nd version
- Information updated: number of journals, URL, document types associated with a specific journal, publisher normalization, and simplification of document types
- Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Scimago Journal and Country Rank (SJR)

Total size: 32 KB

Version 1: Description

This dataset contains a list of journals that publish data articles, code, software articles and database articles.

The search strategy in DOAJ and Ulrichsweb was to search for the word "data" in the titles of the journals.
Acknowledgements:
Xaquín Lores Torres for his invaluable help in preparing this dataset.


Media Web Reputation Ranking - SCImago

Quality, Influence & Trustworthiness, as Reputation of Global Media Websites
