17 datasets found

Data from: Examining bias perpetuation in academic search engines: an...
zenodo.org
bin, csv, zip
Updated Feb 8, 2024
Cite
Ulloa Roberto (2024). Examining bias perpetuation in academic search engines: an algorithm audit of Google and Semantic Scholar [Dataset]. http://doi.org/10.5281/zenodo.10636247
Available download formats: bin, zip, csv
Unique identifier
https://doi.org/10.5281/zenodo.10636247
Dataset updated
Feb 8, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ulloa Roberto
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Main dataset (main.csv)

The main file contains one entry per search result across all collected pages (N=28530). It comprises the following columns:

id: Unique identifier of the file (corresponds to the last part of the filename)

filename: Name of the file associated with the row (the file is in serp_html.zip)

engine: The search engine used (Google Scholar or Semantic Scholar).

browser: The web browser used for the search (Firefox or Chrome)

region: The geographical region where the search was made.

year: The year when the search was made

month: The month when the search was made

day: The day when the search was made

query: The full search query that was used

query_type: The type of the search query (health or technology)

topic: The topic associated with the search query ('covid vaccines', 'cryptocurrencies', 'internet', 'social media', 'vaccines', 'coffee')

trt: Treatment variable associated with the search (benefits or risks).

url: The URL of the (article) search result

title: The title of the (article) search result.

authorship: The author(s) of the (article) search result.

abstract_id: Unique identifier for the abstract of the (article) search result; links to annotated-abstracts_v0.6.xlsx

abstract_hash: Hash value of the abstract for data integrity

link_n: The total number of results in the search page

rank: The rank of the search result on the search engine results page.

annotation: The annotation of the (article's abstract) search result. One of: '1. Confirms benefits', '2. Confirms risks', '3. Confirms both benefits and risks', '4. Confirms neither benefits nor risks', '5. Abstract not related to {topic}'.

valence: -1 for abstracts containing risks, 0 for neutral abstracts, 1 for abstracts only containing benefits
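As an illustration of how these columns fit together, the sketch below averages `valence` per search engine and treatment using only Python's standard library. It assumes main.csv is a comma-separated file with a header row matching the column names above; the helper `mean_valence_by_group` and the three-row sample are hypothetical, not part of the dataset.

```python
import csv
import io
from collections import defaultdict


def mean_valence_by_group(csv_file, keys=("engine", "trt")):
    """Average the `valence` column for each combination of the `keys` columns.

    Rows with an empty or non-numeric valence are skipped; how missing values
    are encoded in main.csv is an assumption.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        try:
            value = float(row["valence"])
        except (KeyError, TypeError, ValueError):
            continue  # no usable valence for this search result
        group = tuple(row[k] for k in keys)
        totals[group] += value
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


if __name__ == "__main__":
    # Made-up three-row sample using the documented column names (not real data).
    sample = io.StringIO(
        "engine,trt,valence\n"
        "Google Scholar,benefits,1\n"
        "Google Scholar,benefits,0\n"
        "Semantic Scholar,risks,-1\n"
    )
    print(mean_valence_by_group(sample))
```

To run it on the real file, pass `open("main.csv", newline="", encoding="utf-8")` instead of the sample (the encoding is an assumption).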

Annotated abstracts (annotated-abstracts_v0.6.xlsx)

Manually annotated abstracts resulting from the searches.

Raw search engine result pages (serp_html.zip)

The zip contains one HTML file per collected search engine result page (N=2853). See the filename column in the main dataset.
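A minimal sketch for retrieving the page that a row of main.csv came from, assuming the archive is a standard ZIP file and that the `filename` column matches the last path component of an entry inside it. `read_serp_html` and the page name used in the demo are hypothetical.

```python
import io
import zipfile


def read_serp_html(zip_source, filename, encoding="utf-8"):
    """Return the HTML of one result page stored in serp_html.zip.

    `zip_source` is a path or a binary file object; `filename` is a value from
    the `filename` column of main.csv. Matching on the last path component is
    an assumption about the archive's internal folder layout.
    """
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if name.rsplit("/", 1)[-1] == filename:
                return zf.read(name).decode(encoding, errors="replace")
    raise FileNotFoundError(f"{filename!r} not found in the archive")


if __name__ == "__main__":
    # Build a tiny in-memory archive with a hypothetical page name for the demo.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("serp_html/page_001.html", "<html>example</html>")
    print(read_serp_html(buf, "page_001.html"))
```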
semantic-scholar-manajemen-proyek
huggingface.co
Updated Feb 2, 2025
Cite
Sederhana Gulo (2025). semantic-scholar-manajemen-proyek [Dataset]. https://huggingface.co/datasets/derhan/semantic-scholar-manajemen-proyek
Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 2, 2025
Authors
Sederhana Gulo
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Semantic Scholar: Manajemen Proyek

Batch fetched from the Semantic Scholar API with the following parameters:

query = '"manajemen proyek" | "project management" | "metodologi manajemen proyek" | "teknik manajemen proyek" | "alat manajemen proyek" | "keterampilan manajemen proyek" | "tantangan manajemen proyek" | "risiko manajemen proyek" | "studi kasus manajemen proyek" | "tren manajemen proyek" | "manajemen proyek di berbagai industri" | "manajemen proyek dalam organisasi" | "manajemen proyek… See the full description on the dataset page: https://huggingface.co/datasets/derhan/semantic-scholar-manajemen-proyek.
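The `|` in the query acts as an OR between quoted phrases. The sketch below rebuilds such a query string and a request URL, assuming the Semantic Scholar Academic Graph API's bulk paper-search endpoint; the description does not state which endpoint or fields were used, so `BULK_SEARCH_URL` and the `fields` value are assumptions. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumption: bulk paper search of the Semantic Scholar Academic Graph API.
BULK_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search/bulk"


def build_or_query(phrases):
    """Quote each phrase and join them with the `|` (OR) operator."""
    return " | ".join(f'"{p}"' for p in phrases)


def bulk_search_url(phrases, fields="title,abstract,year"):
    """Build a GET URL for a bulk search; `fields` is an illustrative choice."""
    params = {"query": build_or_query(phrases), "fields": fields}
    return f"{BULK_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # First two phrases of the query shown above; the full list is truncated.
    phrases = ["manajemen proyek", "project management"]
    print(build_or_query(phrases))
    print(bulk_search_url(phrases))
```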
COVID-19 Open Research Dataset (CORD-19)
data.niaid.nih.gov
live.european-language-grid.eu
Updated Jul 22, 2024
Cite
Sebastian Kohlmeier (2024). COVID-19 Open Research Dataset (CORD-19) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3715505
Dataset updated
Jul 22, 2024
Dataset provided by
Kyle Lo
Sebastian Kohlmeier
JJ Yang
Lucy Lu Wang
Description
A full description of this dataset along with updated information can be found here.

In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of scholarly articles, including full text content, about COVID-19 and the coronavirus family of viruses for use by the global research community.

This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease. The corpus will be updated weekly as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others.

By downloading this dataset you are agreeing to the Dataset license. Specific licensing information for individual articles in the dataset is available in the metadata file.

Additional licensing information is available on the PMC website, medRxiv website and bioRxiv website.

Dataset content:

Commercial use subset

Non-commercial use subset

PMC custom license subset

bioRxiv/medRxiv subset (pre-prints that are not peer reviewed)

Metadata file

Readme

Each paper is represented as a single JSON object (see schema file for details).

Description:

The dataset contains all COVID-19 and coronavirus-related research (e.g. SARS, MERS, etc.) from the following sources:

PubMed's PMC open access corpus using this query (COVID-19 and coronavirus research)

Additional COVID-19 research articles from a corpus maintained by the WHO

bioRxiv and medRxiv pre-prints using the same query as PMC (COVID-19 and coronavirus research)

We also provide a comprehensive metadata file of coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic and the WHO COVID-19 database of publications (includes articles without open access full text).

We recommend using metadata from the comprehensive file when available, instead of parsed metadata in the dataset. Please note the dataset may contain multiple entries for individual PMC IDs in cases when supplementary materials are available.
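Because several entries can share a PMC ID, a first step is often to group records by that ID before deciding which one to keep. A minimal sketch, assuming records are dictionaries (for example, rows from `csv.DictReader`) with hypothetical `pmcid` and `cord_uid` keys; check the schema file and the metadata header before relying on these names.

```python
import csv
import io
from collections import defaultdict


def group_by_pmcid(records, id_key="pmcid"):
    """Return {pmc_id: [records...]} for PMC IDs that occur more than once.

    `id_key` is an assumed field name; records without a PMC ID are ignored.
    """
    groups = defaultdict(list)
    for record in records:
        pmcid = (record.get(id_key) or "").strip()
        if pmcid:
            groups[pmcid].append(record)
    return {pmcid: recs for pmcid, recs in groups.items() if len(recs) > 1}


if __name__ == "__main__":
    # Hypothetical rows; the real metadata file has many more columns.
    sample = io.StringIO("cord_uid,pmcid\na1,PMC1\na2,PMC1\na3,PMC2\na4,\n")
    duplicates = group_by_pmcid(csv.DictReader(sample))
    print({k: [r["cord_uid"] for r in v] for k, v in duplicates.items()})
```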

This repository is linked to the WHO database of publications on coronavirus disease and other resources, such as Microsoft Academic Graph, PubMed, and Semantic Scholar. A coalition including the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine of the National Institutes of Health came together to provide this service.

Citation:

When including CORD-19 data in a publication or redistribution, please cite the dataset as follows:

In bibliography:

COVID-19 Open Research Dataset (CORD-19). 2020. Version 2020-MM-DD. Retrieved from https://pages.semanticscholar.org/coronavirus-research. Accessed YYYY-MM-DD. 10.5281/zenodo.3715505

In text:

(CORD-19, 2020)

The Allen Institute for AI and particularly the Semantic Scholar team will continue to provide updates to this dataset as the situation evolves and new research is released.
Academic AI Tools Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jul 5, 2025
Cite
Data Insights Market (2025). Academic AI Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/academic-ai-tools-504839
Available download formats: ppt, pdf, doc
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The academic AI tools market is experiencing rapid growth, driven by increasing research output, the need for efficient literature reviews, and the demand for advanced analytical capabilities. The market, estimated at $1.5 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033, reaching approximately $8 billion by 2033. This expansion is fueled by several key factors. Firstly, the rising number of academic publications and the complexity of research necessitate tools that can efficiently process and analyze vast amounts of information. Secondly, the integration of AI into various research stages, from literature review and data analysis to writing and editing, is significantly enhancing productivity and accuracy. Thirdly, the emergence of sophisticated AI models capable of understanding nuanced academic language and context is driving adoption among researchers and institutions. However, the market also faces challenges. High initial investment costs for both developers and users can be a barrier to entry. Concerns about data privacy and intellectual property rights in relation to AI-driven research are also significant. Furthermore, the market is currently fragmented, with numerous players competing for market share, leading to a highly dynamic competitive landscape. The continued success of companies like Consensus, Scite Assistant, Research Rabbit, Paperpal, Elicit, Perplexity, Semantic Scholar, Connected Papers, Scholarcy, Gemini, Julius, Bit AI, and Trinka will depend on their ability to differentiate their offerings, improve user experience, and effectively address the challenges related to data security and ethical considerations. Future growth will likely be driven by advancements in natural language processing (NLP), improved integration with existing research workflows, and the development of specialized tools catering to specific research fields.
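The projection in this description can be checked with the standard compound-growth formula, V_end = V_start × (1 + r)^n. In the short sketch below, compounding $1.5 billion at 25% for the eight years from 2025 to 2033 gives about $8.9 billion, somewhat above the quoted figure of approximately $8 billion; conversely, growing from $1.5 billion to $8 billion over eight years implies a CAGR of roughly 23%. The function names are illustrative.

```python
def project_value(start_value, cagr, years):
    """Compound `start_value` at a constant annual growth rate `cagr` for `years` years."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that turns `start_value` into `end_value`."""
    return (end_value / start_value) ** (1 / years) - 1


if __name__ == "__main__":
    years = 2033 - 2025
    print(f"Projected 2033 size: ${project_value(1.5, 0.25, years):.2f} billion")
    print(f"CAGR implied by $1.5B -> $8B: {implied_cagr(1.5, 8.0, years):.1%}")
```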
Tab2Know evaluation data
live.european-language-grid.eu
csv
Updated Sep 13, 2022
Cite
(2022). Tab2Know evaluation data [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/18315
Available download formats: csv
Dataset updated
Sep 13, 2022
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This resource contains the following files:
- `venues.txt`: The venues that were used for selecting PDFs from the [Semantic Scholar Open Research Corpus](http://s2-public-api-prod.us-west-2.elasticbeanstalk.com/corpus/) that were published in the last 5 years.
- `extracted-tables.tar.gz`: All tables that we extracted using [Tabula](https://github.com/tabulapdf/tabula) from these PDFs.
- `sample-400.tar.gz`: A sample of these tables which we used for annotation.
- `ontology.ttl`: The annotation ontology in Turtle format.
- `all_metadata.jsonl`: Annotations for this sample in the JSON format described below.
- `labelqueries.csv`: The label queries used for weak annotation, created using the annotation interface. This CSV file contains 6 columns: a numeric ID, the label query template name (`template`), the template slots (`slots`), the label type (`label`), the annotation value (`value`), and a toggle for the interface (`enabled`).
- `labelqueries-sparql-templates.zip`: The label query templates. These are SPARQL queries with slots of the form `{{slot}}`. The templates in `labelqueries.csv` refer to these files.
- `rules.txt`: Datalog rules that we used for entity resolution.
- `tab2know-graph.nt.gz`: The final RDF graph that contains all extracted table structures, predicted table and column classes, and resolved entity links.
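The label-query templates use `{{slot}}` placeholders. A minimal sketch of filling them, assuming slot values are already available as a Python dictionary (how the `slots` column in labelqueries.csv encodes them is not described here). The template string and the `:hasHeader` predicate in the demo are invented for illustration, and values are inserted verbatim, without SPARQL escaping.

```python
import re

# Matches placeholders of the form {{slot}} (optional spaces allowed inside).
SLOT_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template, slots):
    """Replace each {{name}} in a SPARQL template with slots[name].

    A missing slot raises KeyError instead of leaving a half-filled query.
    Values are inserted verbatim (no SPARQL escaping).
    """
    return SLOT_PATTERN.sub(lambda m: slots[m.group(1)], template)


if __name__ == "__main__":
    # Invented template; the real ones are in labelqueries-sparql-templates.zip.
    template = 'SELECT ?t WHERE { ?t :hasHeader "{{header}}" . }'
    print(fill_template(template, {"header": "Accuracy"}))
```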
Ballistic Deflection Transistor Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jun 14, 2025
Cite
Data Insights Market (2025). Ballistic Deflection Transistor Report [Dataset]. https://www.datainsightsmarket.com/reports/ballistic-deflection-transistor-891679
Available download formats: pdf, doc, ppt
Dataset updated
Jun 14, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Ballistic Deflection Transistor (BDT) market is poised for significant growth, driven by the increasing demand for high-speed, low-power electronics in various applications. While precise market sizing data is unavailable, we can reasonably infer substantial expansion based on industry trends. The compound annual growth rate (CAGR) for similar emerging semiconductor technologies often falls within the range of 15-25% during periods of rapid innovation and adoption. Assuming a conservative CAGR of 18% and a 2025 market size of $500 million (a plausible estimate given the involvement of established players like Elsevier and the potential applications), the market is projected to reach approximately $1.7 billion by 2033. Key drivers include the need for enhanced performance in data centers, advanced computing, and 5G/6G infrastructure, pushing the boundaries of traditional transistor technologies. Furthermore, the miniaturization trend in electronics fuels the demand for BDTs, enabling more compact and efficient devices. While challenges remain in terms of manufacturing complexity and cost, ongoing research and development efforts are addressing these limitations. The market segmentation will likely see a strong focus on high-performance computing segments, and early adopters will be key to driving market expansion. The major restraints currently hindering widespread BDT adoption include the high manufacturing costs associated with advanced fabrication techniques and the inherent complexities in designing and integrating these novel transistors into existing systems. This necessitates specialized expertise and infrastructure, limiting immediate accessibility. However, these challenges are progressively being overcome as fabrication technologies mature and economies of scale emerge. 
The competitive landscape involves key players like Semantic Scholar, Elliott Sound Products (likely in a related but not directly BDT-focused capacity, hence requiring assumptions), and Elsevier (whose involvement suggests potential in licensing or data analysis support), signifying the presence of both technological expertise and investment interest in the field. Geographic distribution will likely mirror trends in semiconductor manufacturing, with regions like North America, Europe, and East Asia capturing significant market share.
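Applying the standard compound-growth formula to the figures assumed above (a 2025 market size of $500 million and an 18% CAGR over the eight years to 2033) gives a somewhat higher value than the quoted approximately $1.7 billion, and reaching $1.7 billion from $500 million would instead correspond to a CAGR of about 16.5%:

$$
V_{2033} = V_{2025}\,(1 + r)^{2033 - 2025} = 500 \times 1.18^{8} \approx 500 \times 3.759 \approx 1879 \ \text{(million USD)}
$$

$$
r_{\text{implied}} = \left(\frac{1700}{500}\right)^{1/8} - 1 = 3.4^{1/8} - 1 \approx 0.165
$$

That is, roughly $1.9 billion by 2033 under the stated assumptions.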
DORIS-MAE-v1
zenodo.org
data.niaid.nih.gov
bin, json
Updated Oct 17, 2023
Cite
Jianyou Wang; Kaicheng Wang; Xiaoyue Wang; Prudhviraj Naidu; Leon Bergen; Ramamohan Paturi (2023). DORIS-MAE-v1 [Dataset]. http://doi.org/10.5281/zenodo.8299749
Available download formats: bin, json
Unique identifier
https://doi.org/10.5281/zenodo.8299749
Dataset updated
Oct 17, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jianyou Wang; Kaicheng Wang; Xiaoyue Wang; Prudhviraj Naidu; Leon Bergen; Ramamohan Paturi
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high costs and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research.

Documentation for the DORIS-MAE dataset is publicly available at https://github.com/Real-Doris-Mae/Doris-Mae-Dataset. This upload contains both DORIS-MAE dataset version 1 and ada-002 vector embeddings for all queries and related abstracts (used in candidate pool creation). DORIS-MAE dataset version 1 comprises four main sub-datasets, each serving a distinct purpose.

The Query dataset contains 100 human-crafted complex queries spanning five categories: ML, NLP, CV, AI, and Composite. Each category has 20 associated queries. Queries are broken down into aspects (3 to 9 per query) and sub-aspects (0 to 6 per aspect, with 0 signifying that no further breakdown is required). For each query, a corresponding candidate pool of 99 to 138 relevant paper abstracts is provided.

The Corpus dataset is composed of 363,133 abstracts from computer science papers published between 2011 and 2021 and sourced from arXiv. Each entry includes the title, original abstract, URL, primary and secondary categories, and citation information retrieved from Semantic Scholar. A masked version of each abstract is also provided to facilitate the automated creation of queries.

The Annotation dataset includes generated annotations for all 165,144 question pairs, each comprising an aspect/sub-aspect and a corresponding paper abstract from the query's candidate pool. It includes the original text generated by ChatGPT (version chatgpt-3.5-turbo-0301) explaining its decision-making process, along with a three-level relevance score (0, 1, or 2) representing ChatGPT's final decision.

Finally, the Test Set dataset contains human annotations for a random selection of 250 question pairs used in hypothesis testing. It includes the final decision of each of the three human annotators, recorded as a three-level relevance score (0, 1, or 2).
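The Test Set makes it possible to compare ChatGPT's scores with the human annotators. One simple measure, sketched below, is the fraction of question pairs where the model's score equals the human majority score, skipping pairs where all three annotators disagree. This is an illustrative statistic, not the hypothesis test used by the dataset authors, and the input format (a human-score triple paired with a model score) is an assumption.

```python
from collections import Counter


def majority_label(scores):
    """Most common score among annotators, or None if no score has a majority."""
    label, count = Counter(scores).most_common(1)[0]
    return label if count > len(scores) / 2 else None


def agreement_with_majority(pairs):
    """Fraction of pairs where the model score equals the human majority score.

    `pairs` yields (human_scores, model_score); pairs without a majority are skipped.
    """
    matched = total = 0
    for human_scores, model_score in pairs:
        majority = majority_label(human_scores)
        if majority is None:
            continue
        total += 1
        matched += int(model_score == majority)
    return matched / total if total else float("nan")


if __name__ == "__main__":
    # Made-up examples on the 0/1/2 scale, not taken from the Test Set.
    pairs = [((2, 2, 1), 2), ((0, 1, 0), 1), ((0, 1, 2), 0), ((1, 1, 1), 1)]
    print(agreement_with_majority(pairs))
```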

The file "ada_embedding_for_DORIS-MAE_v1.pickle" contains text embeddings for the DORIS-MAE dataset, generated by OpenAI's ada-002 model. The structure of the file is as follows:

├── ada_embedding_for_DORIS-MAE_v1.pickle
├── "Query"
│ ├── query_id_1 (Embedding of query_1)
│ ├── query_id_2 (Embedding of query_2)
│ └── query_id_3 (Embedding of query_3)
│ .
│ .
│ .
└── "Corpus"
├── corpus_id_1 (Embedding of abstract_1)
├── corpus_id_2 (Embedding of abstract_2)
└── corpus_id_3 (Embedding of abstract_3)
.
.
.
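Given that layout, a minimal sketch for loading the embeddings and ranking corpus abstracts by cosine similarity to a query, assuming the pickle holds a dictionary with `"Query"` and `"Corpus"` keys that map IDs to numeric vectors. The function names are illustrative. Pure Python keeps the example self-contained but would be slow over all 363,133 abstracts, where a vectorized library would be more practical; only unpickle files from a source you trust.

```python
import math
import pickle


def load_embeddings(path):
    """Load the embedding pickle (unpickling can run code, so trust the source)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_corpus(embeddings, query_id, top_k=10):
    """Return (corpus_id, similarity) pairs for the abstracts closest to a query."""
    q = embeddings["Query"][query_id]
    scored = ((cid, cosine(q, vec)) for cid, vec in embeddings["Corpus"].items())
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for real ada-002 embeddings.
    toy = {"Query": {"q1": [1.0, 0.0]},
           "Corpus": {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}}
    print(rank_corpus(toy, "q1", top_k=2))
```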
Datasets for Fair Name-Based Gender Prediction in Scientific Communities
figshare.com
zip
Updated Aug 14, 2025
Cite
Maria Guariglia Migliore; Gregorio D'Agostino; Tatiana Patriarca; Antonio De Nicola (2025). Datasets for Fair Name-Based Gender Prediction in Scientific Communities [Dataset]. http://doi.org/10.6084/m9.figshare.29909603.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.29909603.v1
Dataset updated
Aug 14, 2025
Dataset provided by
figshare
Authors
Maria Guariglia Migliore; Gregorio D'Agostino; Tatiana Patriarca; Antonio De Nicola
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The datasets support the evaluation of fair name-based gender prediction software across two scientific domains: energy transition and critical infrastructures. Each dataset contains public information on scientific authors and their gender, determined through manual validation and compared against predictions from multiple automated tools. The gender labels in these datasets represent the assessment of human annotators based solely on the information available (e.g., names) and do not necessarily reflect the self-identified gender or gender perception of the authors.

The energy transition dataset is derived from papers retrieved from Scopus using the query terms “energy transition” OR “energy transformation.” The initial set of 17,591 papers was refined to 10,130 using the Energy Systems Ontology (ESO) (De Nicola et al., 2024); these papers were authored by 27,363 individuals. From this population, 1,000 authors were randomly selected for manual gender validation, resulting in 260 females, 575 males, and 165 of undetermined gender.

The critical infrastructures dataset is based on all 380 papers published between 2006 and 2022 in the proceedings of the International Conference on Critical Information Infrastructures Security (CRITIS), involving 929 authors. All authors were manually validated, yielding 153 females, 768 males, and 8 of undetermined gender.

The datasets are provided in JSON format, one file per domain:
- ET-report.json contains records for the 1,000 manually validated authors in the energy transition dataset. Each record includes the author’s full name, the Semantic Scholar ID, the manual validation gender label, and the predictions from multiple automated gender prediction tools (Prediction Manager, Gender API, ChatGPT, and NamSor).
- CRITIS-report.json contains records for all 929 manually validated authors in the critical infrastructures dataset, with the same structure and fields as the energy transition file, except without the Semantic Scholar ID.

These structured files enable reproducible analysis, cross-tool performance comparisons, and integration into further research workflows.

Reference
De Nicola, A., Patriarca, T., Fresilli, B., Opromolla, A., Guariglia Migliore, M., Leonardi, N., D’Agostino, G., Cellini, M., Mirenda, C., Tagliacozzo, S., Pisacane, L., Vassillo, C. (2024) D.1.2 - Report on gendered assessment of the energy systems knowledge community and EU policies for sustainable energy systems—Horizon Europe Project gEneSys—Transforming gendered interrelations of power and inequalities in transition pathways to sustainable energy systems, grant agreement no. 101094326. https://ec.europa.eu/research/participants/documents/downloadPublic?documentIds=080166e509765b4f&appId=PPGMS
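A minimal sketch of a cross-tool comparison, computing each tool's accuracy against the manual label while excluding authors of undetermined gender. The key names (`gender`, `predictions`), the label strings, and the assumption that each file holds a JSON list of records are hypothetical; inspect ET-report.json and CRITIS-report.json for the real field names before use.

```python
import json


def load_records(path):
    """Load one report file, assuming it holds a JSON list of author records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def accuracy_per_tool(records, label_key="gender", predictions_key="predictions",
                      skip_labels=("undetermined",)):
    """Share of records where each tool's prediction matches the manual label.

    All key names and label strings are hypothetical placeholders.
    """
    hits, totals = {}, {}
    for rec in records:
        truth = rec.get(label_key)
        if truth is None or truth in skip_labels:
            continue  # exclude authors without a usable manual label
        for tool, predicted in rec.get(predictions_key, {}).items():
            totals[tool] = totals.get(tool, 0) + 1
            hits[tool] = hits.get(tool, 0) + int(predicted == truth)
    return {tool: hits[tool] / totals[tool] for tool in totals}
```

Excluding undetermined authors is one design choice; a fairness analysis would typically also report accuracy separately for each manually assigned gender.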
semantic-scholar-teknik-sipil
huggingface.co
Updated Feb 2, 2025
Cite
Sederhana Gulo (2025). semantic-scholar-teknik-sipil [Dataset]. https://huggingface.co/datasets/derhan/semantic-scholar-teknik-sipil
Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 2, 2025
Authors
Sederhana Gulo
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Semantic Scholar: Teknik Sipil

Batch fetched from Semantic Scholar API, with parameters:

query = '"manajemen proyek" | "manajemen konstruksi" | "manajemen proyek konstruksi" | "proyek konstruksi" | "proyek" | "konstruksi" | "analisa struktur beton" | "desain jembatan" | "ketahanan gempa" | "pemodelan struktur" | "material komposit" | "mekanika tanah" | "pondasi dalam" | "stabilitas lereng" | "perbaikan tanah" | "geoteknik lingkungan" | "rekayasa lalu lintas" | "perencanaan… See the full description on the dataset page: https://huggingface.co/datasets/derhan/semantic-scholar-teknik-sipil.
Reviewed Forecasting Articles
zenodo.org
Updated Mar 19, 2020
Cite
André Bauer (2020). Reviewed Forecasting Articles [Dataset]. http://doi.org/10.5281/zenodo.3716035
Unique identifier
https://doi.org/10.5281/zenodo.3716035
Dataset updated
Mar 19, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
André Bauer
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
As time series forecasting is an essential pillar of many decision-making fields, there is a broad range of academic work on forecasting. To this end, we reviewed 100 scientific papers published during the last 40 years. The papers were collected from the search engines Google Scholar, Mendeley, IEEE Xplore, and Semantic Scholar. We selected papers that have attracted, on average, at least 8.5 citations per year. The selected papers cover different topics, for example, supply chain demand, river flow, tourism, traffic, stock prices, electric/power demand, and many more.
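The selection criterion can be expressed as average citations per year since publication. In the sketch below, counting the publication year itself (the `+ 1`) and the choice of reference year are assumptions, since the description does not specify them; the example paper is hypothetical.

```python
def citations_per_year(citations, published_year, reference_year):
    """Average citations per year, counting the publication year itself (assumption)."""
    years = max(reference_year - published_year + 1, 1)
    return citations / years


def meets_threshold(citations, published_year, reference_year, threshold=8.5):
    """True if a paper averages at least `threshold` citations per year."""
    return citations_per_year(citations, published_year, reference_year) >= threshold


if __name__ == "__main__":
    # Hypothetical paper: 170 citations, published 2000, counted in 2019 (20 years).
    print(citations_per_year(170, 2000, 2019), meets_threshold(170, 2000, 2019))
```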
Supplementary materials for the publication “Technological Sovereignty as a...
figshare.com
png
Updated May 18, 2025
Cite
Boris Chigarev (2025). Supplementary materials for the publication “Technological Sovereignty as a Current Energy Security Challenge. Preliminary analysis” [Dataset]. http://doi.org/10.6084/m9.figshare.29094296.v1
Available download formats: png
Unique identifier
https://doi.org/10.6084/m9.figshare.29094296.v1
Dataset updated
May 18, 2025
Dataset provided by
figshare
Authors
Boris Chigarev
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The paper is planned to be published on https://www.preprints.org/ and subsequently in the journal Energy Systems Research (https://esrj.ru/index.php/esr). Note: use the JSON files with https://app.vosviewer.com/.

Abstract. Energy security is often interpreted as independence from fossil fuels, but a one-sided approach can lead to dependence on high-value-added technologies. The development of artificial intelligence, which requires high energy consumption, chips and servers, is shifting competition in manufacturing and services from energy security to technological sovereignty. With the development of technology, sovereignty has shifted from military independence to freedom from economic coercion by other states and large corporations. The aim of this study was to identify suitable tools for analyzing abstract texts from tens of thousands of bibliometric records and pre-assessing relevant topics related to the energy sector to effectively analyze trends in technological sovereignty issues. In this paper, 10,000 bibliometric records for the year 2024, sorted by relevance, were exported from the open abstract database Scilit using the query “energy AND technology” in [Title, Abstract, Keyword], Content Type: JOURNAL-ARTICLE, English. Filters were applied to the “Subject” categories most related to technology: Power Systems & Electric Vehicles, Energy Systems & Technologies, Electrical Energy Management, and Nuclear Technology & Instrumentation. The main theme of the bibliometric data analyzed was renewable energy. Twelve clusters were identified based on keywords, of which three were closest to the topic for which this research was funded: hydrogen, heat energy storage, and greenhouse gas emissions. These clusters reflect keywords derived from both Yake! and PatternRank. Yake! outperforms PatternRank in terms of run time and the representation of found keywords in abstract texts. The feasibility of using AnyAscii for text preprocessing is demonstrated. Using artificial intelligence to generate text from key phrases speeds up text processing, but manual editing is still needed. The study showed a need to expand data sources, e.g., using OnePetro for oil and gas topics, IEEE Xplore for energy systems issues, and Semantic Scholar to evaluate the role of AI in the energy sector.
material-synthesis-papers-s2api-400K
huggingface.co
Cite
iknow-lab, material-synthesis-papers-s2api-400K [Dataset]. https://huggingface.co/datasets/iknow-lab/material-synthesis-papers-s2api-400K
Dataset authored and provided by
iknow-lab
Description
Query period: Dec 2024 ~ Jan 2024

Row example:

{ "paperId": "fe6f7573766e3cd2f268faeb2ef4e24d8820cc48", "externalIds": { "CorpusId": 220514064 }, "publicationVenue": null, "url": "https://www.semanticscholar.org/paper/fe6f7573766e3cd2f268faeb2ef4e24d8820cc48", "title": "THE 14TH INTERNATIONAL CONFERENCE ON FLOW PROCESSING IN COMPOSITE MATERIALS THROUGH THICKNESS THERMOPLASTIC MELT IMPREGNATION: EFFECT OF FABRIC ARCHITECTURE ON COMPACTION AND PERMEABILITY UNDER… See the full description on the dataset page: https://huggingface.co/datasets/iknow-lab/material-synthesis-papers-s2api-400K.
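The row example shows the identifiers each record carries. A minimal sketch for extracting them from one JSON-encoded record, assuming records are stored one JSON object per line (the storage format is not stated here); only fields visible in the truncated example are used, and `summarize_paper` is an illustrative name.

```python
import json


def summarize_paper(line):
    """Pull a few identifiers out of one JSON-encoded paper record."""
    paper = json.loads(line)
    return {
        "paper_id": paper.get("paperId"),
        "corpus_id": (paper.get("externalIds") or {}).get("CorpusId"),
        "venue": paper.get("publicationVenue"),  # may be null, as in the example
        "url": paper.get("url"),
    }


def iter_papers(path):
    """Yield summaries from a JSON Lines file (file layout is an assumption)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield summarize_paper(line)
```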
Hepatology International CiteScore 2024-2025 - ResearchHelpDesk
researchhelpdesk.org
Updated May 8, 2022
Cite
Research Help Desk (2022). Hepatology International CiteScore 2024-2025 - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/sjr/585/hepatology-international
Dataset updated
May 8, 2022
Dataset authored and provided by
Research Help Desk
Description
Hepatology International CiteScore 2024-2025 - ResearchHelpDesk. Hepatology International is a peer-reviewed journal, featuring articles written by clinicians, clinical researchers and basic scientists, that is dedicated to research and patient care issues in hepatology. The journal focuses mainly on new and emerging diagnostic and treatment options, protocols, the molecular and cellular basis of disease pathogenesis, and new technologies in liver and biliary sciences. Hepatology International publishes original research articles related to clinical care and basic research; review articles; consensus guidelines for diagnosis and treatment; invited editorials; and controversies in contemporary issues. The journal does not publish case reports. Hepatology International requests that all authors comply with Springer’s ethical policies. All three ethics statements should be clearly indicated on every article, for all authors mentioned by name, and placed at the end of each article just before the Reference section. Hepatology International is the official journal of the Asian Pacific Association for the Study of the Liver (APASL) and focuses mainly on new and emerging technologies, cutting-edge science and advances in liver and biliary disorders.

Types of articles published:
- Original Research Articles related to clinical care and basic research
- Review Articles
- Consensus guidelines for diagnosis and treatment
- Clinical cases, images
- Selected Author Summaries
- Video Submissions

Highlights:
- Now indexed by ISI
- A peer-reviewed journal with global reach, championed and edited by international experts in the field
- Focuses on the complete spectrum of contemporary clinical and basic science related issues in the field of adult and pediatric hepatobiliary and allied sciences, new and emerging technologies, cutting-edge innovations and future trends in liver and biliary disorders
- Publishes original research articles, editorials, reviews, and consensus guidelines for diagnosis and treatment of liver diseases

Abstracted and indexed in: BFI List, CLOCKSS, CNKI, CNPIEC, Dimensions, EBSCO Discovery Service, EMBASE, Google Scholar, Japanese Science and Technology Agency (JST), Journal Citation Reports/Science Edition, Medline, Naver, Norwegian Register for Scientific Journals and Series, OCLC WorldCat Discovery Service, Portico, ProQuest SciTech Premium Collection, ProQuest Toxicology Abstracts, ProQuest-ExLibris Primo, ProQuest-ExLibris Summon, Reaxys, SCImago, SCOPUS, Science Citation Index Expanded (SciSearch), Semantic Scholar, TD Net Discovery Service, UGC-CARE List (India)
Knowledge Graph of COVID-19 Literature
catalog.midasnetwork.us
json
Updated Jul 6, 2023
Cite
MIDAS Coordination Center (2023). Knowledge Graph of COVID-19 Literature [Dataset]. https://catalog.midasnetwork.us/collection/130
Available download formats: json
Dataset updated
Jul 6, 2023
Dataset authored and provided by
MIDAS Coordination Center
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Variables measured
disease, COVID-19, pathogen, Homo sapiens, data service, host organism, clinical trial, infectious disease, sequence collection, Severe acute respiratory syndrome coronavirus 2
Dataset funded by
National Institute of General Medical Sciences
Description
IBM is providing free access to its COVID-19 Knowledge Graph, which integrates COVID-19 data from several sources: CORD-19 (https://www.semanticscholar.org/cord19) for literature, ClinicalTrials.gov (https://clinicaltrials.gov/) and the WHO ICTRP (https://www.who.int/ictrp/search) for clinical trials, and DrugBank (https://www.drugbank.ca/) and GenBank (https://www.ncbi.nlm.nih.gov/genbank) for drug and sequence data. Prepared search reports on the Reports Page are openly accessible; access to the COVID-19 Knowledge Graph itself, however, must be requested.
Data chart with key characteristics of included sources.
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Phuc Pham-Duc; Kavitha Sriparamananthan (2023). Data chart with key characteristics of included sources. [Dataset]. http://doi.org/10.1371/journal.pone.0259069.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0259069.t002
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Phuc Pham-Duc; Kavitha Sriparamananthan
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Data chart with key characteristics of included sources.
Hepatology International FAQ - ResearchHelpDesk
researchhelpdesk.org
Updated May 25, 2022
Cite
Research Help Desk (2022). Hepatology International FAQ - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/faq/585/hepatology-international
Dataset updated
May 25, 2022
Dataset authored and provided by
Research Help Desk
Description
Hepatology International FAQ - ResearchHelpDesk - Hepatology International is a peer-reviewed journal featuring articles written by clinicians, clinical researchers and basic scientists, and is dedicated to research and patient care issues in hepatology. The journal focuses mainly on new and emerging diagnostic and treatment options, protocols, the molecular and cellular basis of disease pathogenesis, and new technologies in the liver and biliary sciences. Hepatology International publishes original research articles related to clinical care and basic research; review articles; consensus guidelines for diagnosis and treatment; invited editorials; and controversies in contemporary issues. The journal does not publish case reports. Hepatology International requests that all authors comply with Springer’s ethical policies. Ethics statements should be clearly indicated in every article, covering all three ethics statements and all authors mentioned by name, and should be placed at the end of the article, just before the References section. Hepatology International is the official journal of the Asian Pacific Association for the Study of the Liver (APASL). It focuses mainly on new and emerging technologies, cutting-edge science and advances in liver and biliary disorders.
Types of articles published: original research articles related to clinical care and basic research; review articles; consensus guidelines for diagnosis and treatment; clinical cases and images; selected author summaries; video submissions. Now indexed by ISI.
A peer-reviewed journal with global reach, championed and edited by international experts in the field, it focuses on the complete spectrum of contemporary clinical and basic science issues in adult and pediatric hepatobiliary and allied sciences, new and emerging technologies, cutting-edge innovations and future trends in liver and biliary disorders. It publishes original research articles, editorials, reviews, and consensus guidelines for the diagnosis and treatment of liver diseases.
Abstracted and indexed in: BFI List, CLOCKSS, CNKI, CNPIEC, Dimensions, EBSCO Discovery Service, EMBASE, Google Scholar, Japanese Science and Technology Agency (JST), Journal Citation Reports/Science Edition, Medline, Naver, Norwegian Register for Scientific Journals and Series, OCLC WorldCat Discovery Service, Portico, ProQuest SciTech Premium Collection, ProQuest Toxicology Abstracts, ProQuest-ExLibris Primo, ProQuest-ExLibris Summon, Reaxys, SCImago, SCOPUS, Science Citation Index Expanded (SciSearch), Semantic Scholar, TD Net Discovery Service, UGC-CARE List (India).
PCC framework for search strategy development.
plos.figshare.com
xls
Updated Jun 9, 2023
Cite
Phuc Pham-Duc; Kavitha Sriparamananthan (2023). PCC framework for search strategy development. [Dataset]. http://doi.org/10.1371/journal.pone.0259069.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0259069.t001
Dataset updated
Jun 9, 2023
Dataset provided by
PLOS ONE
Authors
Phuc Pham-Duc; Kavitha Sriparamananthan
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
PCC framework for search strategy development.
Data from: Examining bias perpetuation in academic search engines: an algorithm audit of Google and Semantic Scholar

Description

Main dataset (main.csv)

The main file contains one entry per search result across all collected pages (N=28530). It comprises the following columns:

id: Unique identifier of the file (corresponds to the last part of the filename)
filename: Name of the file associated with the row (the file is in serp_html.zip)
engine: The search engine used (Google Scholar or Semantic Scholar).
browser: The web browser used for the search (Firefox or Chrome)
region: The geographical region where the search was made.
year: The year when the search was made
month: The month when the search was made
day: The day when the search was made
query: The full search query that was used
query_type: The type of the search query (health or technology)
topic: The topic associated with the search query ('covid vaccines', 'cryptocurrencies', 'internet', 'social media', 'vaccines', 'coffee')
trt: Treatment variable associated with the search (benefits or risks).
url: The URL of the (article) search result
title: The title of the (article) search result.
authorship: The author(s) of the (article) search result.
abstract_id: Unique identifier for the abstract of the (article) search result; links to annotated-abstracts_v0.6.xlsx
abstract_hash: Hash value of the abstract, for data-integrity checks
link_n: The total number of results on the search page
rank: The rank of the search result on the search engine results page
annotation: The annotation of the (article's) abstract. One of: '1. Confirms benefits', '2. Confirms risks', '3. Confirms both benefits and risks', '4. Confirms neither benefits nor risks', '5. Abstract not related to {topic}'
valence: -1 for abstracts containing risks, 0 for neutral abstracts, 1 for abstracts containing only benefits
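The annotation and valence columns describe the same judgment at two levels of detail, so one can be re-derived from the other to check that they agree. The Python sketch below does that. Two points are assumptions rather than documented facts: label 3 (both benefits and risks) is coded -1 because such an abstract contains risks, and label 5 (unrelated) is returned as None because the description does not say how it is coded. The helper name valence_from_annotation is hypothetical. Matching on the leading digit avoids depending on the {topic} text embedded in label 5.

```python
import re
from typing import Optional

# Leading label number, e.g. "3." in "3. Confirms both benefits and risks".
_CODE_RE = re.compile(r"^\s*(\d)\.")


def valence_from_annotation(annotation: str) -> Optional[int]:
    """Hypothetical helper: map an annotation label to -1, 0, 1, or None.

    The mapping is our reading of the column descriptions, not code
    shipped with the dataset.
    """
    match = _CODE_RE.match(annotation)
    if match is None:
        raise ValueError(f"unrecognized annotation: {annotation!r}")
    code = int(match.group(1))
    if code in (2, 3):   # confirms risks, or both benefits and risks
        return -1
    if code == 1:        # confirms benefits only
        return 1
    if code == 4:        # confirms neither: neutral
        return 0
    if code == 5:        # unrelated to the topic: coding not documented (assumption)
        return None
    raise ValueError(f"unexpected annotation code: {code}")
```

Rows where the derived value differs from the stored valence (ignoring label 5) would be the ones worth inspecting by hand.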

Annotated abstracts (annotated-abstracts_v0.6.xlsx)

Manually annotated abstracts resulting from the searches.
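The abstract_id column is the link between main.csv and annotated-abstracts_v0.6.xlsx. A minimal standard-library sketch of that join follows. It assumes the spreadsheet has already been loaded into row dicts (for example with openpyxl, or after exporting it to CSV) and that it also has a column named abstract_id, which the description does not state. read_main_csv and join_on_abstract_id are hypothetical helpers, and the UTF-8 encoding of main.csv is likewise an assumption.

```python
import csv
from typing import Any, Dict, Iterable, List, Mapping


def read_main_csv(path: str) -> List[Dict[str, str]]:
    """Load main.csv as a list of row dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def join_on_abstract_id(results: Iterable[Mapping[str, Any]],
                        abstracts: Iterable[Mapping[str, Any]]) -> List[Dict[str, Any]]:
    """Left-join search results to annotated abstracts on abstract_id.

    Keys are compared as strings because CSV cells are text while
    spreadsheet cells may come back as integers.
    """
    by_id = {str(row["abstract_id"]): row for row in abstracts}
    joined: List[Dict[str, Any]] = []
    for row in results:
        merged = dict(row)
        match = by_id.get(str(row["abstract_id"]), {})
        for key, value in match.items():
            # On a shared column name, keep the value from main.csv.
            merged.setdefault(key, value)
        joined.append(merged)
    return joined
```

A left join keeps every search result, including any whose abstract has no row in the spreadsheet; those rows simply gain no extra columns.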

Raw search engine result pages (serp_html.zip)

The zip contains one HTML file per collected search engine result page (N=2853). See the filename column of the main dataset.
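The two counts imply an average of 28530 / 2853 = 10 results per collected page. To inspect the page behind a row of main.csv, the matching HTML can be read directly from serp_html.zip with Python's zipfile module. This sketch assumes that the filename column holds the member's base name (possibly stored inside a folder in the archive) and that the pages are UTF-8 encoded; both are assumptions, and index_serp_zip and read_serp_html are hypothetical helpers.

```python
import zipfile
from pathlib import PurePosixPath
from typing import Dict, Optional


def index_serp_zip(zf: zipfile.ZipFile) -> Dict[str, str]:
    """Map each member's base name to its full path inside the archive."""
    index: Dict[str, str] = {}
    for name in zf.namelist():
        if name.endswith("/"):  # skip directory entries
            continue
        index[PurePosixPath(name).name] = name
    return index


def read_serp_html(zf: zipfile.ZipFile, filename: str,
                   index: Optional[Dict[str, str]] = None) -> str:
    """Return the raw HTML of the page referenced by a main.csv filename."""
    if index is None:
        index = index_serp_zip(zf)
    member = index.get(PurePosixPath(filename).name)
    if member is None:
        raise KeyError(f"{filename!r} not found in archive")
    # UTF-8 is an assumption; undecodable bytes are replaced, not fatal.
    return zf.read(member).decode("utf-8", errors="replace")
```

Building the index once and passing it to read_serp_html avoids rescanning the archive's member list for each of the 28530 rows.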
