MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The main file contains an entry (N=28530) per search result in all collected pages. It comprises the following columns:
Manually annotated abstracts resulting from the searches.
The zip contains one HTML file per search engine result page collected (N=2853). See the column filename in the main dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Semantic Scholar: Manajemen Proyek
Batch fetched from Semantic Scholar API, with parameters:
query = '"manajemen proyek" | "project management" | "metodologi manajemen proyek" | "teknik manajemen proyek" | "alat manajemen proyek" | "keterampilan manajemen proyek" | "tantangan manajemen proyek" | "risiko manajemen proyek" | "studi kasus manajemen proyek" | "tren manajemen proyek" | "manajemen proyek di berbagai industri" | "manajemen proyek dalam organisasi" | "manajemen proyek… See the full description on the dataset page: https://huggingface.co/datasets/derhan/semantic-scholar-manajemen-proyek.
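The query above joins quoted phrases with `|`. A minimal sketch of how such a query string and request URL could be assembled is shown here. The endpoint path and the `fields` parameter are assumptions (the description only says "Semantic Scholar API"), and only the first three phrases of the query are used. No request is sent.

```python
from urllib.parse import urlencode

# Assumed endpoint; the dataset description does not name it.
BULK_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search/bulk"


def build_query(phrases):
    """Wrap each phrase in double quotes and join with ' | ' (OR)."""
    return " | ".join(f'"{p}"' for p in phrases)


def build_url(phrases, fields=("title", "abstract")):
    """Return a URL-encoded request URL; `fields` is an illustrative choice."""
    params = {"query": build_query(phrases), "fields": ",".join(fields)}
    return f"{BULK_SEARCH_URL}?{urlencode(params)}"


phrases = ["manajemen proyek", "project management", "metodologi manajemen proyek"]
print(build_query(phrases))
print(build_url(phrases))
```

Keeping the phrases in a list and quoting them in one place avoids unbalanced quotes when the query grows to dozens of terms.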
A full description of this dataset along with updated information can be found here.
In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of scholarly articles, including full text content, about COVID-19 and the coronavirus family of viruses for use by the global research community.
This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease. The corpus will be updated weekly as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others.
By downloading this dataset you are agreeing to the Dataset license. Specific licensing information for individual articles in the dataset is available in the metadata file.
Additional licensing information is available on the PMC website, medRxiv website and bioRxiv website.
Dataset content:
Commercial use subset
Non-commercial use subset
PMC custom license subset
bioRxiv/medRxiv subset (pre-prints that are not peer reviewed)
Metadata file
Readme
Each paper is represented as a single JSON object (see schema file for details).
Description:
The dataset contains all COVID-19 and coronavirus-related research (e.g. SARS, MERS, etc.) from the following sources:
PubMed's PMC open access corpus using this query (COVID-19 and coronavirus research)
Additional COVID-19 research articles from a corpus maintained by the WHO
bioRxiv and medRxiv pre-prints using the same query as PMC (COVID-19 and coronavirus research)
We also provide a comprehensive metadata file of coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic and the WHO COVID-19 database of publications (includes articles without open access full text).
We recommend using metadata from the comprehensive file when available, instead of parsed metadata in the dataset. Please note the dataset may contain multiple entries for individual PMC IDs in cases when supplementary materials are available.
This repository is linked to the WHO database of publications on coronavirus disease and other resources, such as Microsoft Academic Graph, PubMed, and Semantic Scholar. A coalition including the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine of the National Institutes of Health came together to provide this service.
Citation:
When including CORD-19 data in a publication or redistribution, please cite the dataset as follows:
In bibliography:
COVID-19 Open Research Dataset (CORD-19). 2020. Version 2020-MM-DD. Retrieved from https://pages.semanticscholar.org/coronavirus-research. Accessed YYYY-MM-DD. 10.5281/zenodo.3715505
In text:
(CORD-19, 2020)
The Allen Institute for AI and particularly the Semantic Scholar team will continue to provide updates to this dataset as the situation evolves and new research is released.
https://www.datainsightsmarket.com/privacy-policy
The academic AI tools market is experiencing rapid growth, driven by increasing research output, the need for efficient literature reviews, and the demand for advanced analytical capabilities. The market, estimated at $1.5 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033, reaching approximately $8 billion by 2033. This expansion is fueled by several key factors. Firstly, the rising number of academic publications and the complexity of research necessitate tools that can efficiently process and analyze vast amounts of information. Secondly, the integration of AI into various research stages, from literature review and data analysis to writing and editing, is significantly enhancing productivity and accuracy. Thirdly, the emergence of sophisticated AI models capable of understanding nuanced academic language and context is driving adoption among researchers and institutions. However, the market also faces challenges. High initial investment costs for both developers and users can be a barrier to entry. Concerns about data privacy and intellectual property rights in relation to AI-driven research are also significant. Furthermore, the market is currently fragmented, with numerous players competing for market share, leading to a highly dynamic competitive landscape. The continued success of companies like Consensus, Scite Assistant, Research Rabbit, Paperpal, Elicit, Perplexity, Semantic Scholar, Connected Papers, Scholarcy, Gemini, Julius, Bit AI, and Trinka will depend on their ability to differentiate their offerings, improve user experience, and effectively address the challenges related to data security and ethical considerations. Future growth will likely be driven by advancements in natural language processing (NLP), improved integration with existing research workflows, and the development of specialized tools catering to specific research fields.
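The projection above follows the usual compound-growth formula, value_n = value_0 * (1 + CAGR)^n. A small sketch of that arithmetic follows. The number of compounding periods is an assumption, since the text does not say whether 2025 to 2033 is counted as seven or eight years, so both are shown.

```python
def project(value, cagr, years):
    """Compound `value` at annual rate `cagr` for `years` periods."""
    return value * (1 + cagr) ** years


base = 1.5   # USD billions in 2025, as quoted in the text
rate = 0.25  # 25% CAGR, as quoted in the text

for years in (7, 8):
    print(years, round(project(base, rate, years), 2))
# prints:
# 7 7.15
# 8 8.94
```

The two results bracket the quoted figure of approximately $8 billion.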
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This resource contains the following files:
- `venues.txt`: The venues that were used for selecting PDFs from the [Semantic Scholar Open Research Corpus](http://s2-public-api-prod.us-west-2.elasticbeanstalk.com/corpus/) that were published in the last 5 years.
- `extracted-tables.tar.gz`: All tables that we extracted using [Tabula](https://github.com/tabulapdf/tabula) from these PDFs.
- `sample-400.tar.gz`: A sample of these tables which we used for annotation.
- `ontology.ttl`: The annotation ontology in Turtle format.
- `all_metadata.jsonl`: Annotations for this sample in the JSON format described below.
- `labelqueries.csv`: The label queries used for weak annotation, created using the annotation interface. This CSV file contains 6 columns: a numeric ID, the label query template name (`template`), the template slots (`slots`), the label type (`label`), the annotation value (`value`), and a toggle for the interface (`enabled`).
- `labelqueries-sparql-templates.zip`: The label query templates. These are SPARQL queries with slots of the form `{{slot}}`. The templates in `labelqueries.csv` refer to these files.
- `rules.txt`: Datalog rules that we used for entity resolution.
- `tab2know-graph.nt.gz`: The final RDF graph that contains all extracted table structures, predicted table and column classes, and resolved entity links.
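The relationship between `labelqueries.csv` and the SPARQL templates can be sketched as follows. This is an illustration under stated assumptions, not the project's own code: the column name `id`, the JSON encoding of the `slots` column, the `"1"` value for `enabled`, and the template text are all hypothetical. Only the six column roles and the `{{slot}}` syntax come from the description above.

```python
import csv
import io
import json
import re

# Matches slots of the form {{name}}; single braces in SPARQL are left alone.
SLOT = re.compile(r"\{\{(\w+)\}\}")


def fill_template(template, slots):
    """Replace each {{name}} with slots[name]; a missing slot raises KeyError."""
    return SLOT.sub(lambda m: str(slots[m.group(1)]), template)


# Hypothetical template, standing in for a file from the templates zip.
TEMPLATES = {
    "header-match": 'SELECT ?c WHERE { ?c rdfs:label ?l . FILTER regex(?l, "{{pattern}}") }'
}

# Hypothetical row; the real slot encoding may differ.
SAMPLE_CSV = '''id,template,slots,label,value,enabled
0,header-match,"{""pattern"": ""accuracy""}",column,Metric,1
'''


def enabled_queries(csv_text, templates):
    """Yield (label, value, filled SPARQL) for every enabled label query."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["enabled"] != "1":
            continue
        slots = json.loads(row["slots"])
        yield row["label"], row["value"], fill_template(templates[row["template"]], slots)


for label, value, sparql in enabled_queries(SAMPLE_CSV, TEMPLATES):
    print(label, value, sparql)
```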
https://www.datainsightsmarket.com/privacy-policy
The Ballistic Deflection Transistor (BDT) market is poised for significant growth, driven by the increasing demand for high-speed, low-power electronics in various applications. While precise market sizing data is unavailable, we can reasonably infer substantial expansion based on industry trends. The compound annual growth rate (CAGR) for similar emerging semiconductor technologies often falls within the range of 15-25% during periods of rapid innovation and adoption. Assuming a conservative CAGR of 18% and a 2025 market size of $500 million (a plausible estimate given the involvement of established players like Elsevier and the potential applications), the market is projected to reach approximately $1.7 billion by 2033. Key drivers include the need for enhanced performance in data centers, advanced computing, and 5G/6G infrastructure, pushing the boundaries of traditional transistor technologies. Furthermore, the miniaturization trend in electronics fuels the demand for BDTs, enabling more compact and efficient devices. While challenges remain in terms of manufacturing complexity and cost, ongoing research and development efforts are addressing these limitations. The market segmentation will likely see a strong focus on high-performance computing segments, and early adopters will be key to driving market expansion. The major restraints currently hindering widespread BDT adoption include the high manufacturing costs associated with advanced fabrication techniques and the inherent complexities in designing and integrating these novel transistors into existing systems. This necessitates specialized expertise and infrastructure, limiting immediate accessibility. However, these challenges are progressively being overcome as fabrication technologies mature and economies of scale emerge. 
The competitive landscape involves key players like Semantic Scholar, Elliott Sound Products (likely in a related but not directly BDT-focused capacity, hence requiring assumptions), and Elsevier (whose involvement suggests potential in licensing or data analysis support), signifying the presence of both technological expertise and investment interest in the field. Geographic distribution will likely mirror trends in semiconductor manufacturing, with regions like North America, Europe, and East Asia capturing significant market share.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high costs and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research.
Documentation for the DORIS-MAE dataset is publicly available at https://github.com/Real-Doris-Mae/Doris-Mae-Dataset. This upload contains both DORIS-MAE dataset version 1 and ada-002 vector embeddings for all queries and related abstracts (used in candidate pool creation). DORIS-MAE dataset version 1 comprises four main sub-datasets, each serving a distinct purpose.
The Query dataset contains 100 human-crafted complex queries spanning across five categories: ML, NLP, CV, AI, and Composite. Each category has 20 associated queries. Queries are broken down into aspects (ranging from 3 to 9 per query) and sub-aspects (from 0 to 6 per aspect, with 0 signifying no further breakdown required). For each query, a corresponding candidate pool of relevant paper abstracts, ranging from 99 to 138, is provided.
The Corpus dataset is composed of 363,133 abstracts from computer science papers published between 2011 and 2021 and sourced from arXiv. Each entry includes title, original abstract, URL, primary and secondary categories, as well as citation information retrieved from Semantic Scholar. A masked version of each abstract is also provided, facilitating the automated creation of queries.
The Annotation dataset includes generated annotations for all 165,144 question pairs, each comprising an aspect/sub-aspect and a corresponding paper abstract from the query's candidate pool. It includes the original text generated by ChatGPT (version chatgpt-3.5-turbo-0301) explaining its decision-making process, along with a three-level relevance score (e.g., 0,1,2) representing ChatGPT's final decision.
Finally, the Test Set dataset contains human annotations for a random selection of 250 question pairs used in hypothesis testing. It includes each of the three human annotators' final decisions, recorded as a three-level relevance score (e.g., 0,1,2).
The file "ada_embedding_for_DORIS-MAE_v1.pickle" contains text embeddings for the DORIS-MAE dataset, generated by OpenAI's ada-002 model. The structure of the file is as follows:
ada_embedding_for_DORIS-MAE_v1.pickle
├── "Query"
│   ├── query_id_1 (Embedding of query_1)
│   ├── query_id_2 (Embedding of query_2)
│   ├── query_id_3 (Embedding of query_3)
│   └── ...
└── "Corpus"
    ├── corpus_id_1 (Embedding of abstract_1)
    ├── corpus_id_2 (Embedding of abstract_2)
    ├── corpus_id_3 (Embedding of abstract_3)
    └── ...
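Given the structure above, one plausible way to use the embeddings is to rank corpus abstracts by cosine similarity to a query. The sketch below assumes the pickle holds a dict with `"Query"` and `"Corpus"` keys, each mapping an ID to a sequence of floats. The function names and the toy vectors are illustrative. Pickle files can execute code on load, so only files from a trusted source should be opened this way.

```python
import math
import pickle


def load_embeddings(path):
    """Load the embedding dict from disk (trusted files only)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_corpus(embeddings, query_id, top_k=3):
    """Return the top_k corpus IDs most similar to the given query."""
    q = embeddings["Query"][query_id]
    scores = {cid: cosine(q, vec) for cid, vec in embeddings["Corpus"].items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy stand-in with the same two-level layout as the real file.
toy = {
    "Query": {"q1": [1.0, 0.0]},
    "Corpus": {"c1": [0.9, 0.1], "c2": [0.0, 1.0], "c3": [-1.0, 0.0]},
}
print(rank_corpus(toy, "q1"))  # ['c1', 'c2', 'c3']
```

For the full corpus a vectorized library would be far faster; plain Python is used here only to keep the sketch dependency-free.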
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets support the evaluation of fair name-based gender prediction software across two scientific domains: energy transition and critical infrastructures. Each dataset contains public information on scientific authors and their gender, determined through manual validation and compared against predictions from multiple automated tools. The gender labels in these datasets represent the assessment of human annotators based solely on the information available (e.g., names) and do not necessarily reflect the self-identified gender or gender perception of the authors.
The energy transition dataset is derived from papers retrieved from Scopus using the query terms “energy transition” OR “energy transformation.” The initial set of 17,591 papers was refined to 10,130 papers, authored by 27,363 individuals, using the Energy Systems Ontology (ESO) (De Nicola et al., 2024). From this population, 1,000 authors were randomly selected for manual gender validation, resulting in 260 females, 575 males, and 165 of undetermined gender.
The critical infrastructures dataset is based on all 380 papers published between 2006 and 2022 in the proceedings of the International Conference on Critical Information Infrastructures Security (CRITIS), involving 929 authors. All authors were manually validated, yielding 153 females, 768 males, and 8 of undetermined gender.
The datasets are provided in JSON format, one file per domain:
- ET-report.json contains records for the 1,000 manually validated authors in the energy transition dataset. Each record includes the author’s full name, the Semantic Scholar ID, the manual validation gender label, and the predictions from multiple automated gender prediction tools (Prediction Manager, Gender API, ChatGPT, and NamSor).
- CRITIS-report.json contains records for all 929 manually validated authors in the critical infrastructures dataset, with the same structure and fields as the energy transition file, except without the Semantic Scholar ID.
These structured files enable reproducible analysis, cross-tool performance comparisons, and integration into further research workflows.
Reference
De Nicola, A., Patriarca, T., Fresilli, B., Opromolla, A., Guariglia Migliore, M., Leonardi, N., D’Agostino, G., Cellini, M., Mirenda, C., Tagliacozzo, S., Pisacane, L., Vassillo, C. (2024) D.1.2 - Report on gendered assessment of the energy systems knowledge community and EU policies for sustainable energy systems—Horizon Europe Project gEneSys—Transforming gendered interrelations of power and inequalities in transition pathways to sustainable energy systems, grant agreement no. 101094326. https://ec.europa.eu/research/participants/documents/downloadPublic?documentIds=080166e509765b4f&appId=PPGMS
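A cross-tool comparison on the per-author records described above could look like the following sketch. The field names (`manual_gender`, `predictions`) and the label strings are hypothetical, since the description lists the fields but not their JSON keys. Excluding authors of undetermined gender from scoring is likewise an assumption made for this illustration.

```python
import json

# Hypothetical records mimicking the described fields; not real data.
SAMPLE = json.loads("""
[
  {"name": "A. Example", "manual_gender": "female",
   "predictions": {"Gender API": "female", "NamSor": "male"}},
  {"name": "B. Example", "manual_gender": "male",
   "predictions": {"Gender API": "male", "NamSor": "male"}},
  {"name": "C. Example", "manual_gender": "undetermined",
   "predictions": {"Gender API": "male", "NamSor": "female"}}
]
""")


def tool_accuracy(records, tool):
    """Share of determinable authors for whom `tool` matches the manual label."""
    scored = [r for r in records if r["manual_gender"] != "undetermined"]
    if not scored:
        return None
    hits = sum(r["predictions"][tool] == r["manual_gender"] for r in scored)
    return hits / len(scored)


for tool in ("Gender API", "NamSor"):
    print(tool, tool_accuracy(SAMPLE, tool))
# prints:
# Gender API 1.0
# NamSor 0.5
```

For a fairness evaluation, the same function could be applied separately to the female and male subsets to compare error rates between groups.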
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Semantic Scholar: Teknik Sipil
Batch fetched from Semantic Scholar API, with parameters:
query = '"manajemen proyek" | "manajemen konstruksi" | "manajemen proyek konstruksi" | "proyek konstruksi" | "proyek" | "konstruksi" | "analisa struktur beton" | "desain jembatan" | "ketahanan gempa" | "pemodelan struktur" | "material komposit" | "mekanika tanah" | "pondasi dalam" | "stabilitas lereng" | "perbaikan tanah" | "geoteknik lingkungan" | "rekayasa lalu lintas" | "perencanaan… See the full description on the dataset page: https://huggingface.co/datasets/derhan/semantic-scholar-teknik-sipil.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As time series forecasting is an essential pillar in many decision-making fields, there is a broad range of academic work concerning forecasting. To this end, we reviewed 100 scientific papers published during the last 40 years. The papers were collected from the search engines Google Scholar, Mendeley, IEEE Xplore, and Semantic Scholar. We selected papers that have attracted an average of at least 8.5 citations per year. The selected papers cover different topics, for example, supply chain demand, river flow, tourism, traffic, stock prices, electric/power demand, and many more.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The paper is planned for publication on https://www.preprints.org/ and subsequently in the journal Energy Systems Research (https://esrj.ru/index.php/esr). Use the JSON files at https://app.vosviewer.com/.
Abstract. Energy security is often interpreted as independence from fossil fuels, but a one-sided approach can lead to dependence on high-value-added technologies. The development of artificial intelligence, which requires high energy consumption, chips and servers, is shifting competition in manufacturing and services from energy security to technological sovereignty. With the development of technology, sovereignty has shifted from military independence to freedom from economic coercion by other states and large corporations. The aim of this study was to identify suitable tools for analyzing abstract texts from tens of thousands of bibliometric records and pre-assessing relevant topics related to the energy sector, in order to effectively analyze trends in technological sovereignty issues. This paper analyzes 10 thousand bibliometric records for the year 2024, sorted by relevance and exported from the open abstract database Scilit using the query “energy AND technology” in [Title, Abstract, Keyword], Content Type: JOURNAL-ARTICLE, English. Filters were applied on the “Subject” categories most related to technology: Power Systems & Electric Vehicles, Energy Systems & Technologies, Electrical Energy Management, and Nuclear Technology & Instrumentation. The main theme of the bibliometric data analyzed was renewable energy. Twelve clusters were identified based on keywords, of which three were closest to the topic for which this research was funded: hydrogen, heat energy storage and greenhouse gas emissions. These clusters reflect keywords derived from both Yake! and PatternRank. The Yake! program outperforms PatternRank in terms of run time and representation of found keywords in abstract texts. The feasibility of using AnyAscii for text preprocessing is demonstrated.
Using artificial intelligence to create text based on key phrases speeds up text processing, but the need for manual editing remains. The study showed that there is a need to expand data sources, e.g. using OnePetro for oil and gas topics, IEEE Xplore for energy systems issues, and Semantic Scholar to evaluate the role of AI in the energy sector.
Query period: Dec 2024 ~ Jan 2024
Row example:
{ "paperId": "fe6f7573766e3cd2f268faeb2ef4e24d8820cc48", "externalIds": { "CorpusId": 220514064 }, "publicationVenue": null, "url": "https://www.semanticscholar.org/paper/fe6f7573766e3cd2f268faeb2ef4e24d8820cc48", "title": "THE 14TH INTERNATIONAL CONFERENCE ON FLOW PROCESSING IN COMPOSITE MATERIALS THROUGH THICKNESS THERMOPLASTIC MELT IMPREGNATION: EFFECT OF FABRIC ARCHITECTURE ON COMPACTION AND PERMEABILITY UNDER… See the full description on the dataset page: https://huggingface.co/datasets/iknow-lab/material-synthesis-papers-s2api-400K.
Hepatology International CiteScore 2024-2025 - ResearchHelpDesk
Hepatology International is a peer-reviewed journal dedicated to research and patient care issues in hepatology, featuring articles written by clinicians, clinical researchers and basic scientists. This journal focuses mainly on new and emerging diagnostic and treatment options, protocols, the molecular and cellular basis of disease pathogenesis, and new technologies in liver and biliary sciences. Hepatology International publishes original research articles related to clinical care and basic research; review articles; consensus guidelines for diagnosis and treatment; invited editorials; and controversies in contemporary issues. The journal does not publish case reports. Hepatology International requests that all authors comply with Springer’s ethical policies. These ethical statements should be clearly indicated on all articles for all 3 ethics statements and for all authors mentioned by name. These statements should be placed at the end of each article just before the Reference section. Hepatology International is the official journal of the Asian Pacific Association for the Study of the Liver (APASL). This journal will focus mainly on new and emerging technologies, cutting-edge science and advances in liver and biliary disorders.
Types of articles published: Original Research Articles related to clinical care and basic research; Review Articles; Consensus guidelines for diagnosis and treatment; Clinical cases, images; Selected Author Summaries; Video Submissions.
Now indexed by ISI. A peer-reviewed journal with global reach, championed and edited by international experts in the field. Focuses on the complete spectrum of contemporary clinical and basic science related issues in the field of adult and pediatric hepatobiliary and allied sciences, new and emerging technologies, cutting-edge innovations and future trends in liver and biliary disorders. Publishes original research articles, editorials, reviews, and consensus guidelines for diagnosis and treatment of liver diseases.
Abstracted and indexed in: BFI List, CLOCKSS, CNKI, CNPIEC, Dimensions, EBSCO Discovery Service, EMBASE, Google Scholar, Japanese Science and Technology Agency (JST), Journal Citation Reports/Science Edition, Medline, Naver, Norwegian Register for Scientific Journals and Series, OCLC WorldCat Discovery Service, Portico, ProQuest SciTech Premium Collection, ProQuest Toxicology Abstracts, ProQuest-ExLibris Primo, ProQuest-ExLibris Summon, Reaxys, SCImago, SCOPUS, Science Citation Index Expanded (SciSearch), Semantic Scholar, TD Net Discovery Service, UGC-CARE List (India).
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
IBM is providing free access to its COVID-19 Knowledge Graph integrating COVID-19 data from various sources: CORD-19 (https://www.semanticscholar.org/cord19) for literature, Clinicaltrials.gov (https://clinicaltrials.gov/) and WHO ICTRP (https://www.who.int/ictrp/search) for trials, DrugBank (https://www.drugbank.ca/) and GenBank (https://www.ncbi.nlm.nih.gov/genbank) for database data. Prepared search reports at the Reports Page are available on open access. However, to access the COVID-19 Knowledge Graph, it is necessary to request access.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data chart with key characteristics of included sources.
Hepatology International FAQ - ResearchHelpDesk
Hepatology International is a peer-reviewed journal dedicated to research and patient care issues in hepatology, featuring articles written by clinicians, clinical researchers and basic scientists. This journal focuses mainly on new and emerging diagnostic and treatment options, protocols, the molecular and cellular basis of disease pathogenesis, and new technologies in liver and biliary sciences. Hepatology International publishes original research articles related to clinical care and basic research; review articles; consensus guidelines for diagnosis and treatment; invited editorials; and controversies in contemporary issues. The journal does not publish case reports. Hepatology International requests that all authors comply with Springer’s ethical policies. These ethical statements should be clearly indicated on all articles for all 3 ethics statements and for all authors mentioned by name. These statements should be placed at the end of each article just before the Reference section. Hepatology International is the official journal of the Asian Pacific Association for the Study of the Liver (APASL). This journal will focus mainly on new and emerging technologies, cutting-edge science and advances in liver and biliary disorders.
Types of articles published: Original Research Articles related to clinical care and basic research; Review Articles; Consensus guidelines for diagnosis and treatment; Clinical cases, images; Selected Author Summaries; Video Submissions.
Now indexed by ISI. A peer-reviewed journal with global reach, championed and edited by international experts in the field. Focuses on the complete spectrum of contemporary clinical and basic science related issues in the field of adult and pediatric hepatobiliary and allied sciences, new and emerging technologies, cutting-edge innovations and future trends in liver and biliary disorders. Publishes original research articles, editorials, reviews, and consensus guidelines for diagnosis and treatment of liver diseases.
Abstracted and indexed in: BFI List, CLOCKSS, CNKI, CNPIEC, Dimensions, EBSCO Discovery Service, EMBASE, Google Scholar, Japanese Science and Technology Agency (JST), Journal Citation Reports/Science Edition, Medline, Naver, Norwegian Register for Scientific Journals and Series, OCLC WorldCat Discovery Service, Portico, ProQuest SciTech Premium Collection, ProQuest Toxicology Abstracts, ProQuest-ExLibris Primo, ProQuest-ExLibris Summon, Reaxys, SCImago, SCOPUS, Science Citation Index Expanded (SciSearch), Semantic Scholar, TD Net Discovery Service, UGC-CARE List (India).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PCC framework for search strategy development.