17 datasets found
  1. Data from: Examining bias perpetuation in academic search engines: an...

    • zenodo.org
    bin, csv, zip
    Updated Feb 8, 2024
    Cite
    Ulloa Roberto (2024). Examining bias perpetuation in academic search engines: an algorithm audit of Google and Semantic Scholar [Dataset]. http://doi.org/10.5281/zenodo.10636247
    Available download formats: bin, zip, csv
    Dataset updated
    Feb 8, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ulloa Roberto
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Main dataset (main.csv)

    The main file contains an entry (N=28530) per search result in all collected pages. It comprises the following columns:

    1. id: Unique identifier of the file (corresponds to the last part of the filename)
    2. filename: Name of the file associated with the row (the file is in serp_html.zip)
    3. engine: The search engine used (Google Scholar or Semantic Scholar).
    4. browser: The web browser used for the search (Firefox or Chrome)
    5. region: The geographical region where the search was made.
    6. year: The year when the search was made
    7. month: The month when the search was made
    8. day: The day when the search was made
    9. query: The full search query that was used
    10. query_type: The type of the search query (health or technology)
    11. topic: The topic associated with the search query ('covid vaccines', 'cryptocurrencies', 'internet', 'social media', 'vaccines', 'coffee')
    12. trt: Treatment variable associated with the search (benefits or risks).
    13. url: The URL of the (article) search result
    14. title: The title of the (article) search result.
    15. authorship: The author(s) of the (article) search result.
    16. abstract_id: Unique identifier for the abstract of the (article) search result which connects with annotated-abstracts_v0.6.xlsx
    17. abstract_hash: Hash value of the abstract for data integrity
    18. link_n: The total number of results in the search page
    19. rank: The rank of the search result on the search engine results page.
    20. annotation: The annotation assigned to the abstract of the (article) search result. One of: '3. Confirms both benefits and risks', '4. Confirms neither benefits nor risks', '1. Confirms benefits', '2. Confirms risks', '5. Abstract not related to {topic}'
    21. valence: -1 for abstracts containing risks, 0 for neutral abstracts, 1 for abstracts only containing benefits
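
The column layout above lends itself to a simple comparison of engines and treatments. The following is a minimal sketch, assuming only the column names listed above; the sample rows are invented for illustration and are not taken from main.csv. It averages `valence` for each (`engine`, `trt`) pair using the standard library.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows standing in for main.csv; only the column names
# (engine, trt, rank, valence) come from the dataset description.
SAMPLE = """engine,trt,rank,valence
Google Scholar,benefits,1,1
Google Scholar,benefits,2,0
Google Scholar,risks,1,-1
Semantic Scholar,benefits,1,1
Semantic Scholar,risks,1,-1
Semantic Scholar,risks,2,0
"""

def mean_valence(rows):
    """Average the valence column for each (engine, trt) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for row in rows:
        key = (row["engine"], row["trt"])
        totals[key][0] += float(row["valence"])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

result = mean_valence(csv.DictReader(io.StringIO(SAMPLE)))
for (engine, trt), value in sorted(result.items()):
    print(f"{engine:17s} {trt:9s} {value:+.2f}")
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with an opened `main.csv`. The real file may contain rows without a valence value, which this sketch does not handle.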

    Annotated abstracts (annotated-abstracts_v0.6.xlsx)

    Manually annotated abstracts resulting from the searches.

    Raw search engine result pages (serp_html.zip)

    The zip archive contains one HTML file per collected search engine result page (N=2853). See the filename column of the main dataset.

  2. semantic-scholar-manajemen-proyek

    • huggingface.co
    Updated Feb 2, 2025
    Cite
    Sederhana Gulo (2025). semantic-scholar-manajemen-proyek [Dataset]. https://huggingface.co/datasets/derhan/semantic-scholar-manajemen-proyek
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 2, 2025
    Authors
    Sederhana Gulo
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Semantic Scholar: Manajemen Proyek

    Batch fetched from the Semantic Scholar API, with parameters:

    query = '"manajemen proyek" | "project management" | "metodologi manajemen proyek" | "teknik manajemen proyek" | "alat manajemen proyek" | "keterampilan manajemen proyek" | "tantangan manajemen proyek" | "risiko manajemen proyek" | "studi kasus manajemen proyek" | "tren manajemen proyek" | "manajemen proyek di berbagai industri" | "manajemen proyek dalam organisasi" | "manajemen proyek… See the full description on the dataset page: https://huggingface.co/datasets/derhan/semantic-scholar-manajemen-proyek.
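
The query parameter above is a series of quoted phrases joined with `|`. As a rough sketch of how such a string could be assembled and URL-encoded, the snippet below uses only the first three phrases (the original list is truncated, so the rest is left out). The endpoint URL and the `fields` parameter are assumptions that should be checked against the Semantic Scholar API documentation; no request is sent.

```python
from urllib.parse import urlencode

# First phrases of the query shown above; the original list is truncated,
# so this sample is deliberately incomplete.
PHRASES = [
    "manajemen proyek",
    "project management",
    "metodologi manajemen proyek",
]

def build_query(phrases):
    """Quote each phrase and join the phrases with ' | '."""
    return " | ".join(f'"{phrase}"' for phrase in phrases)

query = build_query(PHRASES)

# Assumed endpoint and parameters; verify against the API documentation.
BASE_URL = "https://api.semanticscholar.org/graph/v1/paper/search/bulk"
url = f"{BASE_URL}?{urlencode({'query': query, 'fields': 'title,url'})}"

print(query)
print(url)
```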

  3. COVID-19 Open Research Dataset (CORD-19)

    • data.niaid.nih.gov
    • live.european-language-grid.eu
    • +2 more
    Updated Jul 22, 2024
    + more versions
    Cite
    Sebastian Kohlmeier (2024). COVID-19 Open Research Dataset (CORD-19) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3715505
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Kyle Lo
    Sebastian Kohlmeier
    JJ Yang
    Lucy Lu Wang
    Description

    A full description of this dataset along with updated information can be found here.

    In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of scholarly articles, including full text content, about COVID-19 and the coronavirus family of viruses for use by the global research community.

    This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease. The corpus will be updated weekly as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others.

    By downloading this dataset you are agreeing to the Dataset license. Specific licensing information for individual articles in the dataset is available in the metadata file.

    Additional licensing information is available on the PMC website, medRxiv website and bioRxiv website.

    Dataset content:

    Commercial use subset

    Non-commercial use subset

    PMC custom license subset

    bioRxiv/medRxiv subset (pre-prints that are not peer reviewed)

    Metadata file

    Readme

    Each paper is represented as a single JSON object (see schema file for details).

    Description:

    The dataset contains all COVID-19 and coronavirus-related research (e.g. SARS, MERS, etc.) from the following sources:

    PubMed's PMC open access corpus using this query (COVID-19 and coronavirus research)

    Additional COVID-19 research articles from a corpus maintained by the WHO

    bioRxiv and medRxiv pre-prints using the same query as PMC (COVID-19 and coronavirus research)

    We also provide a comprehensive metadata file of coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic and the WHO COVID-19 database of publications (includes articles without open access full text).

    We recommend using metadata from the comprehensive file when available, instead of parsed metadata in the dataset. Please note the dataset may contain multiple entries for individual PMC IDs in cases when supplementary materials are available.

    This repository is linked to the WHO database of publications on coronavirus disease and other resources, such as Microsoft Academic Graph, PubMed, and Semantic Scholar. A coalition including the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine of the National Institutes of Health came together to provide this service.

    Citation:

    When including CORD-19 data in a publication or redistribution, please cite the dataset as follows:

    In bibliography:

    COVID-19 Open Research Dataset (CORD-19). 2020. Version 2020-MM-DD. Retrieved from https://pages.semanticscholar.org/coronavirus-research. Accessed YYYY-MM-DD. 10.5281/zenodo.3715505

    In text:

    (CORD-19, 2020)

    The Allen Institute for AI and particularly the Semantic Scholar team will continue to provide updates to this dataset as the situation evolves and new research is released.

  4. Academic AI Tools Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jul 5, 2025
    Cite
    Data Insights Market (2025). Academic AI Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/academic-ai-tools-504839
    Available download formats: ppt, pdf, doc
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The academic AI tools market is experiencing rapid growth, driven by increasing research output, the need for efficient literature reviews, and the demand for advanced analytical capabilities. The market, estimated at $1.5 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033, reaching approximately $8 billion by 2033. This expansion is fueled by several key factors. Firstly, the rising number of academic publications and the complexity of research necessitate tools that can efficiently process and analyze vast amounts of information. Secondly, the integration of AI into various research stages, from literature review and data analysis to writing and editing, is significantly enhancing productivity and accuracy. Thirdly, the emergence of sophisticated AI models capable of understanding nuanced academic language and context is driving adoption among researchers and institutions. However, the market also faces challenges. High initial investment costs for both developers and users can be a barrier to entry. Concerns about data privacy and intellectual property rights in relation to AI-driven research are also significant. Furthermore, the market is currently fragmented, with numerous players competing for market share, leading to a highly dynamic competitive landscape. The continued success of companies like Consensus, Scite Assistant, Research Rabbit, Paperpal, Elicit, Perplexity, Semantic Scholar, Connected Papers, Scholarcy, Gemini, Julius, Bit AI, and Trinka will depend on their ability to differentiate their offerings, improve user experience, and effectively address the challenges related to data security and ethical considerations. Future growth will likely be driven by advancements in natural language processing (NLP), improved integration with existing research workflows, and the development of specialized tools catering to specific research fields.
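
The growth figures in the description above follow the compound growth formula, end = start × (1 + CAGR)^periods. The sketch below is only an arithmetic check, not the report's own method. It takes the $1.5 billion starting value and the 25% CAGR from the text and tries both 7 and 8 compounding periods, since the description does not say how the 2025 to 2033 span is counted.

```python
def compound(value, rate, periods):
    """Grow `value` at `rate` per period for `periods` periods."""
    return value * (1 + rate) ** periods

START_BILLION = 1.5   # 2025 estimate quoted in the description
CAGR = 0.25           # 25% per year, as quoted

# The number of compounding periods is an assumption; try both readings.
projection_7 = compound(START_BILLION, CAGR, 7)
projection_8 = compound(START_BILLION, CAGR, 8)

print(f"7 periods: ${projection_7:.2f} billion")
print(f"8 periods: ${projection_8:.2f} billion")
```

The two readings give roughly $7.15 billion and $8.94 billion, which bracket the "approximately $8 billion" quoted in the description.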

  5. Tab2Know evaluation data

    • live.european-language-grid.eu
    csv
    Updated Sep 13, 2022
    Cite
    (2022). Tab2Know evaluation data [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/18315
    Available download formats: csv
    Dataset updated
    Sep 13, 2022
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This resource contains the following files:

    - `venues.txt`: The venues that were used for selecting PDFs from the [Semantic Scholar Open Research Corpus](http://s2-public-api-prod.us-west-2.elasticbeanstalk.com/corpus/) that were published in the last 5 years.

    - `extracted-tables.tar.gz`: All tables that we extracted using [Tabula](https://github.com/tabulapdf/tabula) from these PDFs.

    - `sample-400.tar.gz`: A sample of these tables which we used for annotation.

    - `ontology.ttl`: The annotation ontology in Turtle format.

    - `all_metadata.jsonl`: Annotations for this sample in the JSON format described below.

    - `labelqueries.csv`: The label queries used for weak annotation, created using the annotation interface. This CSV file contains 6 columns: a numeric ID, the label query template name (`template`), the template slots (`slots`), the label type (`label`), the annotation value (`value`), and a toggle for the interface (`enabled`).

    - `labelqueries-sparql-templates.zip`: The label query templates. These are SPARQL queries with slots of the form `{{slot}}`. The templates in `labelqueries.csv` refer to these files.

    - `rules.txt`: Datalog rules that we used for entity resolution.

    - `tab2know-graph.nt.gz`: The final RDF graph that contains all extracted table structures, predicted table and column classes, and resolved entity links.
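
The label query templates described above are SPARQL queries with slots of the form `{{slot}}`. A minimal sketch of slot filling follows. The template text, the slot name `pattern` and the predicate URI are all hypothetical and invented for illustration; the actual templates live in `labelqueries-sparql-templates.zip`, and how the `slots` column encodes its values is not stated in the description.

```python
import re

# Matches {{name}} but not the single braces used by SPARQL group patterns.
SLOT = re.compile(r"\{\{(\w+)\}\}")

def fill_template(template, slots):
    """Replace each {{name}} slot with slots[name]; KeyError if a slot is missing."""
    return SLOT.sub(lambda match: slots[match.group(1)], template)

# Hypothetical template and slot values, not taken from the dataset.
template = (
    'SELECT ?col WHERE { ?col <http://example.org/header> ?h . '
    'FILTER regex(?h, "{{pattern}}", "i") }'
)
filled = fill_template(template, {"pattern": "accuracy"})
print(filled)
```

Raising an error on a missing slot, rather than leaving the placeholder in place, keeps an incomplete query from being run by accident.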

  6. Ballistic Deflection Transistor Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jun 14, 2025
    Cite
    Data Insights Market (2025). Ballistic Deflection Transistor Report [Dataset]. https://www.datainsightsmarket.com/reports/ballistic-deflection-transistor-891679
    Available download formats: pdf, doc, ppt
    Dataset updated
    Jun 14, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Ballistic Deflection Transistor (BDT) market is poised for significant growth, driven by the increasing demand for high-speed, low-power electronics in various applications. While precise market sizing data is unavailable, we can reasonably infer substantial expansion based on industry trends. The compound annual growth rate (CAGR) for similar emerging semiconductor technologies often falls within the range of 15-25% during periods of rapid innovation and adoption. Assuming a conservative CAGR of 18% and a 2025 market size of $500 million (a plausible estimate given the involvement of established players like Elsevier and the potential applications), the market is projected to reach approximately $1.7 billion by 2033. Key drivers include the need for enhanced performance in data centers, advanced computing, and 5G/6G infrastructure, pushing the boundaries of traditional transistor technologies. Furthermore, the miniaturization trend in electronics fuels the demand for BDTs, enabling more compact and efficient devices. While challenges remain in terms of manufacturing complexity and cost, ongoing research and development efforts are addressing these limitations. The market segmentation will likely see a strong focus on high-performance computing segments, and early adopters will be key to driving market expansion. The major restraints currently hindering widespread BDT adoption include the high manufacturing costs associated with advanced fabrication techniques and the inherent complexities in designing and integrating these novel transistors into existing systems. This necessitates specialized expertise and infrastructure, limiting immediate accessibility. However, these challenges are progressively being overcome as fabrication technologies mature and economies of scale emerge. 
The competitive landscape involves key players like Semantic Scholar, Elliott Sound Products (likely in a related but not directly BDT-focused capacity, hence requiring assumptions), and Elsevier (whose involvement suggests potential in licensing or data analysis support), signifying the presence of both technological expertise and investment interest in the field. Geographic distribution will likely mirror trends in semiconductor manufacturing, with regions like North America, Europe, and East Asia capturing significant market share.
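
The figures quoted in this description ($500 million in 2025, an 18% CAGR, about $1.7 billion by 2033) can be cross-checked by solving the compound growth formula for the rate. The sketch below is an arithmetic check only and assumes 8 compounding years between 2025 and 2033; it is not the method used by the report.

```python
def implied_cagr(start, end, years):
    """Annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1

START_MILLION = 500    # 2025 size assumed in the description
END_MILLION = 1700     # "approximately $1.7 billion" by 2033
YEARS = 8              # assumption: 2025 -> 2033 counted as 8 periods

rate = implied_cagr(START_MILLION, END_MILLION, YEARS)
forward = START_MILLION * (1 + 0.18) ** YEARS

print(f"Implied CAGR for the quoted end value: {rate:.1%}")
print(f"End value at the quoted 18% CAGR: ${forward:.0f} million")
```

Under the 8-year assumption the implied rate comes out below the quoted 18%, and compounding at 18% overshoots $1.7 billion, so the quoted numbers are only loosely consistent with each other.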

  7. DORIS-MAE-v1

    • zenodo.org
    • data.niaid.nih.gov
    bin, json
    Updated Oct 17, 2023
    + more versions
    Cite
    Jianyou Wang; Kaicheng Wang; Xiaoyue Wang; Prudhviraj Naidu; Leon Bergen; Ramamohan Paturi (2023). DORIS-MAE-v1 [Dataset]. http://doi.org/10.5281/zenodo.8299749
    Available download formats: bin, json
    Dataset updated
    Oct 17, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jianyou Wang; Kaicheng Wang; Xiaoyue Wang; Prudhviraj Naidu; Leon Bergen; Ramamohan Paturi
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high costs and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research.

    Documentation for the DORIS-MAE dataset is publicly available at https://github.com/Real-Doris-Mae/Doris-Mae-Dataset. This upload contains both DORIS-MAE dataset version 1 and ada-002 vector embeddings for all queries and related abstracts (used in candidate pool creation). DORIS-MAE dataset version 1 comprises four main sub-datasets, each serving distinct purposes.

    The Query dataset contains 100 human-crafted complex queries spanning across five categories: ML, NLP, CV, AI, and Composite. Each category has 20 associated queries. Queries are broken down into aspects (ranging from 3 to 9 per query) and sub-aspects (from 0 to 6 per aspect, with 0 signifying no further breakdown required). For each query, a corresponding candidate pool of relevant paper abstracts, ranging from 99 to 138, is provided.

    The Corpus dataset is composed of 363,133 abstracts from computer science papers, published between 2011-2021, and sourced from arXiv. Each entry includes title, original abstract, URL, primary and secondary categories, as well as citation information retrieved from Semantic Scholar. A masked version of each abstract is also provided, facilitating the automated creation of queries.

    The Annotation dataset includes generated annotations for all 165,144 question pairs, each comprising an aspect/sub-aspect and a corresponding paper abstract from the query's candidate pool. It includes the original text generated by ChatGPT (version chatgpt-3.5-turbo-0301) explaining its decision-making process, along with a three-level relevance score (e.g., 0,1,2) representing ChatGPT's final decision.

    Finally, the Test Set dataset contains human annotations for a random selection of 250 question pairs used in hypothesis testing. It includes each of the three human annotators' final decisions, recorded as a three-level relevance score (e.g., 0,1,2).

    The file "ada_embedding_for_DORIS-MAE_v1.pickle" contains text embeddings for the DORIS-MAE dataset, generated by OpenAI's ada-002 model. The structure of the file is as follows:

    ada_embedding_for_DORIS-MAE_v1.pickle
    ├── "Query"
    │   ├── query_id_1 (Embedding of query_1)
    │   ├── query_id_2 (Embedding of query_2)
    │   └── query_id_3 (Embedding of query_3)
    │   .
    │   .
    │   .
    └── "Corpus"
        ├── corpus_id_1 (Embedding of abstract_1)
        ├── corpus_id_2 (Embedding of abstract_2)
        └── corpus_id_3 (Embedding of abstract_3)
        .
        .
        .
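
The layout above suggests a dictionary with two top-level keys, "Query" and "Corpus", each mapping an id to an embedding. The sketch below illustrates how such a structure could be used to rank abstracts against a query by cosine similarity. It builds a toy dictionary with invented ids and 3-dimensional vectors instead of reading the real file, and it assumes the embeddings behave like plain sequences of floats, which the description does not confirm.

```python
import math
import pickle

# Toy stand-in for ada_embedding_for_DORIS-MAE_v1.pickle: same two top-level
# keys, but invented ids and 3-dimensional vectors.
toy = {
    "Query": {"q1": [1.0, 0.0, 0.0]},
    "Corpus": {
        "c1": [0.9, 0.1, 0.0],
        "c2": [0.0, 1.0, 0.0],
        "c3": [0.5, 0.5, 0.0],
    },
}
# Round-trip through pickle in memory instead of opening the real file.
embeddings = pickle.loads(pickle.dumps(toy))

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_corpus(data, query_id):
    """Return corpus ids sorted from most to least similar to the query."""
    query_vec = data["Query"][query_id]
    scores = {cid: cosine(query_vec, vec) for cid, vec in data["Corpus"].items()}
    return sorted(scores, key=scores.get, reverse=True)

ranking = rank_corpus(embeddings, "q1")
print(ranking)
```

With the real file, the in-memory round-trip would be replaced by `pickle.load` on an opened file. Unpickling can execute arbitrary code, so this should only be done with files from a trusted source.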

  8. Datasets for Fair Name-Based Gender Prediction in Scientific Communities

    • figshare.com
    zip
    Updated Aug 14, 2025
    Cite
    Maria Guariglia Migliore; Gregorio D'Agostino; Tatiana Patriarca; Antonio De Nicola (2025). Datasets for Fair Name-Based Gender Prediction in Scientific Communities [Dataset]. http://doi.org/10.6084/m9.figshare.29909603.v1
    Available download formats: zip
    Dataset updated
    Aug 14, 2025
    Dataset provided by
    figshare
    Authors
    Maria Guariglia Migliore; Gregorio D'Agostino; Tatiana Patriarca; Antonio De Nicola
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets support the evaluation of fair name-based gender prediction software across two scientific domains: energy transition and critical infrastructures. Each dataset contains public information on scientific authors and their gender, determined through manual validation and compared against predictions from multiple automated tools. The gender labels in these datasets represent the assessment of human annotators based solely on the information available (e.g., names) and do not necessarily reflect the self-identified gender or gender perception of the authors.

    The energy transition dataset is derived from papers retrieved from Scopus using the query terms “energy transition” OR “energy transformation.” The initial set of 17,591 papers was refined to 10,130 using the Energy Systems Ontology (ESO) (De Nicola et al., 2024), authored by 27,363 individuals. From this population, 1,000 authors were randomly selected for manual gender validation, resulting in 260 females, 575 males, and 165 of undetermined gender.

    The critical infrastructures dataset is based on all 380 papers published between 2006 and 2022 in the proceedings of the International Conference on Critical Information Infrastructures Security (CRITIS), involving 929 authors. All authors were manually validated, yielding 153 females, 768 males, and 8 of undetermined gender.

    The datasets are provided in JSON format, one file per domain:

    - ET-report.json contains records for the 1,000 manually validated authors in the energy transition dataset. Each record includes the author’s full name, the Semantic Scholar ID, the manual validation gender label, and the predictions from multiple automated gender prediction tools (Prediction Manager, Gender API, ChatGPT, and NamSor).

    - CRITIS-report.json contains records for all 929 manually validated authors in the critical infrastructures dataset, with the same structure and fields as the energy transition file, except without the Semantic Scholar ID.

    These structured files enable reproducible analysis, cross-tool performance comparisons, and integration into further research workflows.

    Reference

    De Nicola, A., Patriarca, T., Fresilli, B., Opromolla, A., Guariglia Migliore, M., Leonardi, N., D’Agostino, G., Cellini, M., Mirenda, C., Tagliacozzo, S., Pisacane, L., Vassillo, C. (2024) D.1.2 - Report on gendered assessment of the energy systems knowledge community and EU policies for sustainable energy systems—Horizon Europe Project gEneSys—Transforming gendered interrelations of power and inequalities in transition pathways to sustainable energy systems, grant agreement no. 101094326. https://ec.europa.eu/research/participants/documents/downloadPublic?documentIds=080166e509765b4f&appId=PPGMS
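
Since each record in the JSON report files pairs a manual gender label with predictions from several tools, a cross-tool comparison reduces to counting matches per tool. The sketch below shows one way to do that. The field names (`manual`, `predictions`), the label strings and the tool names `ToolA` and `ToolB` are hypothetical; the actual schema of ET-report.json and CRITIS-report.json is not given in the description, and the records here are invented.

```python
import json

# Invented records; field names and label values are assumptions, not the
# actual schema of ET-report.json or CRITIS-report.json.
RAW = """[
  {"name": "A. Author", "manual": "female",
   "predictions": {"ToolA": "female", "ToolB": "male"}},
  {"name": "B. Author", "manual": "male",
   "predictions": {"ToolA": "male", "ToolB": "male"}},
  {"name": "C. Author", "manual": "undetermined",
   "predictions": {"ToolA": "male", "ToolB": "female"}}
]"""

def accuracy_by_tool(records):
    """Share of records where each tool matches the manual label,
    skipping records whose manual label is undetermined."""
    hits, totals = {}, {}
    for record in records:
        if record["manual"] == "undetermined":
            continue
        for tool, predicted in record["predictions"].items():
            totals[tool] = totals.get(tool, 0) + 1
            hits[tool] = hits.get(tool, 0) + (predicted == record["manual"])
    return {tool: hits[tool] / totals[tool] for tool in totals}

scores = accuracy_by_tool(json.loads(RAW))
print(scores)
```

Excluding the undetermined records is a design choice made for this sketch; whether they should count as errors depends on the evaluation being run.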

  9. semantic-scholar-teknik-sipil

    • huggingface.co
    Updated Feb 2, 2025
    Cite
    Sederhana Gulo (2025). semantic-scholar-teknik-sipil [Dataset]. https://huggingface.co/datasets/derhan/semantic-scholar-teknik-sipil
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 2, 2025
    Authors
    Sederhana Gulo
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Semantic Scholar: Teknik Sipil

    Batch fetched from Semantic Scholar API, with parameters:

    query = '"manajemen proyek" | "manajemen konstruksi" | "manajemen proyek konstruksi" | "proyek konstruksi" | "proyek" | "konstruksi" | "analisa struktur beton" | "desain jembatan" | "ketahanan gempa" | "pemodelan struktur" | "material komposit" | "mekanika tanah" | "pondasi dalam" | "stabilitas lereng" | "perbaikan tanah" | "geoteknik lingkungan" | "rekayasa lalu lintas" | "perencanaan… See the full description on the dataset page: https://huggingface.co/datasets/derhan/semantic-scholar-teknik-sipil.

  10. Reviewed Forecasting Articles

    • zenodo.org
    Updated Mar 19, 2020
    Cite
    André Bauer (2020). Reviewed Forecasting Articles [Dataset]. http://doi.org/10.5281/zenodo.3716035
    Dataset updated
    Mar 19, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    André Bauer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As time series forecasting is an essential pillar in many decision-making fields, there is a broad range of academic work concerning forecasting. To this end, we reviewed 100 scientific papers published during the last 40 years. The papers were collected from the search engines Google Scholar, Mendeley, IEEE Xplore, and Semantic Scholar. We selected papers that have attracted, on average, at least 8.5 citations per year. The selected papers cover different topics, for example, supply chain demand, river flow, tourism, traffic, stock prices, electric/power demand, and many more.

  11. Supplementary materials for the publication “Technological Sovereignty as a...

    • figshare.com
    png
    Updated May 18, 2025
    Cite
    Boris Chigarev (2025). Supplementary materials for the publication “Technological Sovereignty as a Current Energy Security Challenge. Preliminary analysis” [Dataset]. http://doi.org/10.6084/m9.figshare.29094296.v1
    Available download formats: png
    Dataset updated
    May 18, 2025
    Dataset provided by
    figshare
    Authors
    Boris Chigarev
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The paper is planned to be published on https://www.preprints.org/ and subsequently in the journal Energy Systems Research (https://esrj.ru/index.php/esr). Use the JSON files at https://app.vosviewer.com/.

    Abstract. Energy security is often interpreted as independence from fossil fuels, but a one-sided approach can lead to dependence on high-value-added technologies. The development of artificial intelligence, which requires high energy consumption, chips and servers, is shifting competition in manufacturing and services from energy security to technological sovereignty. With the development of technology, sovereignty has shifted from military independence to freedom from economic coercion by other states and large corporations. The aim of this study was to identify suitable tools for analyzing abstract texts from tens of thousands of bibliometric records and pre-assessing relevant topics related to the energy sector, in order to analyze trends in technological sovereignty issues effectively. The paper uses 10 thousand bibliometric records for the year 2024, sorted by relevance and exported from the open abstract database Scilit with the query “energy AND technology” in [Title, Abstract, Keyword], Content Type: JOURNAL-ARTICLE, English. Filters were applied on the “Subject” categories most related to technology: Power Systems & Electric Vehicles, Energy Systems & Technologies, Electrical Energy Management, and Nuclear Technology & Instrumentation. The main theme of the bibliometric data analyzed was renewable energy. Twelve clusters were identified based on keywords, of which three were closest to the topic for which this research was funded: hydrogen, heat energy storage and greenhouse gas emissions. These clusters reflect keywords derived from both Yake! and PatternRank. The Yake! program outperforms PatternRank in terms of run time and representation of found keywords in abstract texts. The feasibility of using AnyAscii for text preprocessing is demonstrated. Using artificial intelligence to create text based on key phrases speeds up text processing, but the need for manual editing remains. The study showed that there is a need to expand data sources, e.g. using OnePetro for oil and gas topics, IEEE Xplore for energy systems issues, and Semantic Scholar to evaluate the role of AI in the energy sector.

  12. material-synthesis-papers-s2api-400K

    • huggingface.co
    Cite
    iknow-lab, material-synthesis-papers-s2api-400K [Dataset]. https://huggingface.co/datasets/iknow-lab/material-synthesis-papers-s2api-400K
    Dataset authored and provided by
    iknow-lab
    Description

    Query period: Dec 2024 ~ Jan 2024

    Row example

    { "paperId": "fe6f7573766e3cd2f268faeb2ef4e24d8820cc48", "externalIds": { "CorpusId": 220514064 }, "publicationVenue": null, "url": "https://www.semanticscholar.org/paper/fe6f7573766e3cd2f268faeb2ef4e24d8820cc48", "title": "THE 14TH INTERNATIONAL CONFERENCE ON FLOW PROCESSING IN COMPOSITE MATERIALS THROUGH THICKNESS THERMOPLASTIC MELT IMPREGNATION: EFFECT OF FABRIC ARCHITECTURE ON COMPACTION AND PERMEABILITY UNDER… See the full description on the dataset page: https://huggingface.co/datasets/iknow-lab/material-synthesis-papers-s2api-400K.

  13. Hepatology International CiteScore 2024-2025 - ResearchHelpDesk

    • researchhelpdesk.org
    Updated May 8, 2022
    + more versions
    Cite
    Research Help Desk (2022). Hepatology International CiteScore 2024-2025 - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/sjr/585/hepatology-international
    Dataset updated
    May 8, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    Hepatology International CiteScore 2024-2025 - ResearchHelpDesk - Hepatology International is a peer-reviewed journal, featuring articles written by clinicians, clinical researchers and basic scientists, that is dedicated to research and patient care issues in hepatology. This journal focuses mainly on new and emerging diagnostic and treatment options, protocols, the molecular and cellular basis of disease pathogenesis, and new technologies in liver and biliary sciences. Hepatology International publishes original research articles related to clinical care and basic research; review articles; consensus guidelines for diagnosis and treatment; invited editorials; and controversies in contemporary issues. The journal does not publish case reports. Hepatology International requests that all authors comply with Springer’s ethical policies. These ethical statements should be clearly indicated on all articles for all 3 ethics statements and for all authors mentioned by name. These statements should be placed at the end of each article just before the Reference section. Hepatology International is the official journal of the Asian Pacific Association for the Study of the Liver (APASL). This journal will focus mainly on new and emerging technologies, cutting-edge science and advances in liver and biliary disorders.

    Types of articles published: Original Research Articles related to clinical care and basic research; Review Articles; Consensus guidelines for diagnosis and treatment; Clinical cases, images; Selected Author Summaries; Video Submissions.

    Now indexed by ISI. A peer-reviewed journal with global reach, championed and edited by international experts in the field. Focuses on the complete spectrum of contemporary clinical and basic science related issues in the field of adult and pediatric hepatobiliary and allied sciences, new and emerging technologies, cutting-edge innovations and future trends in liver and biliary disorders. Publishes original research articles, editorials, reviews, and consensus guidelines for diagnosis and treatment of liver diseases.

    Abstracted and indexed in: BFI List, CLOCKSS, CNKI, CNPIEC, Dimensions, EBSCO Discovery Service, EMBASE, Google Scholar, Japanese Science and Technology Agency (JST), Journal Citation Reports/Science Edition, Medline, Naver, Norwegian Register for Scientific Journals and Series, OCLC WorldCat Discovery Service, Portico, ProQuest SciTech Premium Collection, ProQuest Toxicology Abstracts, ProQuest-ExLibris Primo, ProQuest-ExLibris Summon, Reaxys, SCImago, SCOPUS, Science Citation Index Expanded (SciSearch), Semantic Scholar, TD Net Discovery Service, UGC-CARE List (India).

  14. Knowledge Graph of COVID-19 Literature

    • catalog.midasnetwork.us
    json
    Updated Jul 6, 2023
    Cite
    MIDAS Coordination Center (2023). Knowledge Graph of COVID-19 Literature [Dataset]. https://catalog.midasnetwork.us/collection/130
    Explore at:
    Available download formats: json
    Dataset updated
    Jul 6, 2023
    Dataset authored and provided by
    MIDAS Coordination Center
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Variables measured
    disease, COVID-19, pathogen, Homo sapiens, data service, host organism, clinical trial, infectious disease, sequence collection, Severe acute respiratory syndrome coronavirus 2
    Dataset funded by
    National Institute of General Medical Sciences
    Description

    IBM is providing free access to its COVID-19 Knowledge Graph, which integrates COVID-19 data from various sources: CORD-19 (https://www.semanticscholar.org/cord19) for literature, Clinicaltrials.gov (https://clinicaltrials.gov/) and WHO ICTRP (https://www.who.int/ictrp/search) for trials, and DrugBank (https://www.drugbank.ca/) and GenBank (https://www.ncbi.nlm.nih.gov/genbank) for database data. Prepared search reports on the Reports Page are openly accessible. Access to the COVID-19 Knowledge Graph itself, however, must be requested.

  15. Data chart with key characteristics of included sources.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Phuc Pham-Duc; Kavitha Sriparamananthan (2023). Data chart with key characteristics of included sources. [Dataset]. http://doi.org/10.1371/journal.pone.0259069.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Phuc Pham-Duc; Kavitha Sriparamananthan
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Data chart with key characteristics of included sources.

  16. Hepatology International FAQ - ResearchHelpDesk

    • researchhelpdesk.org
    Updated May 25, 2022
    + more versions
    Cite
    Research Help Desk (2022). Hepatology International FAQ - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/faq/585/hepatology-international
    Explore at:
    Dataset updated
    May 25, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    Hepatology International FAQ - ResearchHelpDesk - Hepatology International is a peer-reviewed journal featuring articles written by clinicians, clinical researchers and basic scientists, dedicated to research and patient care issues in hepatology. The journal focuses mainly on new and emerging diagnostic and treatment options, protocols, the molecular and cellular basis of disease pathogenesis, and new technologies in liver and biliary sciences. It publishes original research articles related to clinical care and basic research; review articles; consensus guidelines for diagnosis and treatment; invited editorials; and controversies in contemporary issues. The journal does not publish case reports. Hepatology International requests that all authors comply with Springer’s ethical policies. These ethical statements should be clearly indicated on all articles for all 3 ethics statements and for all authors mentioned by name, and should be placed at the end of each article just before the Reference section. Hepatology International is the official journal of the Asian Pacific Association for the Study of the Liver (APASL). The journal will focus mainly on new and emerging technologies, cutting-edge science and advances in liver and biliary disorders.
    Types of articles published: Original Research Articles related to clinical care and basic research; Review Articles; Consensus guidelines for diagnosis and treatment; Clinical cases, images; Selected Author Summaries; Video Submissions. Now indexed by ISI. A peer-reviewed journal with global reach, championed and edited by international experts in the field. Focuses on the complete spectrum of contemporary clinical and basic science related issues in the field of adult and pediatric hepatobiliary and allied sciences, new and emerging technologies, cutting-edge innovations and future trends in liver and biliary disorders. Publishes original research articles, editorials, reviews, and consensus guidelines for diagnosis and treatment of liver diseases.
    Abstracted and indexed in: BFI List, CLOCKSS, CNKI, CNPIEC, Dimensions, EBSCO Discovery Service, EMBASE, Google Scholar, Japanese Science and Technology Agency (JST), Journal Citation Reports/Science Edition, Medline, Naver, Norwegian Register for Scientific Journals and Series, OCLC WorldCat Discovery Service, Portico, ProQuest SciTech Premium Collection, ProQuest Toxicology Abstracts, ProQuest-ExLibris Primo, ProQuest-ExLibris Summon, Reaxys, SCImago, SCOPUS, Science Citation Index Expanded (SciSearch), Semantic Scholar, TD Net Discovery Service, UGC-CARE List (India).

  17. PCC framework for search strategy development.

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Cite
    Phuc Pham-Duc; Kavitha Sriparamananthan (2023). PCC framework for search strategy development. [Dataset]. http://doi.org/10.1371/journal.pone.0259069.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Phuc Pham-Duc; Kavitha Sriparamananthan
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    PCC framework for search strategy development.

Cite
Ulloa Roberto; Ulloa Roberto (2024). Examining bias perpetuation in academic search engines: an algorithm audit of Google and Semantic Scholar [Dataset]. http://doi.org/10.5281/zenodo.10636247

Data from: Examining bias perpetuation in academic search engines: an algorithm audit of Google and Semantic Scholar

Explore at:
Available download formats: bin, zip, csv
Dataset updated
Feb 8, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Ulloa Roberto; Ulloa Roberto
License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

Main dataset (main.csv)

The main file contains one entry per search result across all collected pages (N=28530). It comprises the following columns:

  1. id: Unique identifier of the file (corresponds to the last part of the filename)
  2. filename: Name of the file associated with the row (the file is in serp_html.zip)
  3. engine: The search engine used (Google Scholar or Semantic Scholar).
  4. browser: The web browser used for the search (Firefox or Chrome)
  5. region: The geographical region where the search was made.
  6. year: The year when the search was made
  7. month: The month when the search was made
  8. day: The day when the search was made
  9. query: The full search query that was used
  10. query_type: The type of the search query (health or technology)
  11. topic: The topic associated with the search query ('covid vaccines', 'cryptocurrencies', 'internet', 'social media', 'vaccines', 'coffee')
  12. trt: Treatment variable associated with the search (benefits or risks).
  13. url: The URL of the (article) search result
  14. title: The title of the (article) search result.
  15. authorship: The author(s) of the (article) search result.
  16. abstract_id: Unique identifier for the abstract of the (article) search result which connects with annotated-abstracts_v0.6.xlsx
  17. abstract_hash: Hash value of the abstract for data integrity
  18. link_n: The total number of results in the search page
  19. rank: The rank of the search result on the search engine results page.
  20. annotation: The annotation assigned to the (article's abstract) search result. One of: '1. Confirms benefits', '2. Confirms risks', '3. Confirms both benefits and risks', '4. Confirms neither benefits nor risks', '5. Abstract not related to {topic}'
  21. valence: -1 for abstracts containing risks, 0 for neutral abstracts, 1 for abstracts only containing benefits
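
As a rough illustration of how the columns above might be used, the following sketch averages valence per engine and trt group using only the Python standard library. The sample rows are invented for the example and are not taken from main.csv; the exact string values stored in the real engine and trt columns are an assumption based on the column descriptions. The function name mean_valence_by_group is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows that use only columns documented above (engine, trt, valence).
# They are NOT real data from main.csv.
SAMPLE_CSV = """engine,trt,valence
Google Scholar,benefits,1
Google Scholar,benefits,0
Google Scholar,risks,-1
Semantic Scholar,benefits,1
Semantic Scholar,risks,-1
Semantic Scholar,risks,0
"""


def mean_valence_by_group(csv_file):
    """Return {(engine, trt): mean valence} for rows with a numeric valence."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for row in csv.DictReader(csv_file):
        try:
            valence = float(row["valence"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a numeric valence
        key = (row["engine"], row["trt"])
        totals[key][0] += valence
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


if __name__ == "__main__":
    groups = mean_valence_by_group(io.StringIO(SAMPLE_CSV))
    for key, mean in sorted(groups.items()):
        print(key, round(mean, 2))
```

To run this against the real file, one would pass an open file handle instead of the in-memory sample, for example `open("main.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded and comma-separated.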

Annotated abstracts (annotated-abstracts_v0.6.xlsx)

Manually annotated abstracts resulting from the searches.

Raw search engine result pages (serp_html.zip)

The zip contains one HTML file per search engine results page collected (N=2853). See the filename column of the main dataset.
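
The link between the filename column and the archive can be checked with a short sketch like the one below. It assumes that each value in filename matches the base name of a member of serp_html.zip; the folder layout inside the archive is not documented here, so the comparison ignores directories. The helper names and the demo file names are hypothetical, and the demo archive is built in memory rather than read from the real download.

```python
import io
import posixpath
import zipfile


def missing_pages(filenames, archive):
    """Return the sorted filenames that have no matching member in the zip archive."""
    with zipfile.ZipFile(archive) as zf:
        # Compare on base names, since the archive's folder layout is an unknown.
        members = {posixpath.basename(name) for name in zf.namelist()}
    return sorted(set(filenames) - members)


def build_demo_archive():
    """Create a tiny in-memory zip standing in for serp_html.zip (made-up names)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("serp_html/page_0001.html", "<html></html>")
        zf.writestr("serp_html/page_0002.html", "<html></html>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    wanted = ["page_0001.html", "page_0002.html", "page_0003.html"]
    print(missing_pages(wanted, build_demo_archive()))
```

With the real data, `filenames` would come from the filename column of main.csv and `archive` would be the path to serp_html.zip; an empty result would suggest every row points at a page that is present in the archive.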
