20 datasets found

August 2025 data-update for "Updated science-wide author databases of...
elsevier.digitalcommonsdata.com
Updated Sep 19, 2025
Cite
John P.A. Ioannidis (2025). August 2025 data-update for "Updated science-wide author databases of standardized citation indicators" [Dataset]. http://doi.org/10.17632/btchxktzyw.8
Explore at:
Unique identifier
https://doi.org/10.17632/btchxktzyw.8
Dataset updated
Sep 19, 2025
Authors
John P.A. Ioannidis
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions and a composite indicator (c-score). Data are shown separately for career-long impact and for single recent year impact. Metrics with and without self-citations and ratio of citations to citing papers are given, and data on retracted papers (based on the Retraction Watch database) as well as citations to/from retracted papers have been added. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2024 and single recent year data pertain to citations received during calendar year 2024. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (7) is based on the August 1, 2025 snapshot from Scopus, updated to end of citation year 2024. This work uses Scopus data. Calculations were performed using all Scopus author profiles as of August 1, 2025. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US.
They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/) so that the correct data can be used in any future annual updates of the citation indicator databases. The c-score focuses on impact (citations) rather than productivity (number of publications) and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see attached file on FREQUENTLY ASKED QUESTIONS. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden manifesto: https://www.nature.com/articles/520429a
Data from: Higher Education Institutions in Poland Dataset
zenodo.org
data.niaid.nih.gov
zip
Updated Sep 11, 2023
Cite
Jackson Junior; Paulina Rutecka; Pedro Pinto (2023). Higher Education Institutions in Poland Dataset [Dataset]. http://doi.org/10.5281/zenodo.8333574
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.8333574
Dataset updated
Sep 11, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jackson Junior; Paulina Rutecka; Pedro Pinto
License
Attribution 1.0 (CC BY 1.0): https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
Area covered
Poland
Description
Higher Education Institutions in Poland Dataset

This repository contains a dataset of higher education institutions in Poland. The dataset comprises 131 public higher education institutions and 216 private higher education institutions in Poland. The data was collected on 24/11/2022.
This dataset was compiled in response to a cybersecurity investigation of Poland's higher education institutions' websites [1]. The data is being made publicly available to promote open science principles [2].

Data

The data includes the following fields for each institution:

Id: A unique identifier assigned to each institution.

Region: The region (voivodeship) in which the institution is located.

Name: The original name of the institution in Polish.

Name_EN: The international name of the institution in English.

Category: Indicates whether the institution is public or private.

Url: The website of the institution.
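
The field list above maps naturally onto a small record type. The following is a minimal sketch, assuming the spreadsheet has been exported to CSV with exactly these column headers; the `Institution` class, the `load_institutions` helper, and the two sample rows are hypothetical placeholders and are not taken from the dataset.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Institution:
    """One row of the dataset, using the documented field names."""
    id: str
    region: str
    name: str
    name_en: str
    category: str
    url: str


# Placeholder rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """Id,Region,Name,Name_EN,Category,Url
1,Region A,Uczelnia Przykładowa,Example University,public,https://example.edu.pl
2,Region B,Szkoła Przykładowa,Example College,private,https://example.pl
"""


def load_institutions(text: str) -> list[Institution]:
    """Parse CSV text whose header matches the documented field names."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Institution(
            id=row["Id"],
            region=row["Region"],
            name=row["Name"],
            name_en=row["Name_EN"],
            category=row["Category"],
            url=row["Url"],
        )
        for row in reader
    ]


institutions = load_institutions(SAMPLE_CSV)
# Count institutions per category (public vs. private).
counts = Counter(inst.category for inst in institutions)
print(counts["public"], counts["private"])
```

Run against the full file, the same count would be expected to reproduce the 131 public and 216 private institutions stated in the description.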

Methodology

The dataset was compiled using data from two primary sources:

Public Higher Education Institutions: Data was sourced from the official website of the Ministry of Education and Science of Poland [3].

Private Higher Education Institutions: Data was obtained from the RAD-on system, which is part of the Integrated Information Network on Science and Higher Education [4].

For the international names in English, the following methodology was employed:

Both Polish and English names were retained for each institution. This decision was based on the fact that some universities do not have their English versions available in official sources.

English names were primarily sourced from:

The Polish National Agency for Academic Exchange's official document [5].

The website Studies in English [6].

Official websites of the respective Higher Education Institutions.

In instances where English names were not readily available from the aforementioned sources, the GPT-3.5 model was employed to propose suitable names. These proposed names are distinctly marked in blue within the dataset file (hei_poland_en.xls).

Usage

This data is available under the Creative Commons Zero (CC0) license and can be used for academic research purposes. We encourage the sharing of knowledge and the advancement of research in this field by adhering to open science principles [2].

If you use this data in your research, please cite the source and include a link to this repository. To properly attribute this data, please use the following DOI:
10.5281/zenodo.8333573

Contribution

If you have any updates or corrections to the data, please feel free to open a pull request or contact us directly. Let's work together to keep this data accurate and up-to-date.

Acknowledgment

We would like to express our gratitude to the Ministry of Education and Science of Poland and the RAD-on system for providing the information used in this dataset.

We would like to acknowledge the support of the Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF), within the project "Cybers SeC IP" (NORTE-01-0145-FEDER-000044). This study was also developed as part of the Master in Cybersecurity Program at the Polytechnic University of Viana do Castelo, Portugal.

References

[1] Pending.

[2] S. Bezjak, A. Clyburne-Sherin, P. Conzett, P. Fernandes, E. Görögh, K. Helbig, B. Kramer, I. Labastida, K. Niemeyer, F. Psomopoulos, T. Ross-Hellauer, R. Schneider, J. Tennant, E. Verbakel, H. Brinken, and L. Heller, Open Science Training Handbook. Zenodo, Apr. 2018. [Online]. Available: https://doi.org/10.5281/zenodo.1212496

[3] Ministry of Education and Science of Poland. "Wykaz uczelni publicznych nadzorowanych przez Ministra właściwego ds. szkolnictwa wyższego - publiczne uczelnie akademickie." Nov 2022. [Online]. Available: https://www.gov.pl/web/edukacja-i-nauka/wykaz-uczelni-publicznych-nadzorowanych-przez-ministra-wlasciwego-ds-szkolnictwa-wyzszego-publiczne-uczelnie-akademickie

[4] RAD-on System. "Dane instytucji systemu szkolnictwa wyższego i nauki." Nov 2022. [Online]. Available: https://radon.nauka.gov.pl/dane/instytucje-systemu-szkolnictwa-wyzszego-i-nauki

[5] Polish National Agency for Academic Exchange. "List of the university-type HEIs." 2023. [Online]. Available: https://nawa.gov.pl/images/Aktualnosci/2023/Att.-2.-List-of-the-university-type-HEIs.pdf

[6] Studies in English. [Online]. Available: www.studies-in-english.pl
Higher Education Institutions in Germany Dataset 2025
zenodo.org
zip
Updated Oct 22, 2025
Cite
Jackson Barreto; Rodrigo Costa (2025). Higher Education Institutions in Germany Dataset 2025 [Dataset]. http://doi.org/10.5281/zenodo.14960633
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14960633
Dataset updated
Oct 22, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jackson Barreto; Rodrigo Costa
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Germany
Description
Higher Education Institutions in Germany Dataset 2025

This repository contains a dataset of higher education institutions in Germany. It covers 400 institutions, including universities, universities of applied sciences, and higher institutes (such as higher institutes of engineering and of biotechnology, and a few others). This dataset was compiled in response to a cybersecurity investigation of German higher education institutions' websites [1]. The data is being made publicly available to promote open science principles [2].

Data

The data includes the following fields for each institution:

ETER_Id: A unique identifier assigned to each institution.

Name: The full name of the institution.

Category: Indicates whether the institution is public or private.

Institution_Category_Standardized: Indicates whether the institution is University, University of applied sciences or other.

Member_of_European_University_alliance: Indicates whether the institution is a member of a European University Alliance (a collaborative network of higher education institutions in Europe).

Url: The website of the institution.

NUTS2: Nomenclature of Territorial Units for Statistics (NUTS): A classification by the European Union to divide member states' territories into statistical units. The NUTS system has three hierarchical levels, with NUTS2 being the second level.

NUTS2_Label_2016: Refers to the classification of regions at the NUTS2 level according to the 2016 criteria set by the European Union.

NUTS2_Label_2021: Refers to the classification of regions at the NUTS2 level according to the 2021 criteria set by the European Union.

NUTS3: Nomenclature of Territorial Units for Statistics (NUTS): A classification by the European Union to divide member states' territories into statistical units. The NUTS system has three hierarchical levels, with NUTS3 being the third level.

NUTS3_Label_2016: Refers to the classification of regions at the NUTS3 level according to the 2016 criteria set by the European Union.

NUTS3_Label_2021: Refers to the classification of regions at the NUTS3 level according to the 2021 criteria set by the European Union.

Methodology

The dataset draws on two sources: the European Higher Education Sector Observatory (ETER) [3], and Eurostat's NUTS (Nomenclature of territorial units for statistics) classifications for 2013-16 [4] and 2021 [5]. The data was collected on December 26, 2024.

This section outlines the methodology used to create the dataset for Higher Education Institutions (HEIs) in Germany. The dataset consolidates information from various sources, processes the data, and enriches it to provide accurate and reliable insights.

Data Sources

ETER Database: The primary dataset was sourced from the ETER database, containing detailed information about HEIs in Europe.

File: eter-export-2021-DE.xlsx

Eurostat NUTS Data: Two datasets from Eurostat were used for regional information:

NUTS 2013-2016 regions: NUTS2013-NUTS2016.xlsx

NUTS 2021 regions: NUTS2021.xlsx

Data Cleaning and Preprocessing

Column Renaming

Columns in the raw dataset were renamed for consistency and readability. Examples include:

ETER ID → ETER_ID

Institution Name → Name

Legal status → Category

Value Replacement

HEI Categories: The Category column was cleaned, with government-dependent institutions classified as "public."

Standardized Institution Categories: Mapped numerical values to descriptive labels such as "University" and "University of applied sciences."

European University Alliance Membership: Replaced binary values with "Yes" or "No."
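
The renaming and value-replacement steps above can be sketched with plain dictionaries. This is only an illustration under stated assumptions: the three renames come from the text, but the raw legal-status labels, the numeric category codes, the `clean_row` helper, and the sample record are all hypothetical, since the source does not list the actual values in the ETER export.

```python
# Column renames listed in the text above.
RENAMES = {
    "ETER ID": "ETER_ID",
    "Institution Name": "Name",
    "Legal status": "Category",
}

# ASSUMPTION: the raw labels and numeric codes below are hypothetical;
# the source does not list the actual values used in the ETER export.
CATEGORY_MAP = {
    "public": "public",
    "private": "private",
    "private government-dependent": "public",
}
STANDARDIZED_MAP = {1: "University", 2: "University of applied sciences", 0: "Other"}
ALLIANCE_MAP = {1: "Yes", 0: "No"}


def clean_row(raw: dict) -> dict:
    """Rename columns, then replace coded values with descriptive labels."""
    row = {RENAMES.get(key, key): value for key, value in raw.items()}
    # Government-dependent institutions are classified as "public".
    row["Category"] = CATEGORY_MAP.get(row["Category"].strip().lower(), row["Category"])
    row["Institution_Category_Standardized"] = STANDARDIZED_MAP.get(
        row["Institution_Category_Standardized"], "Other"
    )
    row["Member_of_European_University_alliance"] = ALLIANCE_MAP.get(
        row["Member_of_European_University_alliance"], "No"
    )
    return row


# Hypothetical raw record for illustration.
raw_record = {
    "ETER ID": "DE0000",
    "Institution Name": "Example Hochschule",
    "Legal status": "Private government-dependent",
    "Institution_Category_Standardized": 2,
    "Member_of_European_University_alliance": 0,
}
cleaned = clean_row(raw_record)
print(cleaned)
```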

Handling Missing or Incorrect Data

Specific entries with missing or incorrect data were updated manually based on their ETER_ID. For instance:

Adjusted URLs for entries like DE0012 (updated to www.zeppelin-university.com)

Adjusted URLs for entries like FR0906 (updated to hmtm.de)

Adjusted URLs for entries like FR0104 (updated to www.dhfpg.de)

Adjusted URLs for entries like FR0466 (updated to fhf.brandenburg.de)

Adjusted URLs for entries like FR0907 (updated to hr-nord.niedersachsen.de)

Adjusted URLs for entries like FR0333 (updated to www.srh-university.de)

Regional Data Integration

Merged NUTS 2016 and NUTS 2021 data to enrich the dataset with regional labels.
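
The manual corrections and the regional merge described above could look roughly like the sketch below. The DE0012 correction is the one given in the text; the lookup tables, region codes, the `enrich` helper, and the sample rows are hypothetical, and treating the merge as a left join (unknown codes keep an empty label) is an assumption rather than something the source states.

```python
# Manual URL corrections keyed by ETER_ID; DE0012 is the example from the text.
URL_FIXES = {"DE0012": "www.zeppelin-university.com"}

# ASSUMPTION: hypothetical lookup tables standing in for the Eurostat files;
# the codes and labels are placeholders, not real NUTS entries.
NUTS2_LABELS_2016 = {"XX11": "Example Region (2016)"}
NUTS2_LABELS_2021 = {"XX11": "Example Region (2021)"}


def enrich(row: dict) -> dict:
    """Apply manual URL fixes, then attach NUTS2 labels for both editions."""
    out = dict(row)
    out["Url"] = URL_FIXES.get(out["ETER_ID"], out["Url"])
    # Left-join behaviour: rows with an unknown code keep an empty label.
    out["NUTS2_Label_2016"] = NUTS2_LABELS_2016.get(out["NUTS2"], "")
    out["NUTS2_Label_2021"] = NUTS2_LABELS_2021.get(out["NUTS2"], "")
    return out


# Hypothetical rows: one that needs a URL fix, one with an unknown region code.
rows = [
    {"ETER_ID": "DE0012", "Url": "", "NUTS2": "XX11"},
    {"ETER_ID": "DE9999", "Url": "www.example.de", "NUTS2": "ZZ99"},
]
enriched = [enrich(r) for r in rows]
print(enriched)
```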

Final Dataset

The final dataset was saved as a CSV file, germany-heis.csv, encoded in UTF-8 for compatibility. It includes detailed information about HEIs in Germany, their categories, regional affiliations, and membership in European alliances.

Summary

This methodology ensures that the dataset is accurate, consistent, and enriched with valuable regional and institutional details. The final dataset is intended to serve as a reliable resource for analyzing German HEIs.

Usage

This data is available under the Creative Commons Zero (CC0) license and can be used for any purpose, including academic research purposes. We encourage the sharing of knowledge and the advancement of research in this field by adhering to open science principles [2].

If you use this data in your research, please cite the source and include a link to this repository. To properly attribute this data, please use the following DOI: 10.5281/zenodo.7614862

Contribution

If you have any updates or corrections to the data, please feel free to open a pull request or contact us directly. Let's work together to keep this data accurate and up-to-date.

Acknowledgment

We would like to acknowledge the support of the Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF), within the project "Cybers SeC IP" (NORTE-01-0145-FEDER-000044). This study was also developed as part of the Master in Cybersecurity Program at the Instituto Politécnico de Viana do Castelo, Portugal.

References

[1] Pending.

[2] S. Bezjak, A. Clyburne-Sherin, P. Conzett, P. Fernandes, E. Görögh, K. Helbig, B. Kramer, I. Labastida, K. Niemeyer, F. Psomopoulos, T. Ross-Hellauer, R. Schneider, J. Tennant, E. Verbakel, H. Brinken, and L. Heller, Open Science Training Handbook. Zenodo, Apr. 2018. [Online]. Available: https://doi.org/10.5281/zenodo.1212496

[3] The European Higher Education Sector Observatory, Dec 2024. Available: ETER

[4] NUTS - Nomenclature of territorial units for statistics, Dec 2024. Available: NUTS-2013-2016

[5] NUTS - Nomenclature of territorial units for statistics, Dec 2024. Available: NUTS-2021
Higher Education Institutions in the USA
kaggle.com
zip
Updated Apr 8, 2023
Cite
Jackson Júnior (2023). Higher Education Institutions in the USA [Dataset]. https://www.kaggle.com/datasets/jacksonbarreto/higher-education-institutions-in-the-usa/data
Explore at:
Available download formats: zip (35907 bytes)
Dataset updated
Apr 8, 2023
Authors
Jackson Júnior
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Higher Education Institutions in the United States of America Dataset

This repository contains a dataset of higher education institutions in the United States of America. This dataset was compiled in response to a cybersecurity study of American higher education institutions' websites [1]. The data is being made publicly available to promote open science principles [2].

Data

The data includes the following fields for each institution:

Id: A unique identifier assigned to each institution.

Region: The federal state in which the institution is located.

Name: The full name of the institution.

Category: Indicates whether the institution is public or private.

Url: The website of the institution.

Methodology

The dataset was obtained from the Integrated Postsecondary Education Data System (IPEDS) website [3], which is administered by the National Center for Education Statistics (NCES). NCES serves as the primary federal entity for collecting and analyzing education-related data in the United States. The data was collected on February 2, 2023.

The initial list of institutions was derived from the IPEDS database using the following criteria: (1) US institutions only, (2) degree-granting institutions, primarily bachelor's or higher, and (3) sector, which includes: public 4-year or above, private not-for-profit 4-year or above, private for-profit 4-year or above, public 2-year, private not-for-profit 2-year, private for-profit 2-year, public less than 2-year, private not-for-profit less than 2-year, and private for-profit less than 2-year.

The following variables have been added to the list of institutions: Control of the institution, state abbreviation, degree-granting status, Status of the institution, and Institution's internet website address. This resulted in a report with 1,979 institutions.

The institution's status was labeled with the following values: A (Active), N (New), R (Restored), M (Closed in the current year), C (Combined with another institution), D (Deleted, out of business), I (Inactive due to hurricane-related issues), O (Outside IPEDS scope), P (Potential new/add institution), Q (Potential institution reestablishment), W (Potential addition outside IPEDS scope), X (Potential restoration outside the scope of IPEDS) and G (Perfect Children's Campus).

A filter was applied to the report to retain only institutions with an A, N, or R status, resulting in 1,978 institutions. Finally, a data cleaning process was applied, which involved removing the whitespace at the beginning and end of cell content and duplicate whitespace. The final data were compiled into the dataset included in this repository.
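
The status filter and whitespace cleanup described above can be sketched as follows. The helper names, the column names (`Name`, `Status`), and the sample rows are hypothetical; only the A/N/R rule and the trimming and collapsing of whitespace come from the text.

```python
import re

KEPT_STATUSES = {"A", "N", "R"}  # Active, New, Restored


def normalize_whitespace(value: str) -> str:
    """Trim leading/trailing whitespace and collapse internal runs to one space."""
    return re.sub(r"\s+", " ", value).strip()


def filter_and_clean(rows: list[dict]) -> list[dict]:
    """Keep rows with an A, N, or R status and normalize every text cell."""
    return [
        {key: normalize_whitespace(value) for key, value in row.items()}
        for row in rows
        if row["Status"].strip() in KEPT_STATUSES
    ]


# Hypothetical rows; the column names are assumptions for illustration.
sample = [
    {"Name": "  Example   State University ", "Status": "A"},
    {"Name": "Closed College", "Status": "D"},
    {"Name": "New  Institute", "Status": " N"},
]
result = filter_and_clean(sample)
print(result)
```

In the source, this filter reduced the report from 1,979 to 1,978 institutions, so only a single row carried a status other than A, N, or R.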

Usage

This data is available under the Creative Commons Zero (CC0) license and can be used for any purpose, including academic research purposes. We encourage the sharing of knowledge and the advancement of research in this field by adhering to open science principles [2].

If you use this data in your research, please cite the source and include a link to this repository. To properly attribute this data, please use the following DOI: 10.5281/zenodo.7614862

Contribution

If you have any updates or corrections to the data, please feel free to open a pull request or contact us directly. Let's work together to keep this data accurate and up-to-date.

Acknowledgment

We would like to acknowledge the support of the Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF), within the project "Cybers SeC IP" (NORTE-01-0145-FEDER-000044). This study was also developed as part of the Master in Cybersecurity Program at the Instituto Politécnico de Viana do Castelo, Portugal.

References

[1] Pending.

[2] S. Bezjak, A. Clyburne-Sherin, P. Conzett, P. Fernandes, E. Görögh, K. Helbig, B. Kramer, I. Labastida, K. Niemeyer, F. Psomopoulos, T. Ross-Hellauer, R. Schneider, J. Tennant, E. Verbakel, H. Brinken, and L. Heller, Open Science Training Handbook. Zenodo, Apr. 2018. [Online]. Available: https://doi.org/10.5281/zenodo.1212496

[3] Integrated Postsecondary Education Data System, "Compare Institutions", Feb. 2023. [Online]. Available: https://nces.ed.gov/ipeds/use-the-data
Extracted Data From: TRI Basic Data Plus Files
dataverse.harvard.edu
Updated Feb 18, 2025
Cite
US EPA (2025). Extracted Data From: TRI Basic Data Plus Files [Dataset]. http://doi.org/10.7910/DVN/PFMTZR
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/PFMTZR
Dataset updated
Feb 18, 2025
Dataset provided by
Harvard Dataverse
Authors
US EPA
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Jan 1, 2016 - Dec 31, 2023
Area covered
United States
Description
This submission includes publicly available data extracted in its original form. Please reference the Related Publication listed here for source and citation information: TRI basic plus data files guides. (2024, September 18). US EPA. https://www.epa.gov/toxics-release-inventory-tri-program/tri-basic-plus-data-files-guides

If you have questions about the underlying data stored here, please contact tri.help@epa.gov. If you have questions or recommendations related to this metadata entry and extracted data, please contact the CAFE Data Management team at: climatecafe@bu.edu.

"EPA has been collecting Toxics Release Inventory (TRI) data since 1987. The "Basic Plus" data files include ten file types that collectively contain all of the data fields from the TRI Reporting Form R and Form A. The files themselves are in tab-delimited .txt format and then compressed into a .zip file.

1a: Facility, chemical, releases and other waste management summary information

1b: Chemical activities and uses

2a: On- and off-site disposal, treatment, energy recovery, and recycling information; non-production-related waste managed quantities; production/activity ratio information; and source reduction activities

2b: Detailed on-site waste treatment methods and efficiency

3a: Transfers off site for disposal and further waste management

3b: Transfers to Publicly Owned Treatment Works (POTWs) (RY1987 - RY2010)

3c: Transfers to Publicly Owned Treatment Works (POTWs) (RY2011 - Present)

4: Facility information

5: Optional information on source reduction, recycling and pollution control (RY2005 - Present)

6: Additional miscellaneous and optional information (RY2010 - Present)

Quantities of dioxin and dioxin-like compounds are reported in grams, while all other chemicals are reported in pounds. This webpage contains the most recent versions of all TRI data files; facilities may revise previous years' TRI submissions if necessary, and any such changes will be reflected in these files. For this reason, data contained in these files may differ from data used to construct the TRI National Analysis." [Quote from https://www.epa.gov/toxics-release-inventory-tri-program/tri-basic-plus-data-files-calendar-years-1987-present]
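
Because the files are tab-delimited .txt members of a .zip archive, and dioxin quantities use a different unit from everything else, a reader may want something like the sketch below. It uses only the Python standard library; the member name, the column names, the UTF-8 encoding, and both helper functions are assumptions for illustration and do not reflect the actual TRI file layout.

```python
import csv
import io
import zipfile

GRAMS_PER_POUND = 453.59237  # exact definition of the avoirdupois pound


def read_tab_delimited(zip_bytes: bytes, member: str) -> list[dict]:
    """Read one tab-delimited .txt member of a .zip archive into dict rows."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as handle:
            # ASSUMPTION: UTF-8 text; the real files may use another encoding.
            text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter="\t"))


def quantity_in_pounds(quantity: float, is_dioxin: bool) -> float:
    """Dioxin quantities are reported in grams; everything else in pounds."""
    return quantity / GRAMS_PER_POUND if is_dioxin else quantity


# Build a tiny in-memory archive so the sketch runs without downloading anything.
# ASSUMPTION: the file name and column names here are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example_1a.txt", "CHEMICAL\tQUANTITY\nExampleChem\t100\n")

rows = read_tab_delimited(buffer.getvalue(), "example_1a.txt")
print(rows[0]["CHEMICAL"], quantity_in_pounds(float(rows[0]["QUANTITY"]), False))
```

Converting dioxin rows to pounds before aggregating avoids silently mixing grams and pounds in a single total.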
Data from: Where do engineering students really get their information? :...
opal.latrobe.edu.au
researchdata.edu.au
pdf
Updated Mar 13, 2025
Cite
Clayton Bolitho (2025). Where do engineering students really get their information? : using reference list analysis to improve information literacy programs [Dataset]. http://doi.org/10.4225/22/59d45f4b696e4
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.4225/22/59d45f4b696e4
Dataset updated
Mar 13, 2025
Dataset provided by
La Trobe
Authors
Clayton Bolitho
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
BACKGROUND

An understanding of the resources which engineering students use to write their academic papers provides information about student behaviour as well as the effectiveness of information literacy programs designed for engineering students. One of the most informative sources of information which can be used to determine the nature of the material that students use is the bibliography at the end of the students’ papers. While reference list analysis has been utilised in other disciplines, few studies have focussed on engineering students or used the results to improve the effectiveness of information literacy programs. Gadd, Baldwin and Norris (2010) found that civil engineering students undertaking a final-year research project cited journal articles more than other types of material, followed by books and reports, with web sites ranked fourth. Several studies, however, have shown that in their first year at least, most students prefer to use Internet search engines (Ellis & Salisbury, 2004; Wilkes & Gurney, 2009).

PURPOSE

The aim of this study was to find out exactly what resources undergraduate students studying civil engineering at La Trobe University were using, and in particular, the extent to which students were utilising the scholarly resources paid for by the library. A secondary purpose of the research was to ascertain whether information literacy sessions delivered to those students had any influence on the resources used, and to investigate ways in which the information literacy component of the unit can be improved to encourage students to make better use of the resources purchased by the Library to support their research.

DESIGN/METHOD

The study examined student bibliographies for three civil engineering group projects at the Bendigo Campus of La Trobe University over a two-year period, including two first-year units (CIV1EP – Engineering Practice) and one second-year unit (CIV2GR – Engineering Group Research). All units included a mandatory library session at the start of the project where student groups were required to meet with the relevant faculty librarian for guidance. In each case, the Faculty Librarian highlighted specific resources relevant to the topic, including books, e-books, video recordings, websites and internet documents. The students were also shown tips for searching the Library catalogue, Google Scholar, LibSearch (the LTU Library’s research and discovery tool) and ProQuest Central. Subject-specific databases for civil engineering and science were also referred to. After the final reports for each project had been submitted and assessed, the Faculty Librarian contacted the lecturer responsible for the unit, requesting copies of the student bibliographies for each group. References for each bibliography were then entered into EndNote. The Faculty Librarian grouped them according to various facets, including the name of the unit and the group within the unit; the material type of the item being referenced; and whether the item required a Library subscription to access it. A total of 58 references were collated for the 2010 CIV1EP unit; 237 references for the 2010 CIV2GR unit; and 225 references for the 2011 CIV1EP unit.

INTERIM FINDINGS

The initial findings showed that student bibliographies for the three group projects were primarily made up of freely available internet resources which required no library subscription. For the 2010 CIV1EP unit, all 58 resources used were freely available on the Internet. For the 2011 CIV1EP unit, 28 of the 225 resources used (12.44%) required a Library subscription or purchase for access, while the second-year students (CIV2GR) used a greater variety of resources, with 71 of the 237 resources used (29.96%) requiring a Library subscription or purchase for access. The results suggest that the library sessions had little or no influence on the 2010 CIV1EP group, but the sessions may have assisted students in the 2011 CIV1EP and 2010 CIV2GR groups to find books, journal articles and conference papers, which were all represented in their bibliographies.

FURTHER RESEARCH

The next step in the research is to investigate ways to increase the representation of scholarly references (found by resources other than Google) in student bibliographies. It is anticipated that such a change would lead to an overall improvement in the quality of the student papers. One way of achieving this would be to make it mandatory for students to include a specified number of journal articles, conference papers, or scholarly books in their bibliographies. It is also anticipated that embedding La Trobe University’s Inquiry/Research Quiz (IRQ) using a constructively aligned approach will further enhance the students’ research skills and increase their ability to find suitable scholarly material which relates to their topic. This has already been done successfully (Salisbury, Yager, & Kirkman, 2012).

CONCLUSIONS & CHALLENGES

The study shows that most students rely heavily on the free Internet for information. Students don’t naturally use Library databases or scholarly resources such as Google Scholar to find information, without encouragement from their teachers, tutors and/or librarians. It is acknowledged that the use of scholarly resources doesn’t automatically lead to a high quality paper. Resources must be used appropriately and students also need to have the skills to identify and synthesise key findings in the existing literature and relate these to their own paper. Ideally, students should be able to see the benefit of using scholarly resources in their papers, and continue to seek these out even when it’s not a specific assessment requirement, though it can’t be assumed that this will be the outcome.

REFERENCES

Ellis, J., & Salisbury, F. (2004). Information literacy milestones: building upon the prior knowledge of first-year students. Australian Library Journal, 53(4), 383-396.

Gadd, E., Baldwin, A., & Norris, M. (2010). The citation behaviour of civil engineering students. Journal of Information Literacy, 4(2), 37-49.

Salisbury, F., Yager, Z., & Kirkman, L. (2012). Embedding Inquiry/Research: Moving from a minimalist model to constructive alignment. Paper presented at the 15th International First Year in Higher Education Conference, Brisbane. Retrieved from http://www.fyhe.com.au/past_papers/papers12/Papers/11A.pdf

Wilkes, J., & Gurney, L. J. (2009). Perceptions and applications of information literacy by first year applied science students. Australian Academic & Research Libraries, 40(3), 159-171.
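
The subscription shares reported in the interim findings can be rechecked from the reference counts given in the abstract. The short sketch below does only that arithmetic; the variable names are illustrative, and the zero for the 2010 CIV1EP unit follows from the statement that all 58 of its resources were freely available.

```python
# Reference counts reported in the study, keyed by year and unit.
totals = {"2010 CIV1EP": 58, "2010 CIV2GR": 237, "2011 CIV1EP": 225}
# References that required a Library subscription or purchase.
subscription = {"2010 CIV1EP": 0, "2010 CIV2GR": 71, "2011 CIV1EP": 28}

# Percentage of each unit's references that needed a subscription.
shares = {
    unit: round(100 * subscription[unit] / totals[unit], 2)
    for unit in totals
}
total_references = sum(totals.values())

for unit, share in shares.items():
    print(f"{unit}: {share}% of {totals[unit]} references required a subscription")
print("All references:", total_references)
```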
Data used in the manuscript - A Hierarchical Approach for Evaluating Athlete...
zenodo.org
data.niaid.nih.gov
csv, txt
Updated Jun 20, 2023
Cite
Thiago de Paula Oliveira (2023). Data used in the manuscript - A Hierarchical Approach for Evaluating Athlete Performance with an Application in Elite Basketball [Dataset]. http://doi.org/10.5281/zenodo.8056757
Explore at:
Available download formats: txt, csv
Unique identifier
https://doi.org/10.5281/zenodo.8056757
Dataset updated
Jun 20, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Thiago de Paula Oliveira
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The database contains several datasets and files with NBA statistical data spanning four seasons (2015-2016 to 2018-2019). These datasets were procured from the Basketball Reference database (https://www.basketball-reference.com/), a publicly accessible source of NBA data.

The main file, `dat.cleaned.csv`, includes the Win/Loss records for all thirty NBA teams, along with box scores and advanced statistics. The data captured over the four seasons correspond to about 4,920 regular-season games. A distinguishing feature of this dataset is the repeated measurements per player within a team across the seasons. However, it's important to note that these repeated measurements are not independent, necessitating the use of hierarchical modelling to properly handle the data.
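
The figure of about 4,920 games can be checked with simple arithmetic. The sketch below assumes the standard 82-game schedule per team, which the description does not state explicitly:

```python
# Sanity check of the regular-season game count quoted above.
# Assumption (not stated in the description): 82 games per team per season.
TEAMS = 30
GAMES_PER_TEAM = 82  # assumed standard schedule
SEASONS = 4          # 2015-2016 to 2018-2019

# Each game involves two teams, so halve the team-game total.
games_per_season = TEAMS * GAMES_PER_TEAM // 2
total_games = games_per_season * SEASONS

print(games_per_season, total_games)
```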

Two sets of additional text files (`per_2017.txt`, `per_2018.txt`, `rpm_2017.txt`, `rpm_2018.txt`) provide specific metrics for player performance. The 'PER' files contain the Player Efficiency Rating (PER) for the years 2017 and 2018. The 'RPM' files contain the ESPN-developed score called Real Plus-Minus (RPM) for the same years.

However, potential biases or limitations within the datasets should be acknowledged. For instance, the Basketball Reference website might not include data from some matches or may exclude certain variables, potentially affecting the quality and accuracy of the dataset.
Data from: WikiHist.html: English Wikipedia's Full Revision History in HTML...
zenodo.org
application/gzip, zip
Updated Jun 8, 2020
Cite
Blagoj Mitrevski; Tiziano Piccardi; Robert West (2020). WikiHist.html: English Wikipedia's Full Revision History in HTML Format [Dataset]. http://doi.org/10.5281/zenodo.3605388
Explore at:
Available download formats: application/gzip, zip
Unique identifier
https://doi.org/10.5281/zenodo.3605388
Dataset updated
Jun 8, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Blagoj Mitrevski; Tiziano Piccardi; Robert West
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
Introduction

Wikipedia is written in the wikitext markup language. When serving content, the MediaWiki software that powers Wikipedia parses wikitext to HTML, thereby inserting additional content by expanding macros (templates and modules). Hence, researchers who intend to analyze Wikipedia as seen by its readers should work with HTML, rather than wikitext. Since Wikipedia’s revision history is made publicly available by the Wikimedia Foundation exclusively in wikitext format, researchers have had to produce HTML themselves, typically by using Wikipedia’s REST API for ad-hoc wikitext-to-HTML parsing. This approach, however, (1) does not scale to very large amounts of data and (2) does not correctly expand macros in historical article revisions.

We have solved these problems by developing a parallelized architecture for parsing massive amounts of wikitext using local instances of MediaWiki, enhanced with the capacity of correct historical macro expansion. By deploying our system, we produce and hereby release WikiHist.html, English Wikipedia’s full revision history in HTML format. It comprises the HTML content of 580M revisions of 5.8M articles generated from the full English Wikipedia history spanning 18 years from 1 January 2001 to 1 March 2019. Boilerplate content such as page headers, footers, and navigation sidebars is not included in the HTML.

For more details, please refer to the description below and to the dataset paper:
Blagoj Mitrevski, Tiziano Piccardi, and Robert West: WikiHist.html: English Wikipedia’s Full Revision History in HTML Format. In Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020.
https://arxiv.org/abs/2001.10256

When using the dataset, please cite the above paper.

Dataset summary

The dataset consists of three parts:

English Wikipedia’s full revision history parsed to HTML,

a table of the creation times of all Wikipedia pages (page_creation_times.json.gz),

a table that allows for resolving redirects for any point in time (redirect_history.json.gz).

Part 1 is our main contribution, while parts 2 and 3 contain complementary information that can aid researchers in their analyses.

Getting the data

Parts 2 and 3 are hosted in this Zenodo repository. Part 1 is 7 TB in size, which is too large for Zenodo, and is therefore hosted externally on the Internet Archive. For downloading part 1, you have multiple options:

use a Torrent-based solution as described at https://github.com/epfl-dlab/WikiHist.html - Option 1 (recommended approach for the full download)

use our download scripts by following the instructions at https://github.com/epfl-dlab/WikiHist.html - Option 2 (the download scripts allow you to bulk-download all data as well as to download revisions for specific articles only).

download it manually from the Internet Archive at https://archive.org/details/WikiHist_html

Dataset details

Part 1: HTML revision history
The data is split into 558 directories, named enwiki-20190301-pages-meta-history$1.xml-p$2p$3, where $1 ranges from 1 to 27, and p$2p$3 indicates that the directory contains revisions for pages with ids between $2 and $3. (This naming scheme directly mirrors that of the wikitext revision history from which WikiHist.html was derived.) Each directory contains a collection of gzip-compressed JSON files, each containing 1,000 HTML article revisions. Each row in the gzipped JSON files represents one article revision. Rows are sorted by page id, and revisions of the same page are sorted by revision id. We include all revision information from the original wikitext dump, the only difference being that we replace the revision’s wikitext content with its parsed HTML version (and that we store the data in JSON rather than XML):

id: id of this revision

parentid: id of revision modified by this revision

timestamp: time when revision was made

cont_username: username of contributor

cont_id: id of contributor

cont_ip: IP address of contributor

comment: comment made by contributor

model: content model (usually "wikitext")

format: content format (usually "text/x-wiki")

sha1: SHA-1 hash

title: page title

ns: namespace (always 0)

page_id: page id

redirect_title: if page is redirect, title of target page

html: revision content in HTML format
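
As a minimal sketch of how Part 1 might be consumed, the snippet below parses a directory name and iterates over the revisions in one file. It assumes that each row is one JSON object on its own line (inferred from "each row represents one article revision"); the function names `parse_dir_name` and `iter_revisions` are hypothetical and not part of the WikiHist.html tooling.

```python
import gzip
import json
import re
from typing import Iterator

# Directory names follow enwiki-20190301-pages-meta-history$1.xml-p$2p$3.
DIR_PATTERN = re.compile(
    r"^enwiki-20190301-pages-meta-history(\d+)\.xml-p(\d+)p(\d+)$"
)


def parse_dir_name(name: str) -> tuple[int, int, int]:
    """Return (dump part $1, first page id $2, last page id $3)."""
    match = DIR_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected directory name: {name}")
    part, first_id, last_id = (int(group) for group in match.groups())
    return part, first_id, last_id


def iter_revisions(path: str) -> Iterator[dict]:
    """Yield one revision dict per non-empty line of a gzipped JSON file.

    Assumption: one JSON object per line (JSON Lines layout).
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)
```

Streaming the file line by line keeps memory use bounded, which matters given the 1,000 HTML revisions stored per file. A caller could then read fields such as `rev["id"]`, `rev["timestamp"]` and `rev["html"]` from each yielded dict.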

Part 2: Page creation times (page_creation_times.json.gz)

This JSON file specifies the creation time of each English Wikipedia page. It can, e.g., be used to determine if a wiki link was blue or red at a specific time in the past. Format:

page_id: page id

title: page title

ns: namespace (0 for articles)

timestamp: time when page was created
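
The blue-or-red link check mentioned above can be sketched as follows. This is an illustration under assumptions: the file is taken to hold one JSON object per line, timestamps are taken to be ISO 8601 strings (optionally ending in `Z`), and the helper names are hypothetical. Page deletions and moves are not handled.

```python
import gzip
import json
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is read as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def load_creation_times(path: str) -> dict[str, datetime]:
    """Map article titles (namespace 0) to their creation time."""
    created = {}
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            row = json.loads(line)
            if int(row["ns"]) == 0:
                created[row["title"]] = parse_ts(row["timestamp"])
    return created


def link_was_blue(created: dict[str, datetime], title: str, when: datetime) -> bool:
    """True if the linked page already existed at `when`, else False (red link)."""
    born = created.get(title)
    return born is not None and born <= when
```

If the stored timestamps carry a UTC offset, `when` must also be timezone-aware, since Python refuses to compare aware and naive datetimes.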

Part 3: Redirect history (redirect_history.json.gz)

This JSON file specifies all revisions corresponding to redirects, as well as the target page to which the respective page redirected at the time of the revision. This information is useful for reconstructing Wikipedia's link network at any time in the past. Format:

page_id: page id of redirect source

title: page title of redirect source

ns: namespace (0 for articles)

revision_id: revision id of redirect source

timestamp: time at which redirect became active

redirect: page title of redirect target (in 1st item of array; 2nd item can be ignored)
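
Resolving a redirect at a given point in time could look like the sketch below. It assumes one JSON object per line and timestamps in a uniform ISO 8601 UTC format, so that string order matches chronological order; the function names are hypothetical. Because the file lists only redirect revisions, the sketch cannot tell when a page stopped being a redirect, and it follows a single hop only.

```python
import bisect
import gzip
import json
from collections import defaultdict


def load_redirects(path: str) -> dict[str, list[tuple[str, str]]]:
    """Map each redirect source title to a time-sorted list of (timestamp, target)."""
    history = defaultdict(list)
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            row = json.loads(line)
            target = row["redirect"][0]  # the 2nd array item can be ignored
            history[row["title"]].append((row["timestamp"], target))
    for entries in history.values():
        entries.sort()
    return dict(history)


def resolve(history: dict[str, list[tuple[str, str]]], title: str, when: str) -> str:
    """Return the redirect target of `title` at time `when`, or `title` itself."""
    entries = history.get(title, [])
    times = [timestamp for timestamp, _ in entries]
    # Number of redirect revisions made at or before `when`.
    pos = bisect.bisect_right(times, when)
    return entries[pos - 1][1] if pos else title
```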

The repository also contains two additional files, metadata.zip and mysql_database.zip. These two files are not part of WikiHist.html per se, and most users will not need to download them manually. The file metadata.zip is required by the download script (and will be fetched by the script automatically), and mysql_database.zip is required by the code used to produce WikiHist.html. The code that uses these files is hosted at GitHub, but the files are too big for GitHub and are therefore hosted here.

WikiHist.html was produced by parsing the 1 March 2019 dump of https://dumps.wikimedia.org/enwiki/20190301 from wikitext to HTML. That old dump is not available anymore on Wikimedia's servers, so we make a copy available at https://archive.org/details/enwiki-20190301-original-full-history-dump_dlab .
IAEA’s MODARIA II Soil-Plant Transfer Parameter Dataset for Tropical...
data.iaea.org
csv
Updated Oct 16, 2025
Cite
The International Atomic Energy Agency (2025). IAEA’s MODARIA II Soil-Plant Transfer Parameter Dataset for Tropical Environments [Dataset]. https://data.iaea.org/dataset/modaria
Explore at:
Available download formats: csv
Dataset updated
Oct 16, 2025
Dataset provided by
International Atomic Energy Agency (http://iaea.org/)
License
https://www.iaea.org/about/terms-of-use
Description
Transfer parameter data are essential inputs to models for radiological environmental impact assessment and are used to quantify the extent of movement of radionuclides from one environmental compartment to another, relevant for estimating the transfer of radionuclides through food chains to humans. International data compilations (i.e. transfer parameter data for temperate environments from the IAEA Technical Reports Series No. 472) have been frequently used by regulators and professionals in radiological impact assessment for dose estimations when site-specific data are not available.

Description of Dataset Content

This international compilation of radionuclide and stable isotope soil-plant concentration ratio values for tropical environments is an output of IAEA’s Modelling and Data for Radiological Impact Assessments II (MODARIA II) programme (2016–2019) and is based on the Köppen-Geiger climate classification (BECK et al. 2018). The IAEA’s MODARIA II tropical dataset is associated with IAEA’s TECDOC-1979: Soil-Plant Transfer of Radionuclides in Non-Temperate Environments (2021).

The dataset contains over 7000 records. Each record includes a concentration ratio value and/or plant and soil concentrations, provided in a consistent way, from which a concentration ratio value can be calculated. Where available, environmentally relevant information is included with each record to allow categorization of the plant and soil data into more refined subsets.
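
Where a record gives plant and soil concentrations but no ratio, the concentration ratio is the plant concentration divided by the soil concentration. The sketch below illustrates that step; the function and parameter names are hypothetical and do not correspond to the dataset's 41 field names, and it assumes both concentrations are expressed in the same units and on the same mass basis.

```python
from typing import Optional


def concentration_ratio(
    reported_cr: Optional[float],
    plant_conc: Optional[float],
    soil_conc: Optional[float],
) -> Optional[float]:
    """Return the reported CR if present, else plant/soil, else None."""
    if reported_cr is not None:
        return reported_cr
    # A ratio can only be derived when both concentrations are usable.
    if plant_conc is None or soil_conc is None or soil_conc == 0:
        return None
    return plant_conc / soil_conc


# Hypothetical record with no reported ratio: 12.0 in plant, 48.0 in soil.
print(concentration_ratio(None, 12.0, 48.0))  # 0.25
```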

The dataset includes information for over 100 plant species, including many that are common crops and staple foods in tropical environments. Data are included for all measured plant compartments, including both the edible and inedible parts of the plant.

Information in the dataset is organized into 41 fields, with individual lines in ascending order of their source reference. These headline fields are described in the associated ‘Explanatory Information’ file, while a description of the dataset content can be found in the ‘Dataset content’ file.

Use of Data

The IAEA’s MODARIA II tropical dataset is freely available for all external users, without prejudice to the applicable IAEA Terms and Conditions.

Any use of the tropical dataset shall contain appropriate acknowledgement of the data source(s) and the IAEA’s Data Platform [online].

The preferred form of citation of IAEA’s MODARIA II tropical dataset is:

INTERNATIONAL ATOMIC ENERGY AGENCY, IAEA’s MODARIA II Soil-Plant Transfer Parameter Dataset for Tropical Environments. In: IAEA Data Platform [online], IAEA, Vienna (2021). https://ckan.iaea.production.datopian.com/dataset/modaria

Acknowledgement

The IAEA wishes to express its gratitude to C. Doering (Australia) for compiling this comprehensive dataset as part of the activities of Working Group 4 of the MODARIA II programme, led by B. Howard (UK). The IAEA also gratefully acknowledges the valuable contributions of J. Twining (Australia) and S. Rout (India).

How do I Search for Data?

The ‘Explore’ tab, on the right corner of the first page, allows users to explore the data online (by selecting the ‘Preview’ tab or by accessing the CSV-type file under ‘Data and Resources’) or to retrieve the whole dataset as a CSV-type file by selecting the ‘Download’ tab. To search for data in the online preview mode, use the filter control panel on the left of the ‘Data Explorer’ page. Click ‘Download’ at the top right of the page to download the data as a CSV file.

Get Involved

Would you like to learn more about the IAEA’s MODARIA II tropical dataset, or do you have questions related to data compilation? Get in touch with the IAEA’s team at the Terrestrial Environmental Radiochemistry Laboratory and at the Assessment and Management of Environmental Releases Unit by accessing the ‘Contact dataset maintainer’ tab. We will get back to you soon.
Extracted Data From: Open FEMA Data Emergency Management, Preparedness, and...
datasetcatalog.nlm.nih.gov
Updated Feb 28, 2025
Cite
Agency, Emergency Management (2025). Extracted Data From: Open FEMA Data Emergency Management, Preparedness, and Alerts [Dataset]. http://doi.org/10.7910/DVN/PJM1IF
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/PJM1IF
Dataset updated
Feb 28, 2025
Authors
Agency, Emergency Management
Description
This submission includes publicly available data extracted in its original form. Please reference the Related Publication listed here for source and citation information.

"This page is intended to be a one stop shop for OpenFEMA—FEMA’s data delivery platform which provides datasets to the public in open, industry standard, machine-readable formats. Datasets are available in multiple formats, including downloadable files and through an easily digestible Application Programming Interface (API). Each page includes information about the specific dataset, links to downloadable files, a data dictionary describing each field, and an endpoint link (if applicable for those datasets available via the API)." [Quote from https://www.fema.gov/about/openfema/data-sets]

This dataset includes:

Annual NFIRS Public Data

Emergency Management Performance Grants

IPAWS Archived Alerts

National Household Survey

Non-Disaster and Assistance to Firefighter Grants

Sandy PMO: Disaster Relief Appropriations Act of 2013 (Sandy Supplemental Bill) Financial Data

Please review the updated PDF/HTML documentation for more details. (2025-01-31)
Basic and other measurements of radiation at Neumayer Station in 2013,...
search.datacite.org
doi.pangaea.de
Updated 2014
Cite
Gert König-Langlo (2014). Basic and other measurements of radiation at Neumayer Station in 2013, reference list of 12 datasets [Dataset]. http://doi.org/10.1594/pangaea.150002
Explore at:
Unique identifier
https://doi.org/10.1594/pangaea.150002
Dataset updated
2014
Dataset provided by
DataCite (https://www.datacite.org/)
PANGAEA - Data Publisher for Earth & Environmental Science
Authors
Gert König-Langlo
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
The data collection contains 12 links to basic and other measurements of radiation at Neumayer station from the Baseline Surface Radiation Network (BSRN). It covers all available measurements from the time period between 2013-01 and 2013-12. Any user who accepts the BSRN data release guidelines (http://bsrn.awi.de/data/conditions-of-data-release) may ask Amelie Driemel (mailto:Amelie.Driemel@awi.de) to obtain an account to download these datasets.
Corpus of Decisions: International Court of Justice (CD-ICJ)
data.niaid.nih.gov
zenodo.org
Updated Jul 11, 2024
Cite
Fobbe, Sean (2024). Corpus of Decisions: International Court of Justice (CD-ICJ) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3826444
Explore at:
Dataset updated
Jul 11, 2024
Dataset provided by
Ludwig-Maximilians-Universität München
Authors
Fobbe, Sean
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Overview

The Corpus of Decisions: International Court of Justice (CD-ICJ) collects and presents for the first time in human- and machine-readable form all published decisions of the International Court of Justice (ICJ). Among these are judgments, advisory opinions and orders, as well as their respective appended minority opinions (declarations, separate opinions and dissenting opinions). The International Court of Justice has kindly made available these documents on its website.

The International Court of Justice (ICJ) is the primary judicial organ of the United Nations and one of the most consequential courts in international law. Called the 'World Court' by many, it is the only international court with general thematic jurisdiction. While critics occasionally note the lack of compulsory jurisdiction and sharply limited access to the Court, its opinions continue to have an outsize influence on the modern interpretation, codification and wider development of international law. Every international legal textbook covers the workings and decisions of the Court in extenso and participation in international moot courts such as the Philip C. Jessup Moot Court without regular reference to and citation of the International Court of Justice's decisions is unthinkable.

This data set is designed to be complementary to and fully compatible with the Corpus of Decisions: Permanent Court of International Justice (CD-PCIJ), which is also available open access.

Citation

A peer-reviewed academic paper describing the construction and relevance of the data set entitled 'Introducing Twin Corpora of Decisions for the International Court of Justice (ICJ) and the Permanent Court of International Justice (PCIJ)' was published open access in the Journal of Empirical Legal Studies (JELS). It is also available in print at JELS 2022, Vol. 19, No. 2, pp. 491-524.

If you use the data set for academic work, please cite both the JELS paper and the precise version of the data set you used for your analysis.

New in Version 2023-10-22

Full recompilation of data set

Scope extended up to case number 190: Aerial Incident of 8 January 2020 (Canada, Sweden, Ukraine and United Kingdom v. Islamic Republic of Iran)

Add fix for lowercase components in URL basenames

Updated Python toolchain

Align docker config with Debian as host system

Updates

The CD-ICJ cannot be updated anymore, as the website of the Court is blocking automated access to its decisions. Updates will resume if this situation changes.

In case of serious errors an update will be provided at the earliest opportunity and a highlighted advisory issued on the Zenodo page of the current version. Minor errors will be documented in the GitHub issue tracker and fixed with the next scheduled release.

The CD-ICJ is versioned according to the day the data was acquired from the website of the Court, in the ISO format YYYY-MM-DD. Its initial release version was 2021-11-23.
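
Because the version identifiers are ISO dates, they can be parsed and compared directly. A minimal sketch, using the two version strings named in this description; the helper name `latest_version` is hypothetical:

```python
from datetime import date


def latest_version(versions: list[str]) -> str:
    """Return the most recent version string in ISO YYYY-MM-DD form."""
    # date.fromisoformat also validates that each string is a real date.
    return max(versions, key=date.fromisoformat)


print(latest_version(["2021-11-23", "2023-10-22"]))  # 2023-10-22
```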

Notifications regarding new and updated data sets will be published on my academic website at www.seanfobbe.com or via Mastodon at @seanfobbe@fediscience.org

Recommended Variants

Practitioners: PDF_BEST_MajorityOpinions

Traditional Scholars: PDF_BEST_FULL

Quantitative Analysts: CSV_BEST_FULL

Please refer to the Codebook regarding the relative merits of each variant. All variants are available in either English or French. Unless you have very specific needs you should only use the variants denoted 'BEST' for serious work.

Features

Fully compatible with the Corpus of Decisions: Permanent Court of International Justice (CD-PCIJ)

27 variables

Public Domain (CC-Zero 1.0)

Open and platform independent file formats (PDF, TXT, CSV)

Extensive Codebook

Compilation Report explains construction and validation of the data set in detail

Large number of diagrams for all purposes (see the 'ANALYSIS' archive)

Diagrams are available as PDF (for printing) and PNG (for web display), tables are available as CSV for easy readability by humans and machines

Secure cryptographic signatures

Publication of full source code (Open Source)

Key Metrics

Version: 2023-10-22

Temporal Coverage: 31 July 1947 – 16 October 2023

Documents: 2289 (English) / 2276 (French)

Tokens: 15,767,521 (English) / 16,239,787 (French)

File Formats: PDF, TXT, CSV

Source Code and Compilation Report

With every compilation of the full data set an extensive Compilation Report is created as a professionally laid-out PDF (comparable to the Codebook). The Compilation Report includes the Source Code, comments and explanations of design decisions, relevant computational results, exact timestamps and a table of contents with clickable internal hyperlinks to each section. The Compilation Report and Source Code are published under the same DOI.

For details of the construction and validation of the data set please refer to the Compilation Report.

Disclaimer

This data set has been created by Mr Seán Fobbe using documents available on the website of the International Court of Justice (https://www.icj-cij.org). It is a personal academic initiative and is not associated with or endorsed by the International Court of Justice or the United Nations.

The Court accepts no responsibility or liability arising out of my use, or that of third parties, of the documents and information produced, used or published on the Zenodo website. Neither the Court nor its staff members nor its contractors may be held responsible or liable for the consequences, financial or otherwise, resulting from the use of these documents and information.

Academic Publications (Fobbe)

Website — www.seanfobbe.com

Open Data — zenodo.org/communities/sean-fobbe-data

Code Repository — zenodo.org/communities/sean-fobbe-code

Regular Publications — zenodo.org/communities/sean-fobbe-publications

Contact

Did you discover any errors? Do you have suggestions on how to improve the data set? You can either post these to the Issue Tracker on GitHub or write me an e-mail at fobbe-data@posteo.de
U.S. Facebook data requests from government agencies 2013-2023
statista.com
de.statista.com
Cite
Stacy Jo Dixon, U.S. Facebook data requests from government agencies 2013-2023 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Explore at:
Dataset provided by
Statista (http://statista.com/)
Authors
Stacy Jo Dixon
Description
Facebook received 73,390 user data requests from federal agencies and courts in the United States during the second half of 2023. The social network produced some user data in 88.84 percent of requests from U.S. federal authorities. The United States accounts for the largest share of Facebook user data requests worldwide.
Number of internet and social media users worldwide 2025
statista.com
abripper.com
Updated Oct 16, 2025
Cite
Statista (2025). Number of internet and social media users worldwide 2025 [Dataset]. https://www.statista.com/statistics/617136/digital-population-worldwide/
Explore at:
Dataset updated
Oct 16, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
World
Description
As of October 2025, 6.04 billion individuals worldwide were internet users, which amounted to 73.2 percent of the global population. Of this total, 5.66 billion, or 68.7 percent of the world's population, were social media users.

Global internet usage

Connecting billions of people worldwide, the internet is a core pillar of the modern information society. Northern Europe ranked first among worldwide regions by the share of the population using the internet in 2025. In the Netherlands, Norway, and Saudi Arabia, 99 percent of the population used the internet as of February 2025. North Korea was at the opposite end of the spectrum, with virtually no internet usage penetration among the general population, ranking last worldwide. Eastern Asia was home to the largest number of online users worldwide, with over 1.34 billion at the latest count. Southern Asia ranked second, with around 1.2 billion internet users. China, India, and the United States rank ahead of other countries worldwide by the number of internet users.

Worldwide internet user demographics

As of 2024, the share of female internet users worldwide was 65 percent, five percentage points lower than that of men. Gender disparity in internet usage was bigger in African countries, with around a 10-percent difference. Worldwide regions, like the Commonwealth of Independent States and Europe, showed a smaller usage gap between these two genders. As of 2024, global internet usage was higher among individuals between 15 and 24 years old across all regions, with young people in Europe representing the most considerable usage penetration, 98 percent. In comparison, the worldwide average for the age group of 15 to 24 years was 79 percent. The income level of the countries was also an essential factor for internet access, as 93 percent of the population of the countries with high income reportedly used the internet, as opposed to only 27 percent of the low-income markets.
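
The headline figures can be cross-checked against each other with a few lines of arithmetic. This is only a consistency sketch on the rounded numbers quoted above; treating social media users as a subset of internet users is an assumption.

```python
# Rounded headline figures quoted in the description (October 2025).
internet_users = 6.04e9
internet_share = 0.732   # share of global population
social_users = 5.66e9
social_share = 0.687     # share of global population

# Both pairs should imply roughly the same world population.
implied_pop_internet = internet_users / internet_share
implied_pop_social = social_users / social_share

# Assumption: every social media user is also an internet user.
social_among_internet = social_users / internet_users

print(f"{implied_pop_internet / 1e9:.2f} bn vs {implied_pop_social / 1e9:.2f} bn")
print(f"{social_among_internet:.1%} of internet users")
```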

MIT AI news dataset

kaggle.com

zip

Updated Aug 21, 2025

Cite

Yousef Fawzi (2025). MIT AI news dataset [Dataset]. https://www.kaggle.com/datasets/losif01/mit-ai-news-dataset

Explore at:

Available download formats: zip (808350 bytes)

Dataset updated

Aug 21, 2025

Authors

Yousef Fawzi

License

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically

Description

📄 Dataset Description

This dataset contains articles scraped from the Massachusetts Institute of Technology (MIT) News website, specifically focusing on topics related to Artificial Intelligence, Machine Learning, Robotics, and Emerging Technologies.

The data was collected from the MIT News topic page:
👉 https://news.mit.edu/topic/artificial-intelligence2

Each entry includes:
- Title of the article
- Author(s)
- Publication date
- Summary (dek)
- Full article body text
- URL to the original article
- Link to related research paper (e.g., Nature, Science) when available

The dataset spans multiple research domains, including:
- AI for drug discovery & healthcare
- Protein language models
- Sustainable AI and eco-driving
- Robotics and embodied intelligence
- Chemistry and materials science
- Climate and clean energy

This dataset is ideal for:
- Natural Language Processing (NLP) tasks (summarization, topic modeling, sentiment analysis)
- Trend analysis in AI and scientific research
- Text classification and information retrieval
- Educational projects and AI literacy
- Knowledge graph construction of AI research

⚠️ Important Notes

All content is copyright of MIT News and is shared under non-commercial, educational use only.
This dataset was collected respectfully, with delays between requests, in accordance with MIT’s robots.txt and ethical web scraping practices.
The full text of articles is included to enable research, but users are encouraged to cite original sources and visit the MIT News website for the latest updates.

📁 Columns

Column	Description
`title`	Article headline
`author`	Author(s) of the article
`publication_date`	Human-readable publication date
`datetime`	ISO-formatted publication timestamp
`summary`	Article summary (lead paragraph)
`body`	Full article text
`paper_link`	URL to the related research paper (e.g., Nature)
`url`	Direct link to the MIT News article
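
A minimal sketch of reading these columns with the standard library follows. It assumes the archive contains a CSV file with the header shown above and that `datetime` holds ISO 8601 strings; the sample rows and the helper names are made up for illustration.

```python
import csv
import io
from datetime import datetime


def load_articles(handle) -> list[dict]:
    """Read article rows and parse the ISO `datetime` column."""
    rows = []
    for row in csv.DictReader(handle):
        stamp = row["datetime"].replace("Z", "+00:00")
        row["datetime"] = datetime.fromisoformat(stamp)
        rows.append(row)
    return rows


def with_paper_link(rows: list[dict]) -> list[dict]:
    """Keep only articles that link to a related research paper."""
    return [row for row in rows if (row.get("paper_link") or "").strip()]


# Made-up sample standing in for the real file.
SAMPLE = io.StringIO(
    "title,author,publication_date,datetime,summary,body,paper_link,url\n"
    "Example A,Jane Doe,\"August 1, 2025\",2025-08-01T09:00:00Z,s,b,"
    "https://example.org/paper,https://example.org/a\n"
    "Example B,John Roe,\"August 2, 2025\",2025-08-02T09:00:00Z,s,b,,"
    "https://example.org/b\n"
)

articles = load_articles(SAMPLE)
linked = with_paper_link(articles)
print(len(articles), len(linked))
```

For the real data, `SAMPLE` would be replaced by an open file handle, for example `open(path, newline="", encoding="utf-8")`.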

🔗 Source

Official Website: https://news.mit.edu
Topic Page: https://news.mit.edu/topic/artificial-intelligence2

🙌 Inspiration

Use this dataset to:
- Track how AI is being applied across scientific disciplines
- Build a news aggregator for AI research
- Train a model to predict research trends
- Create a search engine for MIT’s AI breakthroughs

✅ License

This dataset is shared under Kaggle’s Terms of Service for non-commercial, educational, and research purposes.
The original content remains the property of MIT News and should be properly attributed.

Basic measurements of radiation at station Ilorin (ILO), 1996-1999,...
search.datacite.org
doi.pangaea.de
Updated 2015
Cite
T O Aro (2015). Basic measurements of radiation at station Ilorin (ILO), 1996-1999, reference list of 42 datasets [Dataset]. http://doi.org/10.1594/pangaea.150023
Explore at:
Unique identifier
https://doi.org/10.1594/pangaea.150023
Dataset updated
2015
Dataset provided by
DataCite (https://www.datacite.org/)
PANGAEA - Data Publisher for Earth & Environmental Science
Authors
T O Aro
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
Any user who accepts the BSRN data release guidelines (http://bsrn.awi.de/data/conditions-of-data-release) may ask Amelie Driemel (mailto:Amelie.Driemel@awi.de) to obtain an account to download these datasets.
VocalSketch Data Set v1.0.4
zenodo.org
data-staging.niaid.nih.gov
bin
Updated Jan 24, 2020
Cite
Mark Cartwright; Bryan Pardo (2020). VocalSketch Data Set v1.0.4 [Dataset]. http://doi.org/10.5281/zenodo.13862
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.13862
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mark Cartwright; Bryan Pardo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set contains thousands of vocal imitations of a large set of diverse sounds. These imitations were collected from hundreds of contributors via Amazon's Mechanical Turk website. The data set also contains data on hundreds of people's ability to correctly label these vocal imitations, also collected via Amazon's Mechanical Turk. This data set will help the research community understand which audio concepts can be effectively communicated with this approach. We have released this data so the community can study the related issues and build systems that leverage vocal imitation as an interaction modality, such as search engines that can be queried by vocally imitating the desired sound.

This data set is a supplement to a paper. Please cite the following paper to reference this data set in a publication:

Cartwright, M., Pardo, B. VocalSketch: Vocally Imitating Audio Concepts. In Proceedings of ACM Conference on Human Factors in Computing Systems (2015). http://dx.doi.org/10.1145/2702123.2702387

See https://github.com/interactiveaudiolab/VocalSketchDataSet for the latest updates to this data set.

Interactive Audio Lab: http://music.eecs.northwestern.edu
Continuous meteorological observations at Neumayer Station (2002-2014),...
search.datacite.org
doi.pangaea.de
Updated 2015
Cite
Gert König-Langlo (2015). Continuous meteorological observations at Neumayer Station (2002-2014), reference list of 156 datasets [Dataset]. http://doi.org/10.1594/pangaea.150012
Explore at:
Unique identifier
https://doi.org/10.1594/pangaea.150012
Dataset updated
2015
Dataset provided by
DataCite (https://www.datacite.org/)
PANGAEA - Data Publisher for Earth & Environmental Science
Authors
Gert König-Langlo
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
The data collection contains 156 links to continuous meteorological observations at Neumayer Station from the Baseline Surface Radiation Network (BSRN). It covers all available measurements from the time period between 2002-01 and 2014-12. Any user who accepts the BSRN data release guidelines (http://bsrn.awi.de/data/conditions-of-data-release) may ask Amelie Driemel (mailto:Amelie.Driemel@awi.de) to obtain an account to download these datasets.
Extracted Data From: Clean Water Act Approved Jurisdictional Determinations
dataverse.harvard.edu
Updated Mar 3, 2025
Cite
U.S. Army Corps of Engineers (2025). Extracted Data From: Clean Water Act Approved Jurisdictional Determinations [Dataset]. http://doi.org/10.7910/DVN/EDEPID
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/EDEPID
Dataset updated
Mar 3, 2025
Dataset provided by
Harvard Dataverse
Authors
U.S. Army Corps of Engineers
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Time period covered
Aug 28, 2015 - Sep 13, 2023
Area covered
United States
Description
This submission includes publicly available data extracted in its original form. Please reference the Related Publication listed here for source and citation information: US Army Corps of Engineers (Corps) Pre-2015 Regulatory Regime Approved Jurisdictional Determination in Light of Sackett v. EPA, 143 S. Ct. 1322 (2023), NWW-2023-00554, MFR 1 of 1; Clean Water Act Approved Jurisdictional Determinations. This upload includes data and screenshots of the landing page and FAQs. "This website presents information on approved jurisdictional determinations (JDs) made by the U.S. Army Corps of Engineers (Corps) and the U.S. Environmental Protection Agency (EPA) under the Clean Water Act since August 28, 2015. Users are able to search, sort, map, view, and download approved JDs from both agencies using different search parameters (e.g., by year, State, watershed). An approved JD is an official Corps determination that jurisdictional waters of the United States are either present or absent on a particular site." (Quote from https://watersgeo.epa.gov/cwa/CWA-JDs/)
Basic measurements of radiation at station Solar Village (1998-09 to...
search.datacite.org
search.dataone.org
Updated 2016
Cite
Naif Al-Abbadi (2016). Basic measurements of radiation at station Solar Village (1998-09 to 2002-12), reference list of 51 datasets [Dataset]. http://doi.org/10.1594/pangaea.860279
Explore at:
Unique identifier
https://doi.org/10.1594/pangaea.860279
Dataset updated
2016
Dataset provided by
DataCite https://www.datacite.org/
PANGAEA - Data Publisher for Earth & Environmental Science
Authors
Naif Al-Abbadi
License
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Area covered

Description
Any user who accepts the BSRN data release guidelines (http://bsrn.awi.de/data/conditions-of-data-release) may ask Amelie Driemel (mailto:Amelie.Driemel@awi.de) to obtain an account to download these datasets.
Cite

John P.A. Ioannidis (2025). August 2025 data-update for "Updated science-wide author databases of standardized citation indicators" [Dataset]. http://doi.org/10.17632/btchxktzyw.8

August 2025 data-update for "Updated science-wide author databases of standardized citation indicators"

Explore at:

4 scholarly articles cite this dataset (View in Google Scholar)

Unique identifier

https://doi.org/10.17632/btchxktzyw.8

Dataset updated

Sep 19, 2025

Authors

John P.A. Ioannidis

License

Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically

Description

Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship adjusted hm-index, citations to papers in different authorship positions and a composite indicator (c-score). Separate data are shown for career-long and, separately, for single recent year impact. Metrics with and without self-citations and ratio of citations to citing papers are given and data on retracted papers (based on Retraction Watch database) as well as citations to/from retracted papers have been added. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2024 and single recent year data pertain to citations received during calendar year 2024. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (7) is based on the August 1, 2025 snapshot from Scopus, updated to end of citation year 2024. This work uses Scopus data. Calculations were performed using all Scopus author profiles as of August 1, 2025. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list. It does not mean that the author does not do good work. PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US. 
They should be sent directly to Scopus, preferably by use of the Scopus to ORCID feedback wizard (https://orcid.scopusfeedback.com/) so that the correct data can be used in any future annual updates of the citation indicator databases. The c-score focuses on impact (citations) rather than productivity (number of publications) and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see attached file on FREQUENTLY ASKED QUESTIONS. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden manifesto: https://www.nature.com/articles/520429a
