6 datasets found
  1. August 2024 data-update for "Updated science-wide author databases of standardized citation indicators"

    • elsevier.digitalcommonsdata.com
    Updated Sep 16, 2024
    Cite
    John P.A. Ioannidis (2024). August 2024 data-update for "Updated science-wide author databases of standardized citation indicators" [Dataset]. http://doi.org/10.17632/btchxktzyw.7
    Explore at:
    42 scholarly articles cite this dataset (View in Google Scholar)
    Dataset updated
    Sep 16, 2024
    Authors
    John P.A. Ioannidis
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship-adjusted hm-index, citations to papers in different authorship positions, and a composite indicator (c-score). Separate data are shown for career-long impact and, separately, for single recent-year impact. Metrics with and without self-citations and the ratio of citations to citing papers are given, and data on retracted papers (based on the Retraction Watch database), as well as citations to/from retracted papers, have been added in the most recent iteration. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2023 and single recent-year data pertain to citations received during calendar year 2023. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field. This version (7) is based on the August 1, 2024 snapshot from Scopus, updated to the end of citation year 2023.

    This work uses Scopus data. Calculations were performed using all Scopus author profiles as of August 1, 2024. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list; it does not mean that the author does not do good work.

    PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US. They should be sent directly to Scopus, preferably by use of the Scopus-to-ORCID feedback wizard (https://orcid.scopusfeedback.com/), so that the correct data can be used in any future annual updates of the citation indicator databases.

    The c-score focuses on impact (citations) rather than productivity (number of publications), and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see the attached file on FREQUENTLY ASKED QUESTIONS. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For more reading, we refer to the Leiden Manifesto: https://www.nature.com/articles/520429a
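    As a rough, non-authoritative illustration of how the six indicators named above (total citations, h-index, hm-index, and citations to papers in single, single+first, and single+first+last author positions) can be combined into a single composite, the Python sketch below sums log-normalized values. The column names and the exact normalization are assumptions for illustration only; consult the published methodology and the actual spreadsheet headers before relying on it.

        # Hedged sketch: a log-normalized composite of six citation indicators.
        # Column names (nc, h, hm, ncs, ncsf, ncsfl) are assumptions; check the
        # headers of the actual data files before use.
        import numpy as np
        import pandas as pd

        INDICATORS = ["nc", "h", "hm", "ncs", "ncsf", "ncsfl"]

        def composite_score(df: pd.DataFrame) -> pd.Series:
            """Sum ln(x + 1) / ln(max + 1) over the six indicator columns."""
            score = pd.Series(0.0, index=df.index)
            for col in INDICATORS:
                score += np.log(df[col] + 1) / np.log(df[col].max() + 1)
            return score

        # Usage (file name is hypothetical):
        # authors = pd.read_excel("career_2023.xlsx")
        # authors["c_score_sketch"] = composite_score(authors)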

  2. ufpb-ci-faculty-articles-titles

    • kaggle.com
    zip
    Updated May 2, 2025
    Cite
    Vinicius Trajano (2025). ufpb-ci-faculty-articles-titles [Dataset]. https://www.kaggle.com/datasets/viniciustrajano/ufpb-ci-faculty-articles-titles
    Explore at:
    Available download formats: zip (91768 bytes)
    Dataset updated
    May 2, 2025
    Authors
    Vinicius Trajano
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Context and Motivation

    The “UFPB CI Faculty Articles” dataset aggregates metadata for academic articles published by professors of the Centro de Informática (CI) at the Federal University of Paraíba (UFPB). By centralizing title, authors, publication venue, date, and citation count, this dataset enables you to:

    • Map the CI’s scientific output over time
    • Identify faculty collaboration patterns
    • Analyze topic distributions and emerging research areas
    • Benchmark citation impact within and beyond UFPB

    Data Collection

    • Source: Google Scholar profiles for each CI faculty member
    • Method: Automated scraping with Selenium + Python
    • Collection period: [insert date range here, e.g., Jan–Mar 2025]

    Dataset Structure

    Each row corresponds to one article and includes the following columns (a loading sketch follows the list):
    - professor – name of the faculty member queried
    - title – full article title
    - authors – list of contributing authors
    - publication_date – publication date (day, month, year)
    - venue – journal or conference name
    - pages – page range or count
    - publisher – publishing entity
    - abstract – article abstract (when available)
    - citation_count – total citations as recorded on Google Scholar
    - status – “OK” if the profile was found; “Name not found” otherwise
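    A minimal loading sketch in Python/pandas; the file name below is hypothetical (use whatever CSV name ships in the Kaggle download), while the column names follow the list above.

        # Sketch: explore the dataset with pandas.
        # The file name is hypothetical; columns follow the structure above.
        import pandas as pd

        df = pd.read_csv("ufpb_ci_faculty_articles.csv")

        # Keep only rows where the faculty profile was found.
        ok = df[df["status"] == "OK"]

        # Total citations per professor, as recorded on Google Scholar.
        citations_per_professor = (
            ok.groupby("professor")["citation_count"]
              .sum()
              .sort_values(ascending=False)
        )
        print(citations_per_professor.head(10))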

    Potential Uses

    • Bibliometric analysis of CI research trends
    • Co-authorship network visualizations
    • Time-series studies of publication and citation growth
    • Text mining on titles and abstracts to identify thematic clusters

    License and Citation

    Licensed under the MIT License.

    Please cite this dataset as:

    Lopes C., Medeiros F., Mangueira V. (2025). UFPB CI Faculty Articles. Kaggle.
    https://kaggle.com/your-username/ufpb-ci-faculty-articles

  3. Data from: Twitter historical dataset: March 21, 2006 (first tweet) to July 31, 2009 (3 years, 1.5 billion tweets)

    • data.niaid.nih.gov
    • live.european-language-grid.eu
    • + 2 more
    Updated May 20, 2020
    Cite
    Gayo-Avello, Daniel (2020). Twitter historical dataset: March 21, 2006 (first tweet) to July 31, 2009 (3 years, 1.5 billion tweets) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3833781
    Dataset updated
    May 20, 2020
    Dataset provided by
    University of Oviedo
    Authors
    Gayo-Avello, Daniel
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Disclaimer: This dataset is distributed by Daniel Gayo-Avello, an associate professor in the Department of Computer Science at the University of Oviedo, for the sole purpose of non-commercial research, and it includes only tweet IDs.

    The dataset contains tweet IDs for all the published tweets (in any language) between March 21, 2006 and July 31, 2009, thus comprising the first three whole years of Twitter from its creation, that is, about 1.5 billion tweets (see file Twitter-historical-20060321-20090731.zip).

    It covers several defining developments in Twitter, such as the invention of hashtags, retweets and trending topics, and it includes tweets related to the 2008 US Presidential Elections, Obama's first inauguration speech, and the 2009 Iran Election protests (one of the so-called Twitter Revolutions).

    Finally, it contains tweets in many major languages (mainly English, Portuguese, Japanese, Spanish, German and French), so it should be possible, at least in theory, to analyze international events from different cultural perspectives.

    The dataset was completed in November 2016 and, therefore, the tweet IDs it contains were publicly available at that moment. This means that there could be tweets that were public during that period but do not appear in the dataset, and also that a substantial part of the tweets in the dataset has been deleted (or locked) since 2016.

    To make it easier to understand the decay of tweet IDs in the dataset, a number of representative samples (99% confidence level and ±0.5 confidence interval) are provided.

    In general terms, 85.5% ±0.5 of the historical tweets are available as of May 19, 2020 (see file Twitter-historical-20060321-20090731-sample.txt). However, since the number of tweets varies greatly throughout the three-year period covered by the dataset, additional representative samples are provided for 90-day intervals (see the file 90-day-samples.zip).
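    For reference, the sample size implied by a 99% confidence level and a ±0.5 percentage-point margin (interpreting the stated "0.5 confidence interval" that way, with worst-case p = 0.5) can be checked with the standard proportion formula:

        # Sketch: required sample size for a proportion estimate at 99% confidence
        # with a ±0.5 percentage-point margin of error. The interpretation of the
        # "0.5 confidence interval" above is an assumption.
        import math

        z = 2.576   # z-value for 99% confidence
        p = 0.5     # worst-case proportion (maximum variance)
        e = 0.005   # ±0.5 percentage points expressed as a fraction

        n = math.ceil(z**2 * p * (1 - p) / e**2)
        print(n)    # ~66,358 tweets per sample, before finite-population correction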

    In that regard, the ratio of publicly available tweets (as of May 19, 2020) is as follows:

    March 21, 2006 to June 18, 2006: 88.4% ±0.5 (from 5,512 tweets).

    June 18, 2006 to September 16, 2006: 82.7% ±0.5 (from 14,820 tweets).

    September 16, 2006 to December 15, 2006: 85.7% ±0.5 (from 107,975 tweets).

    December 15, 2006 to March 15, 2007: 88.2% ±0.5 (from 852,463 tweets).

    March 15, 2007 to June 13, 2007: 89.6% ±0.5 (from 6,341,665 tweets).

    June 13, 2007 to September 11, 2007: 88.6% ±0.5 (from 11,171,090 tweets).

    September 11, 2007 to December 10, 2007: 87.9% ±0.5 (from 15,545,532 tweets).

    December 10, 2007 to March 9, 2008: 89.0% ±0.5 (from 23,164,663 tweets).

    March 9, 2008 to June 7, 2008: 66.5% ±0.5 (from 56,416,772 tweets; see below for more details on this).

    June 7, 2008 to September 5, 2008: 78.3% ±0.5 (from 62,868,189 tweets; see below for more details on this).

    September 5, 2008 to December 4, 2008: 87.3% ±0.5 (from 89,947,498 tweets).

    December 4, 2008 to March 4, 2009: 86.9% ±0.5 (from 169,762,425 tweets).

    March 4, 2009 to June 2, 2009: 86.4% ±0.5 (from 474,581,170 tweets).

    June 2, 2009 to July 31, 2009: 85.7% ±0.5 (from 589,116,341 tweets).

    The apparent drop in available tweets from March 9, 2008 to September 5, 2008 has an easy, although embarrassing, explanation.

    At the moment of cleaning all the data to publish this dataset, there seemed to be a gap between April 1, 2008 and July 7, 2008 (actually, the data was not missing but in a different backup). Since tweet IDs are easy to regenerate for that Twitter era (source code is provided in generate-ids.m), I simply produced all the IDs that were created between those two dates. All those tweets actually existed, but a number of them were obviously private and not crawlable. For those regenerated IDs the actual ratio of public tweets (as of May 19, 2020) is 62.3% ±0.5.

    In other words, what you see in that period (April to July 2008) is not actually a huge number of tweets having been deleted, but the combination of deleted and non-public tweets (whose IDs would normally be excluded from the dataset, to avoid wasted effort when rehydrating it).

    Additionally, given that not everybody will need the whole period of time, the earliest tweet ID for each date is provided in the file date-tweet-id.tsv.
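    As an illustration of how that per-date index could be used, the sketch below reads date-tweet-id.tsv to find the tweet-ID boundaries of a date window and then filters a plain-text list of IDs. The TSV layout (one date and its earliest tweet ID per line), the date format, and the file paths are assumptions based on the description above.

        # Sketch: select tweet IDs falling inside a date window using the
        # per-date index. TSV layout (date<TAB>earliest_tweet_id) is assumed.
        import csv

        def id_bounds(index_path, start_date, end_date):
            """Return (lower, upper) tweet-ID bounds covering [start_date, end_date)."""
            lower = upper = None
            with open(index_path, newline="") as f:
                for date, tweet_id in csv.reader(f, delimiter="\t"):
                    if date == start_date:
                        lower = int(tweet_id)
                    elif date == end_date:
                        upper = int(tweet_id)
            if lower is None or upper is None:
                raise ValueError("date not found in the index")
            return lower, upper

        def select_ids(ids_path, lower, upper):
            """Yield tweet IDs from a one-ID-per-line file that fall in [lower, upper)."""
            with open(ids_path) as f:
                for line in f:
                    tid = int(line)
                    if lower <= tid < upper:
                        yield tid

        # Usage (paths and date format are hypothetical):
        # lo, hi = id_bounds("date-tweet-id.tsv", "2008-11-04", "2008-11-05")
        # with open("election_day_ids.txt", "w") as out:
        #     for tid in select_ids("Twitter-historical-20060321-20090731.txt", lo, hi):
        #         out.write(f"{tid}\n")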

    For additional details regarding this dataset, please see: Gayo-Avello, Daniel. "How I Stopped Worrying about the Twitter Archive at the Library of Congress and Learned to Build a Little One for Myself." arXiv preprint arXiv:1611.08144 (2016).

    If you use this dataset in any way, please cite that preprint (in addition to the dataset itself).

    If you need to contact me, you can find me as @PFCdgayo on Twitter.

  4. Replication Data for: Systematic peak-load pricing, congestion premia and demand diverting: Empirical evidence

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Oct 30, 2023
    Cite
    Diego Escobari (2023). Replication Data for: Systematic peak-load pricing, congestion premia and demand diverting: Empirical evidence [Dataset]. http://doi.org/10.7910/DVN/FTWECY
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 30, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Diego Escobari
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Contains the following files:
    1) EconLetters.txt
    2) EconLetters.dta
    3) DoFileEconLetters.do

    The first two contain the data used in: Escobari, Diego. "Systematic Peak-load Pricing, Congestion Premia and Demand Diverting: Empirical Evidence." Economics Letters, 103 (1), April 2009, 59-61. (1) is a text file, while (2) is in Stata format. The third file is the Stata do-file to replicate all the tables in the paper. Please cite the paper if you use these data in your own research.

    Stephanie C. Reynolds helped in the collection of the data. This research received financial support from the Private Enterprise Research Center at Texas A&M and the Bradley Foundation. Feel free to e-mail me if you have any questions: escobaridiego@gmail.com

    Diego Escobari, Ph.D.
    Professor of Economics
    Department of Economics
    Robert C. Vackar College of Business and Entrepreneurship
    The University of Texas Rio Grande Valley
    1201 West University Drive
    Edinburg, Texas 78541
    Phone: (956) 665 3366
    https://faculty.utrgv.edu/diego.escobari/
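    The .dta file can also be inspected outside Stata; a minimal sketch using pandas, assuming EconLetters.dta from this archive sits in the working directory:

        # Sketch: inspect the Stata dataset without Stata.
        import pandas as pd

        df = pd.read_stata("EconLetters.dta")
        print(df.shape)       # number of observations and variables
        print(df.dtypes)      # variable names and types
        print(df.describe())  # summary statistics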

  5. Dataset of River Ganga and Sangam (INDIA)

    • kaggle.com
    zip
    Updated May 4, 2025
    Cite
    MANISH KUMAR (2025). Dataset of River Ganga and Sangam (INDIA) [Dataset]. https://www.kaggle.com/datasets/dal4206/dataset-of-river-ganga-and-sangam-india
    Explore at:
    Available download formats: zip (3661623 bytes)
    Dataset updated
    May 4, 2025
    Authors
    MANISH KUMAR
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Ganges
    Description

    IoT sensor data for water quality monitoring at the Ganga River and the confluence of the Ganga and Yamuna rivers (Sangam).

    Dataset Overview

    This dataset comprises water quality measurements collected from two distinct sites: the Ganga River and the confluence point of the Ganga and Yamuna rivers, known as the Sangam, in India. Utilizing an IoT-based infrastructure, the data captures various physicochemical parameters essential for assessing river water quality. The data samples were collected over 15 months using an IoT-based smart water sensors kit. A total of 52,372 and 46,479 data samples were collected from the Sangam and the Ganga River sites, respectively. The dataset supports research in environmental monitoring, IoT applications in water management, and machine learning-based water quality prediction. More details regarding the dataset's locations, the device used and the labeling are provided in the associated research paper.

    Data Features

    Parameters Monitored: pH, Dissolved Oxygen (DO), Temperature, Electrical Conductivity (EC), and Oxidation-Reduction Potential (ORP).
    Temporal Coverage: The data were collected fortnightly between January 2019 and March 2020 so that the effect of festivals such as Moni Amavasya, Jyestha Ganga Dusshera, Kartik Purnima, Maha Shivaratri and Magh Mela could be observed.
    Data Format: CSV files with timestamped entries for each parameter.
    Geographical Coordinates: The Sangam (E 78°54′45″; N 25°25′15″) is the confluence of the Ganga and Yamuna rivers, situated in the Prayagraj district of Uttar Pradesh, India.
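    A minimal loading sketch (Python/pandas) for the CSV files; the file name and column names are assumptions based on the parameters listed above, so verify them against the actual headers first.

        # Sketch: load one CSV file and compute fortnightly means per parameter.
        # File name and column names (timestamp, pH, DO, temperature, EC, ORP)
        # are assumptions; check the actual headers before use.
        import pandas as pd

        df = pd.read_csv("sangam_water_quality.csv", parse_dates=["timestamp"])
        df = df.set_index("timestamp").sort_index()

        fortnightly = df[["pH", "DO", "temperature", "EC", "ORP"]].resample("14D").mean()
        print(fortnightly.head())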

    Usages

    The dataset is suitable for:
    • Developing and validating machine learning models for water quality prediction.
    • Analyzing temporal trends and patterns in river water quality.
    • Assessing the impact of environmental factors on water quality parameters.
    • Supporting policy-making and environmental management strategies.

    Citation and Acknowledgment

    This dataset is the outcome of the project "IoT based Water Quality Assessment for River Ganga at Allahabad", funded by IIIT-Allahabad through the Ministry of Education (Govt. of India). The Principal Investigator of this project was Dr. Manish Kumar (Associate Professor) in the Department of Information at IIIT-Allahabad, India. If you utilize this dataset in your research or projects, please cite the following publications:

    1. Manish Kumar, Tinku Singh, Manish Kumar Maurya, Anubhav Shivhare, Ashwin Raut, and Pramod Kumar Singh, "Quality Assessment and Monitoring of River Water Using IoT Infrastructure," IEEE Internet of Things Journal, vol. 10, no. 12, pp. 10280–10290, 2023. DOI: https://doi.org/10.1109/JIOT.2023.3238123

    2. M. K. Maurya, A. Raut, M. Kumar, Satakshi and M. Kumar, "A Domain-Aware Data Fusion Scheme for Cold Start Problem in Early-Stage IoT Systems," 2025 IEEE 14th International Conference on Communication Systems and Network Technologies (CSNT), Bhopal, India, 2025, pp. 1017-1023, DOI: https://doi.org/10.1109/CSNT64827.2025.10968973

  6. Dataset of Positive Politeness Strategies by the Main Characters in The Greatest Showman Movie

    • figshare.com
    xlsx
    Updated Jun 5, 2023
    Cite
    febri aspitasari; I Made Budiarsa; Gede Primahadi Wijaya Rajeg (2023). Dataset of Positive Politeness Strategies by the Main Characters in The Greatest Showman Movie [Dataset]. http://doi.org/10.6084/m9.figshare.19428821.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    febri aspitasari; I Made Budiarsa; Gede Primahadi Wijaya Rajeg
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the data repository for an undergraduate thesis entitled Politeness Strategies in Main Characters The Greatest Showman Movie. If you use the data in this repository for your research, please cite this work as follows:

    Aspitasari, Febri; Budiarsa, I Made; Rajeg, Gede Primahadi Wijaya (2022): Dataset of Positive Politeness Strategies in Main Characters The Greatest Showman Movie. figshare. Dataset. https://doi.org/10.6084/m9.figshare.19428821

    First Supervisor: Prof. Dr. I Made Budiarsa, M.A.
    Second Supervisor: Gede Primahadi Wijaya Rajeg, S.S., M.Hum., Ph.D.
