License: Attribution-NonCommercial 3.0 (CC BY-NC 3.0), https://creativecommons.org/licenses/by-nc/3.0/ (license information was derived automatically)
Citation metrics are widely used and misused. We have created a publicly available database of top-cited scientists that provides standardized information on citations, h-index, co-authorship-adjusted hm-index, citations to papers in different authorship positions, and a composite indicator (c-score). Data are shown separately for career-long impact and for single-recent-year impact. Metrics with and without self-citations and the ratio of citations to citing papers are given, and data on retracted papers (based on the Retraction Watch database), as well as citations to and from retracted papers, have been added in the most recent iteration. Scientists are classified into 22 scientific fields and 174 sub-fields according to the standard Science-Metrix classification. Field- and subfield-specific percentiles are also provided for all scientists with at least 5 papers. Career-long data are updated to end-of-2023, and single-recent-year data pertain to citations received during calendar year 2023. The selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field.
This version (7) is based on the August 1, 2024 snapshot from Scopus, updated to the end of citation year 2023. This work uses Scopus data. Calculations were performed using all Scopus author profiles as of August 1, 2024. If an author is not on the list, it is simply because the composite indicator value was not high enough to appear on the list; it does not mean that the author does not do good work.
PLEASE ALSO NOTE THAT THE DATABASE HAS BEEN PUBLISHED IN AN ARCHIVAL FORM AND WILL NOT BE CHANGED. The published version reflects Scopus author profiles at the time of calculation. We thus advise authors to ensure that their Scopus profiles are accurate. REQUESTS FOR CORRECTIONS OF THE SCOPUS DATA (INCLUDING CORRECTIONS IN AFFILIATIONS) SHOULD NOT BE SENT TO US.
They should be sent directly to Scopus, preferably via the Scopus-to-ORCID feedback wizard (https://orcid.scopusfeedback.com/), so that the correct data can be used in any future annual updates of the citation indicator databases. The c-score focuses on impact (citations) rather than productivity (number of publications), and it also incorporates information on co-authorship and author positions (single, first, last author). If you have additional questions, see the attached FREQUENTLY ASKED QUESTIONS file. Finally, we alert users that all citation metrics have limitations and their use should be tempered and judicious. For further reading, we refer to the Leiden Manifesto: https://www.nature.com/articles/520429a
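For readers unfamiliar with the metrics listed above, the h-index and Schreiber's co-authorship-adjusted hm-index can be sketched as follows. This is a minimal illustration of the standard definitions, not the database's exact computation:

```python
def h_index(citations):
    # h = the largest h such that at least h papers have >= h citations each.
    cites = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(cites, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

def hm_index(papers):
    # papers: list of (citations, n_authors) tuples.
    # Schreiber's hm-index adjusts for co-authorship: each paper contributes
    # an effective rank of 1/n_authors, and hm is the largest effective rank
    # r such that the paper at that rank still has >= r citations.
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    r_eff, hm = 0.0, 0.0
    for cites, n_authors in ranked:
        r_eff += 1.0 / n_authors
        if cites >= r_eff:
            hm = r_eff
        else:
            break
    return hm
```

A single-author paper counts fully toward hm, while a ten-author paper advances the effective rank by only 0.1, which is why hm rewards work with fewer co-authors.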
License: MIT License, https://opensource.org/licenses/MIT (license information was derived automatically)
The “UFPB CI Faculty Articles” dataset aggregates metadata for academic articles published by professors of the Centro de Informática (CI) at the Federal University of Paraíba (UFPB). By centralizing title, authors, publication venue, date, and citation count, this dataset enables analyses of the faculty's publication output and citation impact.
Each row corresponds to one article and includes the following columns:
- professor – name of the faculty member queried
- title – full article title
- authors – list of contributing authors
- publication_date – publication date (day, month, year)
- venue – journal or conference name
- pages – page range or count
- publisher – publishing entity
- abstract – article abstract (when available)
- citation_count – total citations as recorded on Google Scholar
- status – “OK” if the profile was found; “Name not found” otherwise
Licensed under the MIT License.
Please cite this dataset as:
Lopes C., Medeiros F., Mangueira V. (2025). UFPB CI Faculty Articles. Kaggle.
https://kaggle.com/your-username/ufpb-ci-faculty-articles
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
Disclaimer: This dataset is distributed by Daniel Gayo-Avello, an associate professor at the Department of Computer Science of the University of Oviedo, for the sole purpose of non-commercial research, and it includes only tweet IDs.
The dataset contains tweet IDs for all the published tweets (in any language) between March 21, 2006 and July 31, 2009, thus comprising the first three full years of Twitter from its creation, that is, about 1.5 billion tweets (see the file Twitter-historical-20060321-20090731.zip).
It covers several defining moments in Twitter's history, such as the invention of hashtags, retweets and trending topics, and it includes tweets related to the 2008 US Presidential Elections, Obama's first inauguration speech and the 2009 Iran Election protests (one of the so-called Twitter Revolutions).
Finally, it contains tweets in many major languages (mainly English, Portuguese, Japanese, Spanish, German and French), so it should be possible, at least in theory, to analyze international events from different cultural perspectives.
The dataset was completed in November 2016 and, therefore, the tweet IDs it contains were publicly available at that moment. This means that there could be tweets public during that period that do not appear in the dataset, and also that a substantial part of the tweets in the dataset has been deleted (or locked) since 2016.
To make it easier to understand the decay of tweet IDs in the dataset, a number of representative samples (99% confidence level, ±0.5 margin of error) are provided.
In general terms, 85.5% ±0.5 of the historical tweets were available as of May 19, 2020 (see the file Twitter-historical-20060321-20090731-sample.txt). However, since the number of tweets varies greatly throughout the three-year period covered by the dataset, additional representative samples are provided for 90-day intervals (see the file 90-day-samples.zip).
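The quoted sample sizes are consistent with the standard formula for estimating a proportion. A minimal sketch, assuming the "0.5 confidence interval" means a ±0.5 percentage-point margin of error at the 99% confidence level (z ≈ 2.576) with the worst-case proportion p = 0.5:

```python
import math

def required_sample_size(z=2.576, margin=0.005, p=0.5):
    # n = z^2 * p * (1 - p) / e^2, the usual sample-size formula for a
    # proportion; p = 0.5 maximizes p*(1-p) and so gives the largest
    # (most conservative) required sample.
    return math.ceil(z**2 * p * (1 - p) / margin**2)
```

With these assumptions, roughly 66,000 tweets per sample suffice regardless of how many tweets a 90-day window contains, which is why a single availability percentage can be quoted per interval.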
In that regard, the ratio of publicly available tweets (as of May 19, 2020) is as follows:
March 21, 2006 to June 18, 2006: 88.4% ±0.5 (from 5,512 tweets).
June 18, 2006 to September 16, 2006: 82.7% ±0.5 (from 14,820 tweets).
September 16, 2006 to December 15, 2006: 85.7% ±0.5 (from 107,975 tweets).
December 15, 2006 to March 15, 2007: 88.2% ±0.5 (from 852,463 tweets).
March 15, 2007 to June 13, 2007: 89.6% ±0.5 (from 6,341,665 tweets).
June 13, 2007 to September 11, 2007: 88.6% ±0.5 (from 11,171,090 tweets).
September 11, 2007 to December 10, 2007: 87.9% ±0.5 (from 15,545,532 tweets).
December 10, 2007 to March 9, 2008: 89.0% ±0.5 (from 23,164,663 tweets).
March 9, 2008 to June 7, 2008: 66.5% ±0.5 (from 56,416,772 tweets; see below for more details on this).
June 7, 2008 to September 5, 2008: 78.3% ±0.5 (from 62,868,189 tweets; see below for more details on this).
September 5, 2008 to December 4, 2008: 87.3% ±0.5 (from 89,947,498 tweets).
December 4, 2008 to March 4, 2009: 86.9% ±0.5 (from 169,762,425 tweets).
March 4, 2009 to June 2, 2009: 86.4% ±0.5 (from 474,581,170 tweets).
June 2, 2009 to July 31, 2009: 85.7% ±0.5 (from 589,116,341 tweets).
The apparent drop in available tweets from March 9, 2008 to September 5, 2008 has an easy, although embarrassing, explanation.
While cleaning the data to publish this dataset, there seemed to be a gap between April 1, 2008 and July 7, 2008 (actually, the data was not missing but was in a different backup). Since tweet IDs are easy to regenerate for that Twitter era (source code is provided in generate-ids.m), I simply produced all those that were created between those two dates. All those tweets actually existed, but a number of them were obviously private and not crawlable. For those regenerated IDs, the actual ratio of public tweets (as of May 19, 2020) is 62.3% ±0.5.
In other words, what you see in that period (April to July 2008) is not actually a huge number of tweets having been deleted, but a combination of deleted and non-public tweets (whose IDs ideally would not be in the dataset, for performance reasons when rehydrating it).
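The regeneration described above works because, in the pre-Snowflake era, Twitter assigned tweet IDs as plain sequential integers. A minimal Python sketch of the idea (the dataset's actual implementation is the provided generate-ids.m; the boundary IDs below are purely illustrative):

```python
def regenerate_ids(first_id, next_period_first_id):
    # Pre-Snowflake tweet IDs were sequential integers, so every ID in a
    # gap can be enumerated from the first ID of the gap up to (but
    # excluding) the first ID of the following period. Private or deleted
    # tweets still receive an ID here; they simply fail to resolve when
    # the dataset is rehydrated.
    return range(first_id, next_period_first_id)
```

The boundary IDs for a real date window can be looked up in the date-tweet-id.tsv file mentioned in the dataset description.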
Additionally, given that not everybody will need the whole period of time, the earliest tweet ID for each date is provided in the file date-tweet-id.tsv.
For additional details regarding this dataset please see: Gayo-Avello, Daniel. "How I Stopped Worrying about the Twitter Archive at the Library of Congress and Learned to Build a Little One for Myself." arXiv preprint arXiv:1611.08144 (2016).
If you use this dataset in any way, please cite that preprint (in addition to the dataset itself).
If you need to contact me, you can find me as @PFCdgayo on Twitter.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/ (license information was derived automatically)
Contains the following files:
1) EconLetters.txt
2) EconLetters.dta
3) DoFileEconLetters.do
The first two contain the data used in: Escobari, Diego. "Systematic Peak-load Pricing, Congestion Premia and Demand Diverting: Empirical Evidence." Economics Letters, 103 (1), April 2009, 59-61. File (1) is a text file, while (2) is in Stata format. The third file is the Stata do-file to replicate all the tables in the paper. Please cite the paper if you use these data in your own research. Stephanie C. Reynolds helped in the collection of the data. This research received financial support from the Private Enterprise Research Center at Texas A&M and the Bradley Foundation. Feel free to e-mail me if you have any questions: escobaridiego@gmail.com
Diego Escobari, Ph.D.
Professor of Economics
Department of Economics
Robert C. Vackar College of Business and Entrepreneurship
The University of Texas Rio Grande Valley
1201 West University Drive
Edinburg, Texas 78541
Phone: (956) 665 3366
https://faculty.utrgv.edu/diego.escobari/
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/ (license information was derived automatically)
IoT sensor data for water quality monitoring at the Ganga River and at the confluence of the Ganga and Yamuna rivers (Sangam).
Dataset Overview
This dataset comprises water quality measurements collected from two distinct sites in India: the Ganga River and the confluence point of the Ganga and Yamuna rivers, known as the Sangam. Utilizing an IoT-based infrastructure, the data captures various physicochemical parameters essential for assessing river water quality. The samples were collected over 15 months using an IoT-based smart water sensor kit. In total, 52,372 and 46,479 data samples were collected from the Sangam and the Ganga River, respectively. The dataset supports research in environmental monitoring, IoT applications in water management, and machine learning-based water quality prediction. More details regarding the dataset's locations, devices used and labeling are provided in the associated research paper.
Data Features
Parameters Monitored: pH, Dissolved Oxygen (DO), Temperature, Electrical Conductivity (EC), and Oxidation-Reduction Potential (ORP).
Temporal Coverage: The dataset was collected fortnightly between January 2019 and March 2020, so that the effect of festivals such as Moni Amavasya, Jyestha Ganga Dusshera, Kartik Purnima, Maha Shivaratri and Magh Mela could be observed.
Data Format: CSV files with timestamped entries for each parameter.
Geographical Coordinates: The Sangam (78°54′45″ E; 25°25′15″ N) is the confluence of the Ganga and Yamuna rivers, situated in Prayagraj district, Uttar Pradesh, India.
Usages
The dataset is suitable for:
- Developing and validating machine learning models for water quality prediction.
- Analyzing temporal trends and patterns in river water quality.
- Assessing the impact of environmental factors on water quality parameters.
- Supporting policy-making and environmental management strategies.
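As a starting point for the trend-analysis and modeling uses above, here is a minimal sketch of loading and summarizing the data with pandas. The column names and sample values are assumptions for illustration only; check them against the actual CSV headers:

```python
import io
import pandas as pd

# Hypothetical excerpt in the dataset's CSV layout (column names and
# values are illustrative assumptions, not real measurements).
csv = io.StringIO(
    "timestamp,pH,DO,temperature,EC,ORP\n"
    "2019-01-15 09:00,7.8,6.2,16.5,410,210\n"
    "2019-01-15 15:00,8.0,6.5,18.1,415,205\n"
    "2019-01-30 09:00,7.6,6.0,15.9,400,215\n"
)
df = pd.read_csv(csv, parse_dates=["timestamp"]).set_index("timestamp")

# Monthly means per parameter: a first step toward trend analysis or
# feature construction for a water quality prediction model.
monthly = df.resample("MS").mean()
```

Resampling at coarser intervals (monthly here, via the "MS" month-start alias) smooths the fortnightly sampling cadence and makes seasonal or festival-related effects easier to inspect.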
Citation
Acknowledgment: This dataset is the outcome of the project "IoT based Water Quality Assessment for River Ganga at Allahabad", funded by IIIT-Allahabad through the Ministry of Education (Govt. of India). The Principal Investigator of this project was Dr. Manish Kumar (Associate Professor), Department of Information at IIIT-Allahabad, India. If you utilize this dataset in your research or projects, please cite the following publication:
Manish Kumar, Tinku Singh, Manish Kumar Maurya, Anubhav Shivhare, Ashwin Raut, and Pramod Kumar Singh, "Quality Assessment and Monitoring of River Water Using IoT Infrastructure," IEEE Internet of Things Journal, vol. 10, no. 12, pp. 10280–10290, 2023. DOI: https://doi.org/10.1109/JIOT.2023.3238123
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
This is the data repository for an undergraduate thesis entitled Politeness Strategies in Main Characters The Greatest Showman Movie. If you use the data in this repository for your research, please cite this work as follows:
Aspitasari, Febri; Budiarsa, I Made; Rajeg, Gede Primahadi Wijaya (2022). Dataset of Positive Politeness Strategies in Main Characters The Greatest Showman Movie. figshare. Dataset. https://doi.org/10.6084/m9.figshare.19428821
First Supervisor: Prof. Dr. I Made Budiarsa, M.A.
Second Supervisor: Gede Primahadi Wijaya Rajeg, S.S., M.Hum., Ph.D.