Topic Modeling for Research Articles
Researchers have access to large online archives of scientific articles. As a consequence, finding relevant articles has become more difficult. Tagging or topic modelling gives each research article identifying labels, which facilitates recommendation and search.
Given the abstract and title for a set of research articles, predict the topics for each article included in the test set.
Note that a research article can have more than one topic. The abstracts and titles are sourced from the following 6 topics:
Computer Science
Physics
Mathematics
Statistics
Quantitative Biology
Quantitative Finance
| Column | Description |
|---|---|
| ID | Unique ID for each article |
| TITLE | Title of the research article |
| ABSTRACT | Abstract of the research article |
| Computer Science | Whether article belongs to topic computer science (1/0) |
| Physics | Whether article belongs to topic physics (1/0) |
| Mathematics | Whether article belongs to topic Mathematics (1/0) |
| Statistics | Whether article belongs to topic Statistics (1/0) |
| Quantitative Biology | Whether article belongs to topic Quantitative Biology (1/0) |
| Quantitative Finance | Whether article belongs to topic Quantitative Finance (1/0) |

| Column | Description |
|---|---|
| ID | Unique ID for each article |
| TITLE | Title of the research article |
| ABSTRACT | Abstract of the research article |

| Column | Description |
|---|---|
| ID | Unique ID for each article |
| TITLE | Title of the research article |
| ABSTRACT | Abstract of the research article |
| Computer Science | Whether article belongs to topic computer science (1/0) |
| Physics | Whether article belongs to topic physics (1/0) |
| Mathematics | Whether article belongs to topic Mathematics (1/0) |
| Statistics | Whether article belongs to topic Statistics (1/0) |
| Quantitative Biology | Whether article belongs to topic Quantitative Biology (1/0) |
| Quantitative Finance | Whether article belongs to topic Quantitative Finance (1/0) |
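Because an article can carry several topics at once, the six 1/0 columns are naturally read as a multi-label target. The following is a minimal sketch, assuming the labelled data is a CSV file with exactly the columns described above; the inline sample rows and the helper name `read_topics` are hypothetical and only illustrate the layout.

```python
import csv
import io

# The six topic columns named in the column description.
TOPICS = [
    "Computer Science", "Physics", "Mathematics",
    "Statistics", "Quantitative Biology", "Quantitative Finance",
]

# Hypothetical sample rows in the described layout (not real data).
SAMPLE_CSV = """ID,TITLE,ABSTRACT,Computer Science,Physics,Mathematics,Statistics,Quantitative Biology,Quantitative Finance
1,Example title A,Example abstract A,1,0,0,1,0,0
2,Example title B,Example abstract B,0,1,0,0,0,0
"""


def read_topics(handle):
    """Map each article ID to the list of topics whose column is 1."""
    topics_by_id = {}
    for row in csv.DictReader(handle):
        topics_by_id[row["ID"]] = [t for t in TOPICS if row[t] == "1"]
    return topics_by_id


labels = read_topics(io.StringIO(SAMPLE_CSV))
for article_id, topics in labels.items():
    print(article_id, topics)
```

Keeping the labels as a list per article makes it easy to see that article 1 in the sample belongs to two topics, which is the case a single-label classifier could not represent.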
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
What do the alt-metrics of figshare items tell us? This dataset lists Altmetric data for the top 100 figshare repository items, categorised by type (retrieved on 9 March 2013). The data appear in an Interactions post on the Altmetric blog.
By downloading the data, you agree with the terms & conditions mentioned below:
Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.
Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try to identify the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.
We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.
Citation
Please cite our work as
@InProceedings{clef-checkthat:2022:task3,
  author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
  title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
  year = {2022},
  booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
  series = {CLEF~'2022},
  address = {Bologna, Italy},
}
@article{shahi2021overview,
  title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
  author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
  journal={Working Notes of CLEF},
  year={2021}
}
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.
Task 3: Multi-class fake news detection of news articles (English). Sub-task A frames fake news detection as a four-class classification problem. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1264 English-language articles with their respective labels. Our definitions for the categories are as follows:
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.
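The category definitions above imply a normalisation step from the verdicts used by different fact-checking services to the four task classes. A minimal sketch of such a mapping follows, assuming only the verdict names mentioned in the definitions; the dictionary `RATING_MAP`, the function `normalize_rating`, and the choice to send unknown verdicts to "other" are illustrative assumptions, not the organisers' actual procedure.

```python
# Illustrative mapping from fact-checker verdicts to the four task classes.
# Only verdict names mentioned in the category definitions are listed.
RATING_MAP = {
    "false": "false",
    "true": "true",
    "partially false": "partially false",
    "partially true": "partially false",
    "mostly true": "partially false",
    "miscaptioned": "partially false",
    "misleading": "partially false",
    "in dispute": "other",
    "unproven": "other",
}


def normalize_rating(verdict: str) -> str:
    """Lower-case the verdict, collapse whitespace, and look it up.

    Assumption: verdicts not listed in RATING_MAP are treated as "other".
    """
    key = " ".join(verdict.lower().split())
    return RATING_MAP.get(key, "other")


for raw in ["Mostly  True", "FALSE", "Unproven", "satire"]:
    print(raw, "->", normalize_rating(raw))
```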
Cross-Lingual Task (German)
Along with the multi-class task for the English language, we have introduced a task for a low-resource language. We will provide the test data in German. The idea of the task is to use the English data and the concept of transfer learning to build a classification model for German.
Input Data
The data will be provided with the columns ID, title, text, rating, and domain; the columns are described as follows:
ID - Unique identifier of the news article
Title - Title of the news article
text - Text mentioned inside the news article
our rating - Class of the news article: false, partially false, true, or other
Output data format
public_id - Unique identifier of the news article
predicted_rating - Predicted class
Sample File
public_id, predicted_rating
1, false
2, true
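The output format above can be produced with a few lines of standard-library code. This is a minimal sketch, assuming the four class names are written in lower case as in the sample file; the function name `write_submission` is hypothetical, and whether the evaluation expects a space after the comma (as the sample shows) is not stated, so plain comma separation is used here.

```python
import csv
import io

# The four classes defined for the task, lower-cased as in the sample file.
ALLOWED_RATINGS = {"false", "partially false", "true", "other"}


def write_submission(predictions, handle):
    """Write (public_id, predicted_rating) pairs as CSV with a header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["public_id", "predicted_rating"])
    for public_id, rating in predictions:
        if rating not in ALLOWED_RATINGS:
            raise ValueError(f"unknown rating for {public_id}: {rating!r}")
        writer.writerow([public_id, rating])


buffer = io.StringIO()
write_submission([(1, "false"), (2, "true")], buffer)
print(buffer.getvalue())
```

Rejecting unknown ratings before writing catches label typos early, instead of leaving them to surface at evaluation time.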
IMPORTANT!
We have used data from 2010 to 2022, and the fake news content spans several topics such as elections and COVID-19.
Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498
Related Work
Shahi, G. K. (2020). AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. https://arxiv.org/pdf/2010.00502.pdf
Shahi, G. K., & Nandini, D. (2020). FakeCovid – a multilingual cross-domain fact check news dataset for COVID-19. In Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.
Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.
Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study estimates the effect of data sharing on the citations of academic articles, using journal policies as a natural experiment. We begin by examining 17 high-impact journals that have adopted the requirement that data from published articles be publicly posted. We match these 17 journals to 13 journals without policy changes and find that empirical articles published just before the change in editorial policy have citation rates with no statistically significant difference from those published shortly after the shift. We then ask whether this null result stems from poor compliance with data sharing policies, and use the data sharing policy changes as instrumental variables to examine more closely two leading journals in economics and political science with relatively strong enforcement of new data policies. We find that articles that make their data available receive 97 additional citations (standard error of the estimate: 34). We conclude that: a) authors who share data may be rewarded eventually with additional scholarly citations, and b) data-posting policies alone do not increase the impact of articles published in a journal unless those policies are enforced.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset lists Google Scholar articles on data mining, which can be helpful in many educational research works. It contains 936 unique entries, each including title, description, author names, article link, cited by, and related articles.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1 The totals in this column equal the number of articles using a particular type of data, minus instances of duplicate classification by type of company within category of type of data. These instances were: other types of data were used by articles classified as both tobacco and transportation, both mining and manufacturing, and both tobacco and alcohol, and quantitative data from internal company studies were used by the article classified as both mining and manufacturing. The overall column total is not shown, as it is greater than the total number of included articles (N = 361) because several articles used multiple types of internal documents.
2 The totals in this row equal the total number of articles for each type of company, minus instances where articles used multiple types of data, of which there are too many to list. The totals for the columns are therefore not equal to the sum of the classifications within the columns. The overall row total is not shown, as it is greater than the total number of included articles (N = 361) because three articles were classified with two types of companies.
Background: Disruption of the balance between apoptosis and proliferation is considered to be an important factor in the development and progression of tumours. In the present study we determined the in vivo cell kinetics along the spectrum of apparently normal epithelium, hyperplasia, preinvasive lesions and invasive carcinoma, in breast tissues affected by fibrocystic changes in which preinvasive and/or invasive lesions developed, as a model of breast carcinogenesis.
Materials and methods: A total of 32 areas of apparently normal epithelium and 135 ductal proliferative and neoplastic lesions were studied. More than one epithelial lesion per case was analyzed. The apoptotic index (AI) and the proliferative index (PI) were expressed as the percentage of TdT-mediated dUTP-nick end-labelling (TUNEL)-positive and Ki-67-positive cells, respectively. The PI/AI (P/A index) was calculated for each case.
Results: The AIs and PIs were significantly higher in hyperplasia than in apparently normal epithelium (P = 0.04 and P = 0.0005, respectively), in atypical hyperplasia than in hyperplasia (P = 0.01 and P = 0.04, respectively) and in invasive carcinoma than in in situ carcinoma (P < 0.001 and P < 0.001, respectively). The two indices were similar in atypical hyperplasia and in in situ carcinoma. The P/A index increased significantly from normal epithelium to hyperplasia (P = 0.01) and from preinvasive lesions to invasive carcinoma (P = 0.04), whereas it decreased (non-significantly) from hyperplasia to preinvasive lesions. A strong positive correlation between the AIs and the PIs was found (r = 0.83, P < 0.001).
Conclusion: These findings suggest accelerating cell turnover along the continuum of breast carcinogenesis. Atypical hyperplasias and in situ carcinomas might be kinetically similar lesions. In the transition from normal epithelium to hyperplasia and from preinvasive lesions to invasive carcinoma, the net growth of epithelial cells results from a growth imbalance in favour of proliferation. In the transition from hyperplasia to preinvasive lesions there is an imbalance in favour of apoptosis.
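The indices described in the abstract are simple ratios, which a short sketch can make concrete. The cell counts below are invented for illustration only and do not come from the study; the function names `percent_index` and `pa_index` are hypothetical.

```python
def percent_index(positive_cells: int, total_cells: int) -> float:
    """Percentage of positively stained cells (used for both AI and PI)."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return 100.0 * positive_cells / total_cells


def pa_index(proliferative_index: float, apoptotic_index: float) -> float:
    """P/A index: proliferative index divided by apoptotic index."""
    if apoptotic_index == 0:
        raise ValueError("P/A index is undefined when the apoptotic index is 0")
    return proliferative_index / apoptotic_index


# Invented counts for one lesion: 1000 cells counted for each stain.
ai = percent_index(positive_cells=20, total_cells=1000)  # TUNEL-positive
pi = percent_index(positive_cells=90, total_cells=1000)  # Ki-67-positive
print(ai, pi, pa_index(pi, ai))
```

A rising P/A index between two lesion types corresponds to the shift towards proliferation that the abstract reports, while a falling one corresponds to a shift towards apoptosis.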
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
As part of forthcoming publications, I collect data that might be interesting in association with other data.
On 2022-07-18 the Italian business newspaper published an article that reminded me of a book I read a few years ago, "Peoplequake" (you can read here a couple of articles that I posted in Italian in 2017 and 2018).
The population of Italy, along with Japan, is old and getting older, and most commentators focus on the health system impacts.
In reality, coupled with a contraction of births well below the "replacement level" (i.e. the level needed to keep the population steady), this implies the need to rethink something more than just the health system.
For the time being, see the article in Italian referencing part of the data.
As for the data:
* it is the same information contained within the article, i.e. at the county ("provincia") level
* to ease clustering analysis and comparison with other data, which are usually by region or aggregation of regions within Italy, I added clustering by Region/Area from ISTAT, the National Statistics Bureau of Italy
More information about other indicators at the county level will be gradually added.
Sources:
* for the main table: Il Sole 24 Ore (paper edition, manually re-entered)
* for the region and area list: ISTAT
Attribution-NoDerivs 3.0 (CC BY-ND 3.0) https://creativecommons.org/licenses/by-nd/3.0/
License information was derived automatically
These statistics illustrate the consumption, production, prices, and trade of Ceramic Household Articles and Toilet Articles in Austria from Jan 2019 to Nov 2025.
https://spdx.org/licenses/etalab-2.0.html
Data and code for reproducing the MCMC inferences, tables and main figures described in Kwame Adrakey et al. (2023), "Bayesian inference for spatio-temporal stochastic transmission of plant disease in the presence of roguing: a case study to estimate the dispersal distance of Flavescence dorée". The code is written in C and R.
This dataset supports the following publication: "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States" (DOI:10.1016/j.rse.2020.112013). The data release allows users to replicate, test, or further explore results. The dataset consists of 4 separate items based on the analysis approach used in the original publication:
1) The 'Phenocam' dataset uses images from a phenocam in a pinyon juniper ecosystem in Grand Canyon National Park to determine phenological patterns of multiple plant species. It consists of scripts and tabular data developed while performing analyses and includes the final NDVI values for all areas of interest (AOIs) described in the associated publication.
2) The 'SolarSensorAnalysis' dataset uses downloaded tabular MODIS data to explore relationships between NDVI and multiple solar and sensor angles. It consists of download and analysis scripts in Google Earth Engine and R. The source MODIS data used in the analysis are too large to include but are provided through MODIS providers and can be accessed through Google Earth Engine using the included script. A csv file includes solar and sensor angle information for the MODIS pixel closest to the phenocam as well as for a sample of 100 randomly selected MODIS pixels within the GRCA-PJ ecosystem.
3) The 'WinterPeakExtent' dataset includes final geotiffs showing the temporal frequency extent and associated vegetation physiognomic types experiencing winter NDVI peaks in the western US.
4) The 'SensorComparison' dataset contains the NDVI time series at the phenocam location from 4 other satellites as well as the code used to download these data.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
List of Top Authors of Journal of Big Data sorted by article citations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics of extracted article titles.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Slovakia's exports of other articles of plastics to Italy were US$43.07 million during 2024, according to the United Nations COMTRADE database on international trade. Slovakia's exports of other articles of plastics to Italy (data, historical chart and statistics) were last updated in November 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Other Daily Sundry Article: YoY: Product Inventory data was reported at 14.061 % in Oct 2015. This is an increase from the previous figure of 11.419 % for Sep 2015. The data is updated monthly, with a median of 10.963 % from Jan 2006 to Oct 2015 over 89 observations. It reached an all-time high of 43.130 % in Feb 2008 and a record low of -3.172 % in May 2013. The series remains in active status in CEIC and is reported by the National Bureau of Statistics. The data is categorized under China Premium Database’s Industrial Sector – Table CN.BIM: Daily Sundry Article: Other Daily Sundry Article.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Article on Awassi Sheep in Palmyra and Its Surrounding Desert Areas: Statistics and Renowned Breeders in English
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Other Daily Sundry Article: Account Receivable data was reported at 9.647 RMB bn in Oct 2015. This is an increase from the previous figure of 9.364 RMB bn for Sep 2015. The data is updated monthly, with a median of 7.258 RMB bn from Dec 2003 to Oct 2015 over 97 observations. It reached an all-time high of 9.781 RMB bn in Jul 2015 and a record low of 2.043 RMB bn in Dec 2003. The series remains in active status in CEIC and is reported by the National Bureau of Statistics. The data is categorized under China Premium Database’s Industrial Sector – Table CN.BIM: Daily Sundry Article: Other Daily Sundry Article.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Prevalence of journal-specific features (peer-reviewed journal articles only).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spain's imports of statuettes and other ornamental ceramic articles from Kyrgyzstan were US$834 during 2024, according to the United Nations COMTRADE database on international trade. Spain's imports of statuettes and other ornamental ceramic articles from Kyrgyzstan (data, historical chart and statistics) were last updated in November 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spain's exports of statuettes and other ornamental ceramic articles to Bulgaria were US$78.51 thousand during 2024, according to the United Nations COMTRADE database on international trade. Spain's exports of statuettes and other ornamental ceramic articles to Bulgaria (data, historical chart and statistics) were last updated in December 2025.