License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Medium is an American online publishing platform launched in August 2012. The Crawl Feeds team extracted data from Medium articles for research and analysis purposes.
Fields
Total fields: 15
url, crawled_at, id, title, author, published_at, author_url, reading_time, total_claps, raw_description, source, description, tags, images, modified_at
Get the complete dataset of more than 500K records from Crawl Feeds: Link
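For a quick sanity check of the field list above, here is a minimal sketch, assuming a JSON Lines export named medium_articles.jsonl (the filename and format are assumptions, not part of the dataset description):

```python
import json

# Assumed filename and format; the actual Crawl Feeds export may differ.
PATH = "medium_articles.jsonl"

# The 15 documented fields.
EXPECTED_FIELDS = {
    "url", "crawled_at", "id", "title", "author", "published_at",
    "author_url", "reading_time", "total_claps", "raw_description",
    "source", "description", "tags", "images", "modified_at",
}

with open(PATH, encoding="utf-8") as fh:
    for line in fh:
        record = json.loads(line)
        # Report any of the 15 documented fields missing from this record.
        missing = EXPECTED_FIELDS - record.keys()
        if missing:
            print(record.get("id"), "is missing:", sorted(missing))
```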
By downloading the data, you agree to the terms & conditions mentioned below:
Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.
Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try to identify the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics, or to share it with anyone else.
We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.
Citation
Please cite our work as:
@InProceedings{clef-checkthat:2022:task3,
  author    = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
  title     = {Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection},
  year      = {2022},
  booktitle = {Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum},
  series    = {CLEF~'2022},
  address   = {Bologna, Italy},
}
@article{shahi2021overview,
  title   = {Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
  author  = {Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
  journal = {Working Notes of CLEF},
  year    = {2021},
}
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.
Task 3: Multi-class fake news detection of news articles (English). Sub-task A is designed as a four-class classification problem: given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1,264 articles with their respective labels in English. Our definitions for the categories are as follows (a minimal baseline sketch follows the list):
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.
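For orientation, a minimal sketch of a four-class baseline, assuming the training batch is a CSV with text and rating columns (file and column names are assumptions; this is not the official baseline linked below):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Assumed file and column names; the released batches may differ.
df = pd.read_csv("task3_train.csv")
X_train, X_val, y_train, y_val = train_test_split(
    df["text"], df["rating"], test_size=0.2, random_state=0, stratify=df["rating"]
)

# TF-IDF features plus logistic regression over the four labels:
# true, partially false, false, other.
model = make_pipeline(
    TfidfVectorizer(max_features=50_000),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print(classification_report(y_val, model.predict(X_val)))
```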
Cross-Lingual Task (German)
Along with the multi-class task for the English language, we have introduced a task for a low-resource language. We will provide the test data in German. The idea of the task is to use the English data and the concept of transfer learning to build a classification model for the German language.
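One common transfer recipe, sketched under heavy assumptions: machine-translate the German articles into English and reuse the English-trained classifier unchanged. translate_de_to_en below is a hypothetical placeholder for any MT system, not a real API:

```python
def translate_de_to_en(text: str) -> str:
    """Hypothetical stand-in for a German-to-English machine-translation call."""
    raise NotImplementedError

def predict_german(model, german_texts):
    # Translate first, then apply the English-trained model from the sketch above.
    return model.predict([translate_de_to_en(t) for t in german_texts])
```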
Input Data
The data will be provided with the columns Id, title, text, rating, and domain.
Output data format
Sample File
public_id, predicted_rating
1, false
2, true
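Continuing from the baseline sketch above, a minimal sketch for writing predictions in this two-column format (the file names and the public_id column in the test file are assumptions):

```python
import pandas as pd

test = pd.read_csv("task3_test.csv")  # assumed name and columns
pred = model.predict(test["text"])    # `model` is the pipeline from the baseline sketch

# Two columns, exactly as in the sample file: public_id, predicted_rating.
pd.DataFrame({"public_id": test["public_id"], "predicted_rating": pred}).to_csv(
    "submission.csv", index=False
)
```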
IMPORTANT!
Baseline: For this task, we have created a baseline system, which can be found at https://zenodo.org/record/6362498
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains 20k news headlines, descriptions & articles from August 11, 2019 to June 8, 2020 obtained from Indian Express.
This dataset was obtained from www.indianexpress.com
article_id: generated article ID.
headline: headline of the article.
desc: description of the article.
date: date and time of the article.
url: URL of the article.
articles: full text of the article.
article_type: short, mid, or long, indicating the length of the article.
article_length: length of the article.
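Since article_type is a bucketed view of article_length, here is a minimal sketch of how such buckets could be reproduced; the threshold values are illustrative assumptions, as the dataset's actual cut-offs are not documented here:

```python
import pandas as pd

df = pd.read_csv("indian_express.csv")  # assumed filename

def bucket(length: int) -> str:
    # Illustrative thresholds only; not the dataset's definition.
    if length < 1500:
        return "short"
    if length < 4000:
        return "mid"
    return "long"

df["article_type_check"] = df["article_length"].apply(bucket)
print(df[["article_length", "article_type", "article_type_check"]].head())
```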
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Statistics from the paper: Are scholarly articles disproportionately read in their own country? An analysis of Mendeley readers
by Mike Thelwall and Nabeil Maflahi
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Reservoir operation rules:
(1) continuous flood discharge with ecological priority
(2) pulse flood discharge with ecological priority
(3) pulse flood discharge with equal weight of ecology and power generation
(4) pulse flood discharge with power generation priority
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Participant demographics and summary statistics.
PubMed Central® (PMC) is a free full-text archive of biomedical and life sciences journal articles at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM). The PubMed Central (PMC) Article Datasets include full-text articles archived in PMC and made available under license terms that allow for text mining and other types of secondary analysis and reuse. The articles are organized on AWS based on general license type:
The PMC Open Access (OA) Subset, which includes all articles in PMC with a machine-readable Creative Commons license
The Author Manuscript Dataset, which includes all articles collected under a funder policy in PMC and made available in machine-readable formats for text mining
These datasets collectively span more than half of PMC’s total collection of full-text articles. PMC enables access to these datasets to expand the impact of open access and publicly-funded research; enable greater machine learning across the spectrum of scientific research; reach new audiences; and open new doors for discovery. The bucket in this registry contains individual articles in NISO Z39.96-2015 JATS XML format as well as in plain text as extracted from the XML. The bucket is updated daily with new and updated articles. Also included are file lists that include metadata for articles in each dataset.
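A minimal sketch for listing article files anonymously with boto3; the bucket name and prefix are assumptions based on the public registry entry and should be checked against the authoritative documentation:

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous (unsigned) access; the datasets are public.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# Bucket and prefix are assumptions; verify them in the registry entry.
resp = s3.list_objects_v2(Bucket="pmc-oa-opendata", Prefix="oa_comm/xml/", MaxKeys=10)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```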
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Article on Awassi Sheep in Palmyra and Its Surrounding Desert Areas: Statistics and Renowned Breeders in English
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This study estimates the effect of data sharing on the citations of academic articles, using journal policies as a natural experiment. We begin by examining 17 high-impact journals that have adopted the requirement that data from published articles be publicly posted. We match these 17 journals to 13 journals without policy changes and find that empirical articles published just before their change in editorial policy have citation rates with no statistically significant difference from those published shortly after the shift. We then ask whether this null result stems from poor compliance with data sharing policies, and use the data sharing policy changes as instrumental variables to examine more closely two leading journals in economics and political science with relatively strong enforcement of new data policies. We find that articles that make their data available receive 97 additional citations (estimated standard error of 34). We conclude that: a) authors who share data may be rewarded eventually with additional scholarly citations, and b) data-posting policies alone do not increase the impact of articles published in a journal unless those policies are enforced.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
CBLEED simulation results associated with Figures 4, 5 and 6 in "Autoencoder latent space sensitivity to material structure in convergent-beam low energy electron diffraction". Files are in PNG format, with raw data in TXT format corresponding to each image.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
What do the alt-metrics of figshare items tell us? This dataset lists Altmetric data for the top 100 figshare repository items, categorised by type (retrieved on 9 March 2013). The data appear in an Interactions post on the Altmetric blog.
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
This dataset contains data collected during an in vivo experiment with pigs at Wageningen University as part of the PhD thesis projects of Mirelle Geervliet and Hugo de Vries (first authors of the manuscript). This research project was made possible by the Netherlands Organisation for Scientific Research and Vereniging Diervoeders Nederland (VDN).
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset consists of raw data and 10-minute statistics of wind speed and wind direction measurements from a 22 m mast near Utlangan, Sweden. The period includes more than 27 hours of measurements from 1989. Detailed site documentation is available here. Public data.
Run statistics stored in NetCDF format:
1) Utlangan_all.nc
2) Utlangan_concurent.nc
Raw time series (ASCII), each with a duration of 600 s, sampled at 20 Hz:
3) Utlangan.zip
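A minimal sketch for inspecting the NetCDF statistics files with xarray; variable names inside the files are not documented above, so the sketch lists whatever it finds rather than assuming names:

```python
import xarray as xr

# Open one of the two statistics files named above.
ds = xr.open_dataset("Utlangan_all.nc")

print(ds)  # overview of dimensions, coordinates and attributes
for name, var in ds.data_vars.items():
    print(name, var.dims, var.shape)
```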
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
There are 2 processed Excel files and 19 raw JSON files
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Ecological theories often encompass multiple levels of biological organization, such as genes, individuals, populations, and communities. Despite substantial progress toward ecological theory spanning multiple levels, ecological data rarely are connected in this way. This is unfortunate because different types of ecological data often emerge from the same underlying processes and, therefore, are naturally connected among levels. Here, we describe an approach to integrate data collected at multiple levels (e.g., individuals, populations) in a single statistical analysis. The resulting integrated models make full use of existing data and might strengthen links between statistical ecology and ecological models and theories that span multiple levels of organization. Integrated models are increasingly feasible due to recent advances in computational statistics, which allow fast calculations of multiple likelihoods that depend on complex mechanistic models. We discuss recently developed integrated models and outline a simple application using data on freshwater fishes in south-eastern Australia. Available data on freshwater fishes include population survey data, mark-recapture data, and individual growth trajectories. We use these data to estimate age-specific survival and reproduction from size-structured data, accounting for imperfect detection of individuals. Given that such parameter estimates would be infeasible without an integrated model, we argue that integrated models will strengthen ecological theory by connecting theoretical and mathematical models directly to empirical data. Although integrated models remain conceptually and computationally challenging, integrating ecological data among levels is likely to be an important step toward unifying ecology among levels.
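To make "multiple likelihoods in a single analysis" concrete, a toy sketch: two simulated data types share one underlying rate parameter, and the model maximizes the sum of their log-likelihoods. Everything here is illustrative and is not the authors' model:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)

# Toy data: population counts (Poisson) and individual growth increments
# (Normal) generated from the same underlying rate.
true_rate = 3.0
counts = rng.poisson(true_rate, size=50)
growth = rng.normal(true_rate, 1.0, size=30)

def neg_joint_loglik(params):
    rate, sigma = params
    if rate <= 0 or sigma <= 0:
        return np.inf
    # Integrated likelihood: both data sets inform the shared rate.
    ll = stats.poisson.logpmf(counts, rate).sum()
    ll += stats.norm.logpdf(growth, rate, sigma).sum()
    return -ll

fit = optimize.minimize(neg_joint_loglik, x0=[1.0, 1.0], method="Nelder-Mead")
print("shared rate, sigma:", fit.x)
```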
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Network and loading data for a real-world distribution network in the North-East of England.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Data supporting a publication in npj Biofilms and Microbiomes (Rostami et al.). A single Excel spreadsheet contains all quantitative data underpinning figures.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Fig. 2 - Moderately doped n-Ge and p-Ge were used. Data compares IMF and CMF contact I-V curves (magnitudes).
Fig. 3 - Material studies: HRSEM, TEM, XRD data, Phi scan data.
Fig. 4 - Theta probe electron emission data.
Fig. 5 - Cryogenic I-V data on low doped n- and p-Ge (4.2 K) using IMF and CMF contacts. Includes room temperature I-V data.
Fig. S1 - Low energy HRSEM of IMF.
Fig. S2 - EFTEM to show Cr encapsulation.
Fig. S3 - EBSD to show orientation relationships, including Ge substrate reference scans.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This data package contains two types of data for the Jornada Experimental Range (JER) from 1915 to 1952: 1) shape files containing polygons and attribute tables that represent the pasture configurations on the Jornada Experimental Range, and 2) monthly stocking data from these pastures. The livestock represented in the stocking data comprise cattle, horses, sheep, and goats. Grazing goats were infrequent and are grouped with sheep in the source data; as such, for this data set they are included in the sheep category. Stocking data are expressed in animal unit months (AUM), which is based on metabolic weight. This data package provides finer resolution AUM data than knb-lter-jrn.210412001, which presents the annual stocking data for the entire JER from 1916 to 2001. The stocking data in this package begin in June of 1915 and continue through December of 1952, the last year for which the researchers on this project have verified and digitized historical pasture configurations on the JER.
https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-jrn&identifier=210412001
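As a hedged illustration of the AUM unit (the exact convention used for this package is not stated above), one common rangeland formula scales each animal's metabolic weight (body mass to the 0.75 power) against a roughly 454 kg reference cow:

```python
def animal_unit_equivalent(body_mass_kg: float, reference_kg: float = 454.0) -> float:
    """AU equivalent via metabolic-weight scaling (mass ** 0.75).

    The 454 kg (1,000 lb) reference cow is a common convention and is an
    assumption here, not taken from the data package documentation.
    """
    return (body_mass_kg / reference_kg) ** 0.75

def aum(n_animals: int, body_mass_kg: float, months: float) -> float:
    # Animal unit months for a herd of uniform animals.
    return n_animals * animal_unit_equivalent(body_mass_kg) * months

print(aum(n_animals=100, body_mass_kg=500, months=1))  # roughly 107.5 AUM
```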
License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
License information was derived automatically
The 2 data sets contain the processed data used for the discussion section of the paper "Input Torque Measurements for Wind Turbine Gearboxes Using Fiber Optical Strain Sensors".
There are 2 data sets: one for a linearly variable torque test and the other for a test where torque is changed in steps.
Each data set includes the following variables:
- str_fos: strain from the 54 fiber optical strain sensors
- t_fos: time associated with the strain data
- LSS_taco: data from the inductive sensor at the low-speed shaft (once-per-revolution pulse)
- HSS_M1_torque: torque data from the test bench torque transducer at position 1
- HSS_M2_torque: torque data from the test bench torque transducer at position 2
- t_dq: time associated with the analogue signals LSS_taco, HSS_M1 and HSS_M2
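A minimal sketch for loading one of the two data sets, assuming MATLAB .mat containers (the file format and filename are assumptions; the variable names follow the list above):

```python
from scipy.io import loadmat

# Assumed filename; variable names as documented above.
data = loadmat("torque_steps.mat")

str_fos = data["str_fos"]                # strain from the 54 fiber optical sensors
t_fos = data["t_fos"].squeeze()          # time base for the strain data
hss_m1 = data["HSS_M1_torque"].squeeze() # torque transducer at position 1
print(str_fos.shape, t_fos.shape, hss_m1.shape)
```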