DRAKO is a Mobile Location Audience Targeting provider with a programmatic trading desk specialising in geolocation analytics and programmatic advertising. Through our customised approach, we offer business and consumer insights as well as addressable audiences for advertising.
Mobile Location Data can be meaningfully transformed into Audience Targeting when used in conjunction with other datasets. Our expansive POI Data allows us to segment users by visitation to major brands and retailers as well as categorize them into syndicated segments. Beyond POI visits, our proprietary Home Location Model determines residents of geographic areas such as Designated Market Areas, Counties, or States. Relatedly, our Home Location Model also fuels our Geodemographic Census Data segments, as we are able to determine residents of the smallest census units. Additionally, we have audiences built from ticketed event and venue visitors, survey data, and retail data.
All of our Audience Targeting is 100% deterministic in that it only includes high-quality, real visits to locations as defined by a POI's building contour taken from satellite imagery. We never use a radius when building an audience unless requested. We have a horizontal accuracy of 5m.
Additionally, we can always cross-reference your audience targeting with our syndicated segments:
Overview of our Syndicated Audience Data Segments:
- Brand/POI segments (specific named stores and locations)
- Categories (behavioural segments - revealed habits)
- Census demographic segments (HH income, race, religion, age, family structure, language, etc.)
- Events segments (ticketed live events, conferences, and seminars)
- Resident segments (State/province, CMAs, DMAs, city, county, sub-county)
- Political segments (Canadian Federal and Provincial, US Congressional Upper and Lower House, US States, City elections, etc.)
- Survey Data (Psychosocial/Demographic survey data)
- Retail Data (Receipt/transaction data)
All of our syndicated segments are customizable. That means you can limit them to people within a certain geography, remove employees, include only the most frequent visitors, define your own custom lookback, or extend our audiences using our Home, Work, and Social Extensions.
In addition to our syndicated segments, we're also able to run custom queries that return all the Mobile Ad IDs (MAIDs) seen at a specific location (address; latitude and longitude; or WKT84 Polygon) or in your defined geographic area of interest (political districts, DMAs, Zip Codes, etc.).
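As an illustration of what a polygon geofence query involves, the following is a minimal sketch in Python. It is not DRAKO's implementation: the function names, the `(device_id, longitude, latitude)` ping layout, and the planar ray-casting test (a reasonable approximation only for small, building-sized polygons) are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]          # (longitude, latitude), hypothetical layout
Ping = Tuple[str, float, float]      # (device_id, longitude, latitude), hypothetical layout


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Planar ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def devices_seen_in(polygon: List[Point], pings: Iterable[Ping]) -> List[str]:
    """Return the sorted, de-duplicated device IDs with a ping inside the polygon."""
    return sorted(
        {device_id for device_id, lon, lat in pings
         if point_in_polygon((lon, lat), polygon)}
    )
```

Unlike a radius around a centre point, a polygon test of this kind only admits pings that fall within the drawn contour, which is the distinction the description draws between contour-based and radius-based audiences.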
Beyond just returning all the MAIDs seen within a geofence, we are also able to offer additional customizable advantages:
- Average precision between 5 and 15 meters
- CRM list activation + extension
- Extend beyond Mobile Location Data (MAIDs) with our device graph
- Filter by frequency of visitations
- Home and Work targeting (retrieve only employees or residents of an address)
- Home extensions (devices that reside in the same dwelling from your seed geofence)
- Rooftop level address geofencing precision (no radius used EVER unless user specified)
- Social extensions (devices in the same social circle as users in your seed geofence)
- Turn analytics into addressable audiences
- Work extensions (coworkers of users in your seed geofence)
Data Compliance: All of our Audience Targeting Data is fully CCPA compliant and 100% sourced from SDKs (Software Development Kits), the most reliable and consistent mobile data stream with end-user consent, available with only a 4-5 day delay. Our location and device ID data comes from partnerships with more than 1,500 mobile apps. Each record comes with an associated location, which is how we are able to segment using geofences.
Data Quality: In addition to partnering with trusted SDKs, DRAKO has additional screening methods to ensure that our mobile location data is consistent and reliable. This includes data harmonization and quality scoring from all of our partners in order to disregard MAIDs with a low quality score.
By downloading the data, you agree with the terms & conditions mentioned below:
Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.
Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.
We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.
Citation
Please cite our work as
@InProceedings{clef-checkthat:2022:task3,
  author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
  title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
  year = {2022},
  booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
  series = {CLEF~'2022},
  address = {Bologna, Italy},
}
@article{shahi2021overview,
  title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
  author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
  journal={Working Notes of CLEF},
  year={2021}
}
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.
Task 3: Multi-class fake news detection of news articles (English). Sub-task A frames fake news detection as a four-class classification problem. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1,264 English-language articles with their respective labels. Our definitions for the categories are as follows:
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.
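The category definitions above imply a mapping from the varied labels used by fact-checking services onto the four task classes. A minimal Python sketch of such a mapping follows. The function name is hypothetical, the list of source labels contains only those named in the definitions (which end in "etc.", so it is incomplete), and sending every unrecognised label to "other" is an assumption of this example rather than a rule stated by the organisers.

```python
# Source labels that the definitions above explicitly fold into "partially false".
PARTIALLY_FALSE_LABELS = {
    "partially false",
    "partially true",
    "mostly true",
    "miscaptioned",
    "misleading",
}


def normalise_rating(raw_label: str) -> str:
    """Map a fact-checker's label onto one of the four task classes."""
    label = raw_label.strip().lower()
    if label in ("true", "false"):
        return label
    if label in PARTIALLY_FALSE_LABELS:
        return "partially false"
    # Assumption: anything unrecognised is treated as "other".
    return "other"
```

For example, `normalise_rating("Mostly True")` yields `"partially false"`, in line with the definition of that category.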
Cross-Lingual Task (German)
Along with the multi-class task for English, we have introduced a task for a low-resource language. We will provide test data in German. The idea of the task is to use the English data and the concept of transfer to build a classification model for German.
Input Data
The data will be provided with the columns ID, title, text, rating, and domain; the columns are described as follows:
ID - Unique identifier of the news article
Title - Title of the news article
text - Text mentioned inside the news article
our rating - class of the news article as false, partially false, true, other
Output data format
public_id - Unique identifier of the news article
predicted_rating - predicted class
Sample File
public_id, predicted_rating
1, false
2, true
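A submission in the output format above could be produced along the following lines. This is a minimal sketch using only the Python standard library. The function name and the validation step are assumptions, and the sketch writes plain comma-separated values without the space after the comma shown in the sample, since the description does not say whether that space is required.

```python
import csv
import io

# The four classes defined for the task.
VALID_RATINGS = {"false", "partially false", "true", "other"}


def write_submission(predictions, out):
    """Write (public_id, predicted_rating) pairs as CSV to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["public_id", "predicted_rating"])
    for public_id, rating in predictions:
        if rating not in VALID_RATINGS:
            raise ValueError(f"unknown rating for {public_id}: {rating!r}")
        writer.writerow([public_id, rating])


# Usage: write to an in-memory buffer; pass an open file to write to disk.
buffer = io.StringIO()
write_submission([(1, "false"), (2, "true")], buffer)
```

Rejecting labels outside the four classes before writing catches typos such as "partly false" early, rather than at scoring time.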
IMPORTANT!
We have used data from 2010 to 2022, and the fake news content covers a mix of topics such as elections and COVID-19.
Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498
Related Work
Shahi, G. K. (2020). AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. https://arxiv.org/pdf/2010.00502.pdf
G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.
Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeno, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.
Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.
This data set was created so as to analyze the latest shows available on Amazon Prime as well as the shows with a high rating.
The data set contains the following fields. Name is the name or title of the show. Year of release is the year in which the show was released or went on air. No. of seasons is the number of seasons of the show available on Prime. Language is the audio language of the show and does not take the subtitle language into consideration. Genre is the genre of the show, such as Kids, Drama, or Action. IMDB rating is the show's rating, though for many TV shows and kids' shows the rating was not available. Age of Viewers specifies the age of the target audience; "All" means that the content is not restricted to any particular age group and all audiences can view it.
I have collected this data from Amazon Prime's Website.
Many TV shows have high IMDB ratings but are not viewed much because the audience is not aware of them or they are not advertised much. I created this data set to find the highest-rated shows in each category or in a particular genre.
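Finding the highest-rated show per genre could be sketched as follows in Python. The column keys `name`, `genre`, and `imdb_rating` are hypothetical stand-ins for the fields described above, since the actual header names of the file are not given here. Rows with a missing or non-numeric rating are skipped, which matches the note that many shows have no rating.

```python
def top_rated_by_genre(rows):
    """Return {genre: (show name, rating)} for the highest-rated show per genre."""
    best = {}
    for row in rows:
        raw = (row.get("imdb_rating") or "").strip()
        try:
            rating = float(raw)
        except ValueError:
            continue  # rating missing or not numeric: skip the show
        genre = row["genre"]
        # Keep the first show seen when two shows tie on rating.
        if genre not in best or rating > best[genre][1]:
            best[genre] = (row["name"], rating)
    return best
```

The rows can come from `csv.DictReader` once the keys are adjusted to the real column headers.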
https://dataful.in/terms-and-conditions
This Dataset contains year-wise list of movies with all certification categories in Hinglish language. It has other details like certification date, movie length, certificate registration office, producer name etc. Note: 1) A-Certification means movies restricted to adult audiences 2) UA-certification means Unrestricted public exhibition subject to parental guidance for children below the age of twelve 3) U-Certificate means Unrestricted for Public Exhibition 4) S-Certification means movies restricted to specialized audiences such as doctors or scientists 5) The movie_length column is not properly defined at the source. The value mentioned in the movie_length column can either mean meters (for celluloid version) or minutes (for video version). For both meters and minutes, the unit is given as Mts at the source. 6) The data include not just feature films, but also short films, promos, songs, etc.
https://dataful.in/terms-and-conditions
This Dataset contains year-wise list of movies with all certification categories in Russian language. It has other details like certification date, movie length, certificate registration office, producer name etc. Note: 1) A-Certification means movies restricted to adult audiences 2) UA-certification means Unrestricted public exhibition subject to parental guidance for children below the age of twelve 3) U-Certificate means Unrestricted for Public Exhibition 4) S-Certification means movies restricted to specialized audiences such as doctors or scientists 5) The movie_length column is not properly defined at the source. The value mentioned in the movie_length column can either mean meters (for celluloid version) or minutes (for video version). For both meters and minutes, the unit is given as Mts at the source. 6) The data include not just feature films, but also short films, promos, songs, etc.
https://dataful.in/terms-and-conditions
This Dataset contains year-wise list of movies with all certification categories in English language. It has other details like certification date, movie length, certificate registration office, producer name etc. Note: 1) A-Certification means movies restricted to adult audiences 2) UA-certification means Unrestricted public exhibition subject to parental guidance for children below the age of twelve 3) U-Certificate means Unrestricted for Public Exhibition 4) S-Certification means movies restricted to specialized audiences such as doctors or scientists 5) The movie_length column is not properly defined at the source. The value mentioned in the movie_length column can either mean meters (for celluloid version) or minutes (for video version). For both meters and minutes, the unit is given as Mts at the source. 6) The data include not just feature films, but also short films, promos, songs, etc.
https://dataful.in/terms-and-conditions
This Dataset contains year-wise list of movies with all certification categories in Chinese language. It has other details like certification date, movie length, certificate registration office, producer name etc. Note: 1) A-Certification means movies restricted to adult audiences 2) UA-certification means Unrestricted public exhibition subject to parental guidance for children below the age of twelve 3) U-Certificate means Unrestricted for Public Exhibition 4) S-Certification means movies restricted to specialized audiences such as doctors or scientists 5) The movie_length column is not properly defined at the source. The value mentioned in the movie_length column can either mean meters (for celluloid version) or minutes (for video version). For both meters and minutes, the unit is given as Mts at the source. 6) The data include not just feature films, but also short films, promos, songs, etc.
https://dataful.in/terms-and-conditions
This Dataset contains year-wise list of movies with all certification categories in Polish language. It has other details like certification date, movie length, certificate registration office, producer name etc. Note: 1) A-Certification means movies restricted to adult audiences 2) UA-certification means Unrestricted public exhibition subject to parental guidance for children below the age of twelve 3) U-Certificate means Unrestricted for Public Exhibition 4) S-Certification means movies restricted to specialized audiences such as doctors or scientists 5) The movie_length column is not properly defined at the source. The value mentioned in the movie_length column can either mean meters (for celluloid version) or minutes (for video version). For both meters and minutes, the unit is given as Mts at the source. 6) The data include not just feature films, but also short films, promos, songs, etc.
https://dataful.in/terms-and-conditions
This Dataset contains year-wise list of movies with all certification categories for Languages with Fewer Movies. It has other details like certification date, movie length, certificate registration office, producer name etc. Note: 1) A-Certification means movies restricted to adult audiences 2) UA-certification means Unrestricted public exhibition subject to parental guidance for children below the age of twelve 3) U-Certificate means Unrestricted for Public Exhibition 4) S-Certification means movies restricted to specialized audiences such as doctors or scientists 5) The movie_length column is not properly defined at the source. The value mentioned in the movie_length column can either mean meters (for celluloid version) or minutes (for video version). For both meters and minutes, the unit is given as Mts at the source. 6) The data include not just feature films, but also short films, promos, songs, etc.