https://choosealicense.com/licenses/unknown/
The Blog Authorship Corpus consists of the collected posts of 19,320 bloggers gathered from blogger.com in August 2004. The corpus incorporates a total of 681,288 posts and over 140 million words - or approximately 35 posts and 7250 words per person.
Each blog is presented as a separate file, the name of which indicates a blogger id# and the blogger’s self-provided gender, age, industry and astrological sign. (All are labeled for gender and age but for many, industry and/or sign is marked as unknown.)
All bloggers included in the corpus fall into one of three age groups: - 8240 "10s" blogs (ages 13-17), - 8086 "20s" blogs (ages 23-27), - 2994 "30s" blogs (ages 33-47).
For each age group there are an equal number of male and female bloggers.
Each blog in the corpus includes at least 200 occurrences of common English words. All formatting has been stripped, with two exceptions: individual posts within a single blog file are separated by the date of the following post, and links within a post are denoted by the label urllink.
The corpus may be freely used for non-commercial research purposes.
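As an illustration, here is a minimal Python sketch for working with the corpus files; it assumes the dot-separated file naming convention (id, gender, age, industry, sign) described above, and since the exact post/date markup is not specified here, it only counts urllink labels.

from pathlib import Path

def parse_blog_filename(path):
    # Split a corpus file name into the metadata fields described above
    # (assumed order: <id>.<gender>.<age>.<industry>.<sign>.xml).
    parts = Path(path).name.split(".")
    blogger_id, gender, age, industry, sign = parts[:5]
    return {"id": blogger_id, "gender": gender, "age": int(age),
            "industry": industry, "sign": sign}

def count_urllinks(path):
    # Links within a post are denoted by the label 'urllink'.
    text = Path(path).read_text(encoding="utf-8", errors="ignore")
    return text.count("urllink")

# Example (hypothetical file name):
# meta = parse_blog_filename("blogs/1000331.female.37.indUnk.Leo.xml")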
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The blogs in the blogmix are selected from the lists at bloggportalen.se: Most visited private blogs, Most visited professional blogs, and the local lists for different regions. Additional information, such as the blogger's location and age, is also retrieved from Bloggportalen. The material has not been checked manually, which means that spam may occur. Some English-language blogs have been removed when discovered, and some blogs could not be added for technical reasons. The time period covered ranges from the first to the most recent entries of the selected blogs, and the corpus is updated regularly. The material is sentence-scrambled.
Overview
This dataset of medical misinformation was collected and is published by Kempelen Institute of Intelligent Technologies (KInIT). It consists of approx. 317k news articles and blog posts on medical topics published between January 1, 1998 and February 1, 2022 from a total of 207 reliable and unreliable sources. The dataset contains full-texts of the articles, their original source URL and other extracted metadata. If a source has a credibility score available (e.g., from Media Bias/Fact Check), it is also included in the form of annotation. Besides the articles, the dataset contains around 3.5k fact-checks and extracted verified medical claims with their unified veracity ratings published by fact-checking organisations such as Snopes or FullFact. Lastly and most importantly, the dataset contains 573 manually and more than 51k automatically labelled mappings between previously verified claims and the articles; mappings consist of two values: claim presence (i.e., whether a claim is contained in the given article) and article stance (i.e., whether the given article supports or rejects the claim or provides both sides of the argument).
The dataset is primarily intended to be used as a training and evaluation set for machine learning methods for claim presence detection and article stance classification, but it enables a range of other misinformation related tasks, such as misinformation characterisation or analyses of misinformation spreading.
Its novelty and our main contributions lie in (1) focus on medical news articles and blog posts as opposed to social media posts or political discussions; (2) providing multiple modalities (besides full-texts of the articles, there are also images and videos), thus enabling research of multimodal approaches; (3) mapping of the articles to the fact-checked claims (with manual as well as predicted labels); (4) providing source credibility labels for 95% of all articles and other potential sources of weak labels that can be mined from the articles' content and metadata.
The dataset is associated with the research paper "Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims" accepted and presented at ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22).
The accompanying GitHub repository provides a small static sample of the dataset and the dataset's descriptive analysis in the form of Jupyter notebooks.
Options to access the dataset
There are two ways to get access to the dataset:
In order to obtain access to the dataset (either the full static dump or the REST API), please request access by following the instructions provided below.
References
If you use this dataset in any publication, project, tool or in any other form, please cite the following papers:
@inproceedings{SrbaMonantPlatform,
  author    = {Srba, Ivan and Moro, Robert and Simko, Jakub and Sevcech, Jakub and Chuda, Daniela and Navrat, Pavol and Bielikova, Maria},
  booktitle = {Proceedings of Workshop on Reducing Online Misinformation Exposure (ROME 2019)},
  pages     = {1--7},
  title     = {Monant: Universal and Extensible Platform for Monitoring, Detection and Mitigation of Antisocial Behavior},
  year      = {2019}
}
@inproceedings{SrbaMonantMedicalDataset,
  author    = {Srba, Ivan and Pecher, Branislav and Tomlein, Matus and Moro, Robert and Stefancova, Elena and Simko, Jakub and Bielikova, Maria},
  booktitle = {Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22)},
  numpages  = {11},
  title     = {Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims},
  year      = {2022},
  doi       = {10.1145/3477495.3531726},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3477495.3531726}
}
Dataset creation process
In order to create this dataset (and to continuously obtain new data), we used our research platform Monant. The Monant platform provides so-called data providers to extract news articles/blogs from news/blog sites as well as fact-checking articles from fact-checking sites. General parsers (for RSS feeds, WordPress sites, Google Fact Check Tool, etc.) as well as custom crawlers and parsers were implemented (e.g., for the fact-checking site Snopes.com). All data are stored in a unified format in a central data storage.
Ethical considerations
The dataset was collected and is published for research purposes only. We collected only publicly available content of news/blog articles. The dataset contains identities of authors of the articles if they were stated in the original source; we left this information, since the presence of an author's name can be a strong credibility indicator. However, we anonymised the identities of the authors of discussion posts included in the dataset.
The main identified ethical issue related to the presented dataset lies in the risk of mislabelling of an article as supporting a false fact-checked claim and, to a lesser extent, in mislabelling an article as not containing a false claim or not supporting it when it actually does. To minimise these risks, we developed a labelling methodology and require an agreement of at least two independent annotators to assign a claim presence or article stance label to an article. It is also worth noting that we do not label an article as a whole as false or true. Nevertheless, we provide partial article-claim pair veracities based on the combination of claim presence and article stance labels.
As to the veracity labels of the fact-checked claims and the credibility (reliability) labels of the articles' sources, we take these from the fact-checking sites and external listings such as Media Bias/Fact Check as they are and refer to their methodologies for more details on how they were established.
Lastly, the dataset also contains automatically predicted labels of claim presence and article stance using our baselines described in the next section. These methods have their limitations and work with certain accuracy as reported in this paper. This should be taken into account when interpreting them.
Reporting mistakes in the dataset
The way to report substantial mistakes in the raw collected data or in the manual annotations is to create a new issue in the accompanying GitHub repository. Alternatively, general enquiries or requests can be sent to info [at] kinit.sk.
Dataset structure
Raw data
First, the dataset contains so-called raw data (i.e., data extracted by the web monitoring module of the Monant platform and stored in exactly the same form as they appear on the original websites). Raw data consist of articles from news sites and blogs (e.g., naturalnews.com), discussions attached to such articles, and fact-checking articles from fact-checking portals (e.g., snopes.com). In addition, the dataset contains feedback (numbers of likes, shares, and comments) provided by users on the social network Facebook, which is regularly extracted for all news/blog articles.
Raw data are contained in these CSV files (and corresponding REST API endpoints):
sources.csv
articles.csv
article_media.csv
article_authors.csv
discussion_posts.csv
discussion_post_authors.csv
fact_checking_articles.csv
fact_checking_article_media.csv
claims.csv
feedback_facebook.csv
Note: Personal information about discussion posts' authors (name, website, gravatar) is anonymised.
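For a first look at the raw data, a minimal loading sketch is given below; it assumes the CSV files listed above sit in a local directory (path hypothetical) and makes no assumptions about their columns.

import pandas as pd
from pathlib import Path

RAW_FILES = [
    "sources.csv", "articles.csv", "article_media.csv", "article_authors.csv",
    "discussion_posts.csv", "discussion_post_authors.csv",
    "fact_checking_articles.csv", "fact_checking_article_media.csv",
    "claims.csv", "feedback_facebook.csv",
]

def load_raw_data(data_dir):
    # Load each raw CSV file into a pandas DataFrame, keyed by file stem.
    data_dir = Path(data_dir)
    return {Path(name).stem: pd.read_csv(data_dir / name) for name in RAW_FILES}

# tables = load_raw_data("monant_dataset")   # hypothetical local path
# print({name: df.shape for name, df in tables.items()})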
Annotations
Second, the dataset contains so-called annotations. Entity annotations describe individual raw data entities (e.g., article, source). Relation annotations describe relations between two such entities.
Each annotation is described by the following attributes:
category of annotation (annotation_category). Possible values: label (annotation corresponds to ground truth determined by human experts) and prediction (annotation was created by means of an AI method).
type of annotation (annotation_type_id). Example values: Source reliability (binary), Claim presence. The list of possible values can be obtained from the enumeration in annotation_types.csv.
method which created the annotation (method_id). Example values: Expert-based source reliability evaluation, Fact-checking article to claim transformation method. The list of possible values can be obtained from the enumeration in methods.csv.
its value (value). The value is stored in JSON format and its structure differs according to the particular annotation type.
At the same time, annotations are associated with a particular object identified by:
entity type (parameter entity_type in case of entity annotations, or source_entity_type and target_entity_type in case of relation annotations). Possible values: sources, articles, fact-checking-articles.
entity id (parameter entity_id in case of entity annotations, or source_entity_id and target_entity_id in case of relation annotations).
The dataset specifically provides these entity annotations:
Source reliability (binary). Determines the validity of a source (website) on a binary scale with two options: reliable source and unreliable source.
Article veracity. Aggregated information about veracity from article-claim pairs.
The dataset specifically provides these relation annotations:
Fact-checking article to claim mapping. Determines mapping between fact-checking article and claim.
Claim presence. Determines presence of claim in article.
Claim stance. Determines stance of an article to a claim.
Annotations are contained in these CSV files (and corresponding REST API endpoints):
entity_annotations.csv
relation_annotations.csv
Note: The identities of the human annotators (emails provided in the annotation app) are anonymised.
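A minimal sketch of how the annotation files might be consumed is shown below; it assumes the attributes described above appear as CSV columns with the same names (annotation_category, annotation_type_id, method_id, value, entity_type, entity_id), which should be verified against the actual files.

import json
import pandas as pd

def load_entity_annotations(path="entity_annotations.csv"):
    # Load entity annotations and decode the JSON-encoded 'value' column.
    df = pd.read_csv(path)
    df["value"] = df["value"].apply(json.loads)
    return df

def ground_truth_only(df):
    # Keep only human-expert labels (annotation_category == 'label').
    return df[df["annotation_category"] == "label"]

# annotations = ground_truth_only(load_entity_annotations())
# article_annotations = annotations[annotations["entity_type"] == "articles"]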
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this paper, we aim to analyze the production of professional blogs by pre-service English teachers and the roles that such digital language practices may perform in the education of reflexive and critical language teachers. Specifically, we analyzed the blog posts that two pre-service teachers produced and the professional identities that are forged in such a digital language practice. The reported case study is of qualitative and interpretative nature. The data, composed by the pre-service English teachers' blog posts and their experiential narratives regarding the pedagogical practice they experienced, were generated in 2010 in an elective unit in a state university of the north of Paraná. The results demonstrate the emergence of identity conflicts due to the engagement of the pre-service English teachers in the production of digital language practices. These conflicts have generated an impulse towards the reconstruction of the identities of these future English language professionals.
http://rightsstatements.org/vocab/InC/1.0/
This dataset comprises a set of Twitter accounts in Singapore that are used for social bot profiling research conducted by the Living Analytics Research Centre (LARC) at Singapore Management University (SMU). Here a bot is defined as a Twitter account that generates contents and/or interacts with other users automatically (at least according to human judgment). In this research, Twitter bots have been categorized into three major types:
Broadcast bot. This bot aims at disseminating information to a general audience by providing, e.g., benign links to news, blogs or sites. Such a bot is often managed by an organization or a group of people (e.g., bloggers).
Consumption bot. The main purpose of this bot is to aggregate content from various sources and/or provide update services (e.g., horoscope readings, weather updates) for personal consumption or use.
Spam bot. This type of bot posts malicious content (e.g., to trick people by hijacking certain accounts or redirecting them to malicious sites), or promotes harmless but invalid/irrelevant content aggressively.
This categorization is general enough to cater for new, emerging types of bot (e.g., chatbots can be viewed as a special type of broadcast bots). The dataset was collected from 1 January to 30 April 2014 via the Twitter REST and streaming APIs. Starting from popular seed users (i.e., users having many followers), their follow, retweet, and user mention links were crawled. The data collection proceeds by adding those followers/followees, retweet sources, and mentioned users who state Singapore in their profile location. Using this procedure, a total of 159,724 accounts have been collected. To identify bots, the first step is to check active accounts who tweeted at least 15 times within the month of April 2014. These accounts were then manually checked and labelled, of which 589 bots were found. As many more human users are expected in the Twitter population, the remaining accounts were randomly sampled and manually checked. With this, 1,024 human accounts were identified. In total, this results in 1,613 labelled accounts. Related Publication: R. J. Oentaryo, A. Murdopo, P. K. Prasetyo, and E.-P. Lim. (2016). On profiling bots in social media. Proceedings of the International Conference on Social Informatics (SocInfo’16), 92-109. Bellevue, WA. https://doi.org/10.1007/978-3-319-47880-7_6
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: This dataset holds the content of one day's micro-blogs sampled from Weibo (http://weibo.com) in the form of bags-of-words.
Data set characteristics: Text. Number of micro-blogs: 189,223. Total number of words: 3,252,492. Size of the vocabulary: 20,942. Associated tasks: short text topic modeling, etc.
About preprocessing: For tokenization, we use NLPIR. Stop words and words with a term frequency of less than 20 were removed. Words containing only one Chinese character were also removed.
Data format: The released data are formatted as follows: [document_1] [document_2] ... [document_M], in which each line is one document. [document_i] is the i-th document of the dataset and consists of a list of Ni words/terms: [document_i] = [word_i1] [word_i2] ... [word_iNi], in which all word_ij are text strings separated by the blank character.
If you have any questions about the data set, please contact: jichang@buaa.edu.cn.
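Given the line-per-document, space-separated format described above, a minimal reading sketch (file name hypothetical) could look like this:

from collections import Counter

def read_bow_corpus(path, encoding="utf-8"):
    # One document per line; words/terms are separated by blank characters.
    documents = []
    with open(path, encoding=encoding) as f:
        for line in f:
            words = line.split()          # word_i1 ... word_iNi
            if words:
                documents.append(words)
    return documents

# docs = read_bow_corpus("weibo_one_day.txt")   # hypothetical file name
# vocab = Counter(word for doc in docs for word in doc)
# print(len(docs), len(vocab))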
Although the commercial name for the USAID University Learning Management System is CSOD InCompass, the agencies that use the system have renamed (or rebranded) their specific agency portals to meet their own needs. InCompass is a comprehensive talent management system that incorporates the following functional modules:
1) Learning -- The Learning module supports the management and tracking of training events and individual training records. Training events may be instructor-led or online. Courses may be managed within the system to provide descriptions, availability, and registration. Online content is stored on the system. Training information stored for individuals includes courses completed, scores, and courses registered for.
2) Connect -- The Connect module supports employee collaboration efforts. Features include communities of practice, expertise location, blogs, and knowledge-sharing support. Profile information that may be stored by the system includes job position, subject matter expertise, and previous accomplishments.
3) Performance -- The Performance module supports management of organizational goals and alignment of those goals to individual performance. The module supports managing skills and competencies for the organization. The module also supports employee performance reviews. The types of information gathered about employees include their skills, competencies, and performance evaluations.
4) Succession -- The Succession module supports workforce management and planning. The type of information gathered for this module includes prior work experience, skills, and competencies.
5) Extended Enterprise -- The Extended Enterprise module supports delivery of training outside of the organization. Training provided may be for a fee. The type of information collected for this module includes individual data for identifying the person for training records management and related information for commercial transactions.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset comes from the Open National Address Base project initiated by OpenStreetMap France.
For more information on this project: http://openstreetmap.fr/blogs/cquest/bano-banco
Origin of data
BANO is a composite database, made up from different sources:
Distribution format
These files are available in shapefile format in WGS84 projection (EPSG:4326), as well as in CSV format and, experimentally, as a GitHub project.
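As an illustration, a minimal sketch for loading one of these exports is shown below (file name hypothetical; the shapefile read requires geopandas).

import geopandas as gpd

# Minimal sketch: load a BANO export; the data are distributed in WGS84 (EPSG:4326).
addresses = gpd.read_file("bano_export.shp")   # hypothetical file name
print(addresses.crs, len(addresses))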
Description of content
For each address:
updates, corrections
To update and correct BANO data, simply make improvements directly in OpenStreetMap; they will be taken into account in the next update cycle.
A one-stop collaborative reporting/correction window will soon be set up to simplify the process of improving the content of the database. To participate in its co-construction, do not hesitate to contact us!
For any questions concerning the project or this dataset, you can contact bano@openstreetmap.fr
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset comes from the Open National Address Base project initiated by OpenStreetMap France.
For more information on this project: http://openstreetmap.fr/blogs/cquest/bano-banco
Origin of data
BANO is a composite database, made up from different sources:
Distribution format
These files are available in shapefile format in WGS84 projection (EPSG:4326), as well as in CSV format and, experimentally, as a GitHub project.
Description of content
For each address:
updates, corrections
To update and correct BANO data, simply make improvements directly in OpenStreetMap; they will be taken into account in the next update cycle.
A one-stop collaborative reporting/correction window will soon be set up to simplify the process of improving the content of the database. To participate in its co-construction, do not hesitate to contact us!
For any questions concerning the project or this dataset, you can contact bano@openstreetmap.fr
Dataset Card for hf-blog-posts-dpo_raw
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel, using the distilabel CLI:
distilabel pipeline run --config "https://huggingface.co/datasets/fdaudens/hf-blog-posts-dpo_raw/raw/main/pipeline.yaml"
or explore the configuration:
distilabel pipeline info --config…
See the full description on the dataset page: https://huggingface.co/datasets/fdaudens/hf-blog-posts-dpo_raw.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset maps the location of anti-social graffiti around the University of Edinburgh's central campus. The data were collected over a two-week period between 19 May and 2 June 2014, using a smartphone app called Fieldtrip GB (http://fieldtripgb.blogs.edina.ac.uk/). Multiple asset collectors were deployed to use a pre-defined data collection form which allowed users to log the following attributes: Date / Name of asset collector / Type of graffiti (image/tag/words/advert/.....) / What the graffiti was on (building/wall/lamppost/....) / What medium was used (paint/paper/chalk/....) / Density of graffiti / Photograph / Location. The data are by no means complete and realistically capture only around 50% of the graffiti in the study area. It is hoped that this dataset will be updated every 3 months to chart the distribution of graffiti over time. Once collected, data from the multiple asset collectors were merged in FtGB's authoring tool and exported as a CSV file. This was then imported into QGIS and saved as a vector dataset in ESRI Shapefile format. GIS vector data. This dataset was first accessioned in the EDINA ShareGeo Open repository on 2014-06-06 and migrated to Edinburgh DataShare on 2017-02-22.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
HadUK-Grid is a collection of gridded climate variables derived from the network of UK land surface observations. The data have been interpolated from meteorological station data onto a uniform grid to provide complete and consistent coverage across the UK. The dataset at 12 km resolution is derived from the associated 1 km x 1 km resolution to allow for comparison to data from climate projections. The dataset spans the period from 1836 to 2022, but the start time is dependent on climate variable and temporal resolution.
The gridded data are produced for daily, monthly, seasonal and annual timescales, as well as long term averages for a set of climatological reference periods. Variables include air temperature (maximum, minimum and mean), precipitation, sunshine, mean sea level pressure, wind speed, relative humidity, vapour pressure, days of snow lying, and days of ground frost.
This data set supersedes the previous versions of this dataset which also superseded UKCP09 gridded observations. Subsequent versions may be released in due course and will follow the version numbering as outlined by Hollis et al. (2018, see linked documentation).
The changes for v1.2.0.ceda HadUK-Grid datasets are as follows:
Added data for calendar year 2022
Added newly digitised data for monthly sunshine 1910-1918
Added Rainfall Rescue version 2 doi:10.5281/zenodo.7554242
Updated shapefiles used for production of area average statistics https://github.com/ukcp-data/ukcp-spatial-files
Updated controlled vocabulary for metadata assignment https://github.com/ukcp-data/UKCP18_CVs
Updated assignment of timepoint for some periods so that the datetime is the middle of the period (e.g. season) rather than a fixed offset from the period start.
Updated ordering of regions within regional values files. Alphabetical ordering.
Files use netcdf level 4 compression using gzip https://www.unidata.ucar.edu/blogs/developer/entry/netcdf_compression
Net changes to the input station data used to generate this dataset:
Total of 125601744 observations
122621050 (97.6%) unchanged
26700 (0.02%) modified for this version
2953994 (2.35%) added in this version
16315 (0.01%) deleted from this version
Changes to monthly rainfall 1836-1960
Total of 4823973 observations
3315657 (68.7%) unchanged
21029 (0.4%) modified for this version
1487287 (30.8%) added in this version
11155 (0.2%) deleted from this version
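To work with the gridded files, a minimal sketch is shown below; the file and variable names are hypothetical and should be replaced with those of the downloaded product (the files are NetCDF-4 with internal gzip compression, which xarray and the netCDF4 backend handle transparently).

import xarray as xr

# Minimal sketch: open a HadUK-Grid NetCDF file (hypothetical file and variable names).
ds = xr.open_dataset("tas_hadukgrid_uk_12km_mon_202201-202212.nc")
print(ds.data_vars)               # list the climate variables stored in the file
monthly_mean_temp = ds["tas"]     # assumed variable name for mean air temperature
print(monthly_mean_temp.mean().item())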
The primary purpose of these data is to facilitate monitoring of UK climate and research into climate change, impacts and adaptation. The datasets have been created by the Met Office with financial support from the Department for Business, Energy and Industrial Strategy (BEIS) and the Department for Environment, Food and Rural Affairs (DEFRA) in order to support the Public Weather Service Customer Group (PWSCG), the Hadley Centre Climate Programme, and the UK Climate Projections (UKCP18) project. The output from a number of data recovery activities relating to 19th and early 20th century data has been used in the creation of this dataset; these activities were supported by: the Met Office Hadley Centre Climate Programme; the Natural Environment Research Council project "Analysis of historic drought and water scarcity in the UK"; the UK Research & Innovation (UKRI) Strategic Priorities Fund UK Climate Resilience programme; the UK Natural Environment Research Council (NERC) Public Engagement programme; the National Centre for Atmospheric Science and the NERC GloSAT project; and the contribution of many thousands of public volunteers. The dataset is provided under the Open Government Licence.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset comes from the Open National Address Base project initiated by OpenStreetMap France.
For more information on this project: http://openstreetmap.fr/blogs/cquest/bano-banco
Origin of data
BANO is a composite database, made up from different sources:
Distribution format
These files are available in shapefile format in WGS84 projection (EPSG:4326), as well as in CSV format and, experimentally, as a GitHub project.
Description of content
For each address:
updates, corrections
To update and correct BANO data, simply make improvements directly in OpenStreetMap; they will be taken into account in the next update cycle.
A one-stop collaborative reporting/correction window will soon be set up to simplify the process of improving the content of the database. To participate in its co-construction, do not hesitate to contact us!
For any questions concerning the project or this dataset, you can contact bano@openstreetmap.fr
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We often create dioramas from LEGO bricks for use with our presentations, blogs, and social media posts. We find it's much more fun and effective to reenact meetings and other scenes than to try and use real-life images. It also saves us from the hassle of worrying about receiving permission to use a person's photo in our work. By popular demand, we have released some of our favorite images as open data for you to use.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Geographical boundaries of the EPCI (public establishments for inter-municipal cooperation), obtained by combining the municipal boundaries from OpenStreetMap with data from the Directorate General of Local Authorities dating from 2015.
These data come in part from crowdsourcing by contributors to the OpenStreetMap project and are therefore under the ODbL licence, which requires share-alike redistribution; the mandatory attribution must be "© the contributors of OpenStreetMap under ODbL license", in accordance with http://osm.org/copyright
The data come from the Directorate General of Local Authorities (DGCL) crossed with the municipal division from the OpenStreetMap map database. These were created from the cadastre made available by the DGFiP on cadastre.gouv.fr.
Source for EPCI 2015: http://www.collectivites-locales.gouv.fr/liste-et-composition-2015
These files are offered in shapefile format, in WGS84 projection with several levels of detail:
The topology is retained during the simplification process (see the cquest blog post on simplified administrative boundaries at http://openstreetmap.fr/blogs/cquest/).
These files contain all the EPCI contained in the DGCL file (see "Origin of data").
The following attributes are provided:
Previous versions available at: http://osm13.openstreetmap.fr/~cquest/openfla/export/
For any questions regarding these exports, you can contact exports@openstreetmap.fr
See also:
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
HadUK-Grid is a collection of gridded climate variables derived from the network of UK land surface observations. The data have been interpolated from meteorological station data onto a uniform grid to provide complete and consistent coverage across the UK. These data at 1 km resolution have been averaged across a set of discrete geographies defining UK countries consistent with data from UKCP18 climate projections. The dataset spans the period from 1836 to 2023, but the start time is dependent on climate variable and temporal resolution.
The gridded data are produced for daily, monthly, seasonal and annual timescales, as well as long term averages for a set of climatological reference periods. Variables include air temperature (maximum, minimum and mean), precipitation, sunshine, mean sea level pressure, wind speed, relative humidity, vapour pressure, days of snow lying, and days of ground frost.
This data set supersedes the previous versions of this dataset which also superseded UKCP09 gridded observations. Subsequent versions may be released in due course and will follow the version numbering as outlined by Hollis et al. (2018, see linked documentation).
The changes for v1.3.0.ceda HadUK-Grid datasets are as follows:
Added data for calendar year 2023
Added newly digitised data for monthly sunshine 1910-1918
Added Rainfall Rescue version 2 doi:10.5281/zenodo.7554242
Updated shapefiles used for production of area average statistics https://github.com/ukcp-data/ukcp-spatial-files
Updated controlled vocabulary for metadata assignment https://github.com/ukcp-data/UKCP18_CVs
Updated assignment of timepoint for some periods so that the datetime is the middle of the period (e.g. season) rather than a fixed offset from the period start.
Updated ordering of regions within regional values files. Alphabetical ordering.
Files use netcdf level 4 compression using gzip https://www.unidata.ucar.edu/blogs/developer/entry/netcdf_compression
Net changes to the input station data used to generate this dataset:
Total of 125601744 observations
122621050 (97.6%) unchanged
26700 (0.02%) modified for this version
2953994 (2.35%) added in this version
16315 (0.01%) deleted from this version
Changes to monthly rainfall 1836-1960
Total of 4823973 observations
3315657 (68.7%) unchanged
21029 (0.4%) modified for this version
1487287 (30.8%) added in this version
11155 (0.2%) deleted from this version
The primary purpose of these data is to facilitate monitoring of UK climate and research into climate change, impacts and adaptation. The datasets have been created by the Met Office with financial support from the Department for Business, Energy and Industrial Strategy (BEIS) and the Department for Environment, Food and Rural Affairs (DEFRA) in order to support the Public Weather Service Customer Group (PWSCG), the Hadley Centre Climate Programme, and the UK Climate Projections (UKCP18) project. The output from a number of data recovery activities relating to 19th and early 20th century data has been used in the creation of this dataset; these activities were supported by: the Met Office Hadley Centre Climate Programme; the Natural Environment Research Council project "Analysis of historic drought and water scarcity in the UK"; the UK Research & Innovation (UKRI) Strategic Priorities Fund UK Climate Resilience programme; the UK Natural Environment Research Council (NERC) Public Engagement programme; the National Centre for Atmospheric Science and the NERC GloSAT project; and the contribution of many thousands of public volunteers. The dataset is provided under the Open Government Licence.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
HadUK-Grid is a collection of gridded climate variables derived from the network of UK land surface observations. The data have been interpolated from meteorological station data onto a uniform grid to provide complete and consistent coverage across the UK. The dataset at 25 km resolution is derived from the associated 1 km x 1 km resolution to allow for comparison to data from UKCP18 climate projections. The dataset spans the period from 1836 to 2022, but the start time is dependent on climate variable and temporal resolution.
The gridded data are produced for daily, monthly, seasonal and annual timescales, as well as long term averages for a set of climatological reference periods. Variables include air temperature (maximum, minimum and mean), precipitation, sunshine, mean sea level pressure, wind speed, relative humidity, vapour pressure, days of snow lying, and days of ground frost.
This data set supersedes the previous versions of this dataset which also superseded UKCP09 gridded observations. Subsequent versions may be released in due course and will follow the version numbering as outlined by Hollis et al. (2018, see linked documentation).
The changes for v1.2.0.ceda HadUK-Grid datasets are as follows:
Added data for calendar year 2022
Added newly digitised data for monthly sunshine 1910-1918
Added Rainfall Rescue version 2 doi:10.5281/zenodo.7554242
Updated shapefiles used for production of area average statistics https://github.com/ukcp-data/ukcp-spatial-files
Updated controlled vocabulary for metadata assignment https://github.com/ukcp-data/UKCP18_CVs
Updated assignment of timepoint for some periods so that the datetime is the middle of the period (e.g. season) rather than a fixed offset from the period start.
Updated ordering of regions within regional values files. Alphabetical ordering.
Files use netcdf level 4 compression using gzip https://www.unidata.ucar.edu/blogs/developer/entry/netcdf_compression
Net changes to the input station data used to generate this dataset:
Total of 125601744 observations
122621050 (97.6%) unchanged
26700 (0.02%) modified for this version
2953994 (2.35%) added in this version
16315 (0.01%) deleted from this version
Changes to monthly rainfall 1836-1960
Total of 4823973 observations
3315657 (68.7%) unchanged
21029 (0.4%) modified for this version
1487287 (30.8%) added in this version
11155 (0.2%) deleted from this version
The primary purpose of these data is to facilitate monitoring of UK climate and research into climate change, impacts and adaptation. The datasets have been created by the Met Office with financial support from the Department for Business, Energy and Industrial Strategy (BEIS) and the Department for Environment, Food and Rural Affairs (DEFRA) in order to support the Public Weather Service Customer Group (PWSCG), the Hadley Centre Climate Programme, and the UK Climate Projections (UKCP18) project. The output from a number of data recovery activities relating to 19th and early 20th century data has been used in the creation of this dataset; these activities were supported by: the Met Office Hadley Centre Climate Programme; the Natural Environment Research Council project "Analysis of historic drought and water scarcity in the UK"; the UK Research & Innovation (UKRI) Strategic Priorities Fund UK Climate Resilience programme; the UK Natural Environment Research Council (NERC) Public Engagement programme; the National Centre for Atmospheric Science and the NERC GloSAT project; and the contribution of many thousands of public volunteers. The dataset is provided under the Open Government Licence.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
HadUK-Grid is a collection of gridded climate variables derived from the network of UK land surface observations. The data have been interpolated from meteorological station data onto a uniform grid to provide complete and consistent coverage across the UK. These data at 1 km resolution have been averaged across a set of discrete geographies defining UK river basins consistent with data from UKCP18 climate projections. The dataset spans the period from 1836 to 2022, but the start time is dependent on climate variable and temporal resolution.
The gridded data are produced for daily, monthly, seasonal and annual timescales, as well as long term averages for a set of climatological reference periods. Variables include air temperature (maximum, minimum and mean), precipitation, sunshine, mean sea level pressure, wind speed, relative humidity, vapour pressure, days of snow lying, and days of ground frost.
This data set supersedes the previous versions of this dataset which also superseded UKCP09 gridded observations. Subsequent versions may be released in due course and will follow the version numbering as outlined by Hollis et al. (2018, see linked documentation).
The changes for v1.2.0.ceda HadUK-Grid datasets are as follows:
Added data for calendar year 2022
Added newly digitised data for monthly sunshine 1910-1918
Added Rainfall Rescue version 2 doi:10.5281/zenodo.7554242
Updated shapefiles used for production of area average statistics https://github.com/ukcp-data/ukcp-spatial-files
Updated controlled vocabulary for metadata assignment https://github.com/ukcp-data/UKCP18_CVs
Updated assignment of timepoint for some periods so that the datetime is the middle of the period (e.g. season) rather than a fixed offset from the period start.
Updated ordering of regions within regional values files. Alphabetical ordering.
Files use netcdf level 4 compression using gzip https://www.unidata.ucar.edu/blogs/developer/entry/netcdf_compression
Net changes to the input station data used to generate this dataset:
Total of 125601744 observations
122621050 (97.6%) unchanged
26700 (0.02%) modified for this version
2953994 (2.35%) added in this version
16315 (0.01%) deleted from this version
Changes to monthly rainfall 1836-1960
Total of 4823973 observations
3315657 (68.7%) unchanged
21029 (0.4%) modified for this version
1487287 (30.8%) added in this version
11155 (0.2%) deleted from this version
The primary purpose of these data is to facilitate monitoring of UK climate and research into climate change, impacts and adaptation. The datasets have been created by the Met Office with financial support from the Department for Business, Energy and Industrial Strategy (BEIS) and the Department for Environment, Food and Rural Affairs (DEFRA) in order to support the Public Weather Service Customer Group (PWSCG), the Hadley Centre Climate Programme, and the UK Climate Projections (UKCP18) project. The output from a number of data recovery activities relating to 19th and early 20th century data has been used in the creation of this dataset; these activities were supported by: the Met Office Hadley Centre Climate Programme; the Natural Environment Research Council project "Analysis of historic drought and water scarcity in the UK"; the UK Research & Innovation (UKRI) Strategic Priorities Fund UK Climate Resilience programme; the UK Natural Environment Research Council (NERC) Public Engagement programme; the National Centre for Atmospheric Science and the NERC GloSAT project; and the contribution of many thousands of public volunteers. The dataset is provided under the Open Government Licence.
Data and code for Malejka et al. (2021), "Correlation analysis to investigate unconscious mental processes". A consensus among researchers is that much of our behaviour is based on rather automatic processes we are barely aware of and over which we have little control. Research suggests that exposure to subtle cues can have dramatic effects on our decisions. For instance, asking people to provide the last 2 digits of their social security number biases how much they are willing to pay for products and commodities. Similarly, according to some researchers, people are more likely to be impolite and disrespectful if they have been exposed to words related to rudeness while solving anagrams. Another line of research suggests that we take many of our (important) decisions when distracted and thinking about other things and that this 'unconscious thought' process actually improves the quality of our decisions. These studies pertain to a larger area of research usually called 'implicit cognition', which explores how unconscious mechanisms contribute to cognitive processes including perception, learning, memory, and decision making. This area of research has attracted a great deal of attention from the media and features frequently in popular science books, blogs, and documentaries. Some authors have even suggested that parts of this research could be used to improve our decisions in different domains at a societal level (for example, in health behaviour and pension planning). The present project focuses on a particular domain of this literature, implicit learning. Studies conducted in this area try to determine whether we are able to detect regularities in our environment without awareness of those regularities. In other words, these studies address whether we can learn something without realising that we are indeed learning it. In recent years there have been thousands of demonstrations of implicit learning effects in the scientific literature and, not surprisingly, this literature has become increasingly influential in all areas of psychology, with an important impact in our understanding of human cognition and psychopathology. Unfortunately, our previous research suggests that much of this evidence is undermined by fundamental methodological problems that preclude any strong conclusions about the reliability of unconscious learning effects. We have shown that many of these studies find unconscious learning because researchers use weaker methods to assess whether people are conscious of what they have learned than to assess whether learning has taken place. Naturally, this implies that learning is easily detected but awareness is not, which creates the illusion that learning has taken place unconsciously.
Finding evidence of awareness in these domains is important because it suggests that some degree of control may be available as well. In the present project we propose new methods for the study of unconscious learning. Many of the problems that we have detected in our previous research can be ameliorated by employing cutting-edge statistical analysis, including Bayesian and meta-analytic methods and model fitting. However, the validity of these approaches in the domain of implicit cognition remains untested. A second goal is to conduct a large-scale exploration of the prevalence and magnitude of these problems. Our previous studies have focused on a very particular effect studied in implicit learning research ('contextual cueing'). We suspect that many of these problems transcend this domain and affect a large proportion of current studies on implicit learning. The potential impact of this assessment is difficult to overestimate. Finally, we will set up a collaboration with other international laboratories working on this topic to gather the largest and most sensitive data set of implicit learning effects available so far. This data set will be publicly available for all researchers, which will make it a fundamental resource for the study of unconscious cognitive processes for many years to come.
The study examined Finnish researchers' use of different printed and electronic publications in their work, such as scientific journals, articles, books, reports, and social media. The study is part of Carol Tenopir and Donald W. King's survey series, launched in 1977, which has followed the reading practices of researchers in different countries and scientific fields. Finnish data were also collected in 2006, but that dataset has not been archived at the Finnish Social Science Data Archive. The 2016 project was partly funded by the Finnish Cultural Foundation. The survey charted researchers' common reading practices as well as the publishing of different types of scientific articles and other publications. The way that the respondents' work time was distributed between different types of tasks was charted, as well as how many publications of different types they had authored within the previous two years. It was also examined how researchers searched for information, published scientific work, and cited the work of others. Questions also covered how much time the respondents spent on reading articles, how many scientific articles and other types of publications they had read within the previous 30 days, how recent the publications that they read were, reasons for reading them, what language they were in, how they found the publications and received access, where they read the publications, which scientific field the publications represented, and how useful they considered different publication formats for their work. The significance of social media was charted with questions regarding, for instance, how important different services and tools were for their work (e.g. blogs, cloud services, institutional repositories, academic online communities, reference management software).
The respondents were also asked how important different features of electronic publications were (e.g. compatibility and readability on different devices, possibility to share publications, advanced navigation features, global language support, possibility to embed audio into publications). Background variables included scientific field, job title, age, and type of workplace. Sampling: non-probability, availability (self-selected) sample. Data collection mode: self-administered web questionnaire (CAWI).