56 datasets found
  1. Population of X/Twitter users and web domains embedded in a multidimensional...

    • data.sciencespo.fr
    tsv
    Updated Mar 14, 2025
    Cite
    Antoine Vendeville; Jimena Royo-Letelier; Duncan Cassells; Jean-Philippe Cointet; Maxime Crépel; Tim Faverjon; Théophile Lenoir; Béatrice Mazoyer; Benjamin Ooghe-Tabanou; Armin Pournaki; Hiroki Yamashita; Pedro Ramaciotti (2025). Population of X/Twitter users and web domains embedded in a multidimensional political opinion space [Dataset]. http://doi.org/10.21410/7E4/QPECFF
    Available download formats: tsv(100846), tsv(106000433), tsv(177962), tsv(32523281), tsv(146217)
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    data.sciencespo
    Authors
    Antoine Vendeville; Jimena Royo-Letelier; Duncan Cassells; Jean-Philippe Cointet; Maxime Crépel; Tim Faverjon; Théophile Lenoir; Béatrice Mazoyer; Benjamin Ooghe-Tabanou; Armin Pournaki; Hiroki Yamashita; Pedro Ramaciotti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The study of political phenomena on social media requires operationalizing the political stance of the users and content involved. Relevant examples include the study of segregation and polarization online, the study of political diversity in social media content diets, and AI explainability. While many research designs rely on operationalizations best suited to the US setting, few accommodate more general designs, in which users and content might take stances on multiple ideology and issue dimensions, going beyond traditional Liberal-Conservative or Left-Right scales. To advance the study of more general online ecosystems, we present a dataset of a population of X/Twitter users in the French political Twittersphere and of web domains, embedded in a political space spanned by dimensions measuring attitudes towards immigration, the EU, liberal values, elites and institutions, nationalism and the environment. We provide several benchmarks validating the positions of these entities (based on both LLM and human annotations), and discuss several applications for this dataset.

  2. Twitter Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Dec 23, 2024
    + more versions
    Cite
    Bright Data (2024). Twitter Dataset [Dataset]. https://brightdata.com/products/datasets/twitter
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Dec 23, 2024
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Utilize our Twitter dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset provides a comprehensive understanding of social media trends, empowering organizations to refine their communication and marketing strategies. Access the entire dataset or customize a subset to fit your needs. Popular use cases include market research to identify trending topics and hashtags, AI training by reviewing factors such as tweet content, retweets, and user interactions for predictive analytics, and trend forecasting by examining correlations between specific themes and user engagement to uncover emerging social media preferences.

  3. Open dataset of scholars on Twitter (X)

    • zenodo.org
    csv
    Updated Apr 2, 2024
    Cite
    Philippe Mongeon; Timothy Bowman; Rodrigo Costas; Wenceslao Arroyo Machado (2024). Open dataset of scholars on Twitter (X) [Dataset]. http://doi.org/10.5281/zenodo.10905839
    Available download formats: csv
    Dataset updated
    Apr 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Philippe Mongeon; Timothy Bowman; Rodrigo Costas; Wenceslao Arroyo Machado
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is version 2 of a dataset of paired OpenAlex author IDs (https://docs.openalex.org/about-the-data/author) and Twitter (now X) user IDs.

    Major update in this version

    Following the significant update to OpenAlex's author identification system, the scholars on Twitter dataset, which previously linked Twitter IDs to OpenAlex author IDs, immediately became outdated. This called for a new approach to re-establish these links, as the absence of new Twitter data made it impossible to replicate the original method of matching Twitter profiles with scholarly authors. To navigate this challenge, a bridge was constructed between the June 2022 snapshot of the OpenAlex database—used in the original matching process—and the most recent snapshot from February 2024. This bridge utilized OpenAlex works IDs and DOIs to match authors in both datasets by their shared publications and identical primary names. When a connection was established between two authors with the same name, the new OpenAlex author ID was assigned to the corresponding Twitter ID. When direct matches based on primary names were not found, an attempt was made to establish connections by matching the names from June 2022 with any corresponding alternative names found in the 2024 dataset. This method ensured continuity of identity through the system update, adapting the strategy to link profiles across the temporal divide created by the database's overhaul.

    Our efficient method for re-establishing links between author IDs and Twitter profiles has been notably successful, managing to rematch 432,417 (88%) OpenAlex author IDs. This effort successfully restored connections for 388,968 unique Twitter users, which represents 92% of the original dataset. Of these, 375,316 were matched using their primary names, and 57,101 through alternative names. The simplicity and quick execution of this approach led to exceptionally favourable results, with a minimal loss of only 8% of the original Twitter-linked scholarly accounts.

    The dataset includes 432,417 unique author_ids and 388,968 unique tweeter_ids forming 462,427 unique author-tweeter pairs.
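    The counts quoted in this description can be cross-checked with simple arithmetic. The sketch below uses only figures stated above; the variable names are illustrative and are not part of the dataset.

```python
# Figures quoted in the dataset description.
matched_primary = 375_316        # matches made with primary names
matched_alternative = 57_101     # matches made with alternative names
rematched_author_ids = 432_417   # rematched OpenAlex author IDs
unique_tweeters = 388_968        # unique Twitter users with a restored link
unique_pairs = 462_427           # unique author-tweeter pairs

# The two matching routes should add up to the number of rematched author IDs.
total_by_route = matched_primary + matched_alternative
share_alternative = matched_alternative / rematched_author_ids
pairs_per_tweeter = unique_pairs / unique_tweeters

print("routes add up to total:", total_by_route == rematched_author_ids)
print(f"share matched via alternative names: {share_alternative:.1%}")
print(f"average author IDs per Twitter user: {pairs_per_tweeter:.2f}")
```

    The primary-name and alternative-name counts sum exactly to the number of rematched author IDs, which suggests that those two figures count author IDs rather than Twitter users.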

    File descriptions

    • authors_tweeters_2024_02.csv is the actual dataset of author IDs paired with tweeter IDs. The "alternative" column indicates if the match was made with the primary name (0) or an alternate name (1).
    • mapping_tweeters_2022_2024.csv contains the relationship made between the 2022 author IDs and the 2024 author IDs, including the names.

    How to cite

    When using the dataset, please cite the following article providing details about the matching process:

    Mongeon, P., Bowman, T. D., & Costas, R. (2023). An open data set of scholars on Twitter. Quantitative Science Studies, 1–11.
  4. Twitter users in the United States 2019-2028

    • statista.com
    • ai-chatbox.pro
    Updated Jul 31, 2025
    Cite
    Statista Research Department (2025). Twitter users in the United States 2019-2028 [Dataset]. https://www.statista.com/topics/3196/social-media-usage-in-the-united-states/
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Area covered
    United States
    Description

    The number of Twitter users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 4.3 million users (+5.32 percent). After the ninth consecutive year of growth, the Twitter user base is estimated to reach 85.08 million users, a new peak, in 2028. Notably, the number of Twitter users has increased continuously over the past years. User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Twitter users in countries like Canada and Mexico.
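    The forecast figures above are mutually consistent, as a quick check shows. The sketch below derives the implied 2024 baseline, which is an inference from the quoted numbers and is not stated in the source; the variable names are illustrative.

```python
users_2028_m = 85.08  # forecast 2028 user base in millions (from the text)
increase_m = 4.3      # forecast total increase 2024-2028 in millions (from the text)

# Implied 2024 baseline: an inference, not a figure given in the description.
users_2024_m = users_2028_m - increase_m
growth_pct = 100 * increase_m / users_2024_m

print(f"implied 2024 base: {users_2024_m:.2f} million")
print(f"implied growth 2024-2028: {growth_pct:.2f} percent")
```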

  5. Master X-Ray Catalog - Dataset - NASA Open Data Portal

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). Master X-Ray Catalog - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/master-x-ray-catalog
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The XRAY database table contains selected parameters from almost all HEASARC X-ray catalogs that have source positions located to better than a few arcminutes. The XRAY database table was created by copying all of the entries and common parameters from the tables listed in the Component Tables section. The XRAY database table has many entries but relatively few parameters; it provides users with general information about X-ray sources, obtained from a variety of catalogs. XRAY is especially suitable for cone searches and cross-correlations with other databases. Each entry in XRAY has a parameter called 'database_table' which indicates from which original database the entry was copied; users can browse that original table should they wish to examine all of the parameter fields for a particular entry. For some entries in XRAY, some of the parameter fields may be blank (or have zero values); this indicates that the original database table did not contain that particular parameter or that it had this same value there. The HEASARC in certain instances has included X-ray sources for which the quoted value for the specified band is an upper limit rather than a detection. The HEASARC recommends that the user should always check the original tables to get the complete information about the properties of the sources listed in the XRAY master source list. This master catalog is updated periodically whenever one of the component database tables is modified or a new component database table is added. This is a service provided by NASA HEASARC.
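    A cone search, as mentioned above, selects every source within a given angular radius of a sky position. The following is a minimal sketch of the idea using the haversine formula; it does not use any HEASARC interface. The rows, the source names, and the `ra`/`dec` field names are hypothetical; only the `database_table` parameter comes from the description.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def cone_search(sources, ra0, dec0, radius_deg):
    """Return the sources lying within radius_deg of the position (ra0, dec0)."""
    return [
        s for s in sources
        if angular_separation_deg(ra0, dec0, s["ra"], s["dec"]) <= radius_deg
    ]

# Hypothetical rows; real entries would come from the XRAY table.
sources = [
    {"name": "src_a", "ra": 10.00, "dec": 20.00, "database_table": "table_1"},
    {"name": "src_b", "ra": 10.02, "dec": 20.01, "database_table": "table_2"},
    {"name": "src_c", "ra": 50.00, "dec": -30.00, "database_table": "table_1"},
]

# Search radius of 3 arcminutes around (RA, Dec) = (10, 20) degrees.
hits = cone_search(sources, ra0=10.0, dec0=20.0, radius_deg=3 / 60)
for s in hits:
    print(s["name"], "from", s["database_table"])
```

    Keeping `database_table` on each hit makes it easy to go back to the original table for the full set of parameters, as the description recommends.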

  6. Annual Respondents Database X, 1997-2020: Secure Access

    • beta.ukdataservice.ac.uk
    Updated 2024
    Cite
    Office For National Statistics (ONS) (2024). Annual Respondents Database X, 1997-2020: Secure Access [Dataset]. http://doi.org/10.5255/ukda-sn-7989-5
    Dataset updated
    2024
    Dataset provided by
    Office for National Statistics
    datacite
    Authors
    Office For National Statistics (ONS)
    Description

    The Annual Respondents Database X (ARDx) has been created to allow users of Annual Respondents Database (ARD) (held at the UK Data Archive under SN 6644) to continue analysis even though the Annual Business Inquiry (ABI) which was used to create ARD ceased in 2008. ARDx contains harmonised variables from 1997 to 2020.

    ARDx is created from two ONS surveys, the Annual Business Inquiry (ABI; 1998-2008, held at the UK Data Archive under SN 6644) and the Annual Business Survey (ABS; 2009 onwards, held at the UK Data Archive under SN 7451). The ABI has an employment survey (ABI1) and a second survey for financial information (ABI2). ABS only collects financial data, and so is supplemented with employment data from the Business Register and Employment Survey (BRES; 2009 onwards, held at the UK Data Archive under SN 7463).

    ARDx consists of six types of files: 'respondent files', which have reported and derived information from survey questionnaire responses, and 'universe files', which contain limited information on all businesses that are within scope of the ABI/ABS. These files are provided at both the Reporting Unit and Local Unit levels. There are also 'register panel' and 'capital stock' files.

    Linking to other business studies
    These data contain Inter-Departmental Business Register (IDBR) reference numbers. These are anonymous but unique reference numbers assigned to business organisations. Their inclusion allows researchers to combine different business survey sources together. Researchers may consider applying for other business data to assist their research.

    For the fifth edition (December 2023), ARDx Version 4.0 for 1997-2020 has been provided, replacing Version 3. Coverage has thus been expanded to include 1997 and 2015-2020.

    Note to users
    Due to the limited nature of the documentation available for ARDx, users are advised to consult the documentation for the Annual Business Survey (UK Data Archive SN 7451) for detailed information about the data.

    For Secure Lab projects applying for access to this study as well as to SN 6697 Business Structure Database and/or SN 7683 Business Structure Database Longitudinal, only postcode-free versions of the data will be made available.

  7. Master X-Ray Catalog

    • s.cnmilf.com
    • catalog.data.gov
    Updated Jun 28, 2025
    Cite
    High Energy Astrophysics Science Archive Research Center (2025). Master X-Ray Catalog [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/master-x-ray-catalog
    Dataset updated
    Jun 28, 2025
    Dataset provided by
    High Energy Astrophysics Science Archive Research Center
    Description

    The XRAY database table contains selected parameters from almost all HEASARC X-ray catalogs that have source positions located to better than a few arcminutes. The XRAY database table was created by copying all of the entries and common parameters from the tables listed in the Component Tables section. The XRAY database table has many entries but relatively few parameters; it provides users with general information about X-ray sources, obtained from a variety of catalogs. XRAY is especially suitable for cone searches and cross-correlations with other databases. Each entry in XRAY has a parameter called 'database_table' which indicates from which original database the entry was copied; users can browse that original table should they wish to examine all of the parameter fields for a particular entry. For some entries in XRAY, some of the parameter fields may be blank (or have zero values); this indicates that the original database table did not contain that particular parameter or that it had this same value there. The HEASARC in certain instances has included X-ray sources for which the quoted value for the specified band is an upper limit rather than a detection. The HEASARC recommends that the user should always check the original tables to get the complete information about the properties of the sources listed in the XRAY master source list. This master catalog is updated periodically whenever one of the component database tables is modified or a new component database table is added. This is a service provided by NASA HEASARC.

  8. CANDID-II Dataset

    • figshare.com
    png
    Updated Jun 27, 2025
    + more versions
    Cite
    Sijing Feng (2025). CANDID-II Dataset [Dataset]. http://doi.org/10.17608/k6.auckland.19606921.v2
    Available download formats: png
    Dataset updated
    Jun 27, 2025
    Dataset provided by
    figshare
    Authors
    Sijing Feng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A dataset of 53,054 anonymized adult chest X-rays in 1024 x 1024 pixel DICOM format, with corresponding anonymized free-text reports, from Dunedin Hospital, New Zealand, acquired between 2010 and 2020. The corresponding radiology reports, generated by FRANZCR radiologists, were manually annotated for 46 common radiological findings mapped to the Unified Medical Language System (UMLS) and RadLex ontology. Each of the multiclassification annotations contains 4 types of labels, namely positive, uncertain, negative and not mentioned. In the provided dataset, image filenames contain a patient index (enabling analyses that require grouping images by patient), as well as anonymized date-of-acquisition information in which the temporal relationship between images is preserved. This dataset can be used for training and testing deep learning algorithms for adult chest X-rays. Unfortunately, since Feb 2024 the New Zealand government has been changing the data governance of datasets used for AI development, which affects how the CANDID II dataset can be accessed by external users. Therefore, the CANDID II dataset is not available to users outside Health New Zealand. Further notice will be posted here should access by external users be reopened.

  9. Measurement-based MIMO channel model at 140GHz

    • zenodo.org
    zip
    Updated Apr 6, 2024
    Cite
    Mar Francis de Guzman; Katsuyuki Haneda; Pekka Kyösti (2024). Measurement-based MIMO channel model at 140GHz [Dataset]. http://doi.org/10.5281/zenodo.7640353
    Available download formats: zip
    Dataset updated
    Apr 6, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mar Francis de Guzman; Katsuyuki Haneda; Pekka Kyösti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1. Introduction

    The file “gen_dd_channel.zip” is a package containing a stored wideband multiple-input multiple-output (MIMO) radio channel model at 140 GHz for indoor hall, outdoor suburban, residential and urban scenarios. The package consists of 1) measured wideband double-directional multipath data sets, estimated from radio channel sounding and processed through measurement-based ray-launching, and 2) MATLAB code sets that allow users to generate wideband MIMO radio channels with various antenna array types, e.g., uniform planar and circular arrays at link ends.

    2. What does this package do?

    Outputs of the channel model

    The MATLAB file “ChannelGeneratorDD_hexax.m” gives the following variables, among others. The .m file also gives optional figures illustrating antennas and radio channel responses.

    Variables and descriptions:

    • CIR: MIMO channel impulse responses
    • CFR: MIMO channel frequency responses

    Inputs to the channel model

    In order for the MATLAB file “ChannelGeneratorDD_hexax.m” to run properly, the following inputs are required.

    Directory and description:

    • data_030123_double_directional_paths: Double-directional multipath data, measured and complemented by ray-launching tool, for various cellular sites.

    User’s parameters

    When using “ChannelGeneratorDD_hexax.m”, the following choices are available.

    Features and choices:

    Channel model types for transfer function generation

    • 'snapshot': single time sample per link = static, random phase for each path, amplitude from measurements
    • 'virtualMotion': Doppler shifts & temporal fading, static propagation parameters, random phase for each path, amplitude from measurements, Doppler frequency per path from AoA and velocity vector

    Antenna / beam shapes

    • 'single3GPP': single antenna element with power pattern shape defined in 3GPP, adjustable HPBW etc.
    • 'URA': uniform rectangular array, omni-directional elements
    • 'UCA': uniform circular array, omni-directional elements
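    The 'virtualMotion' option derives a Doppler frequency per path from the angle of arrival (AoA) and a velocity vector. A common textbook relation for this, used here as an assumption and not taken from the MATLAB code, projects the velocity onto the arrival direction and divides by the wavelength. The function name, the azimuth/elevation convention and the example velocity in the Python sketch below are all hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def doppler_shift_hz(velocity_mps, aoa_azimuth_deg, aoa_elevation_deg, carrier_hz):
    """Doppler shift of one path: velocity projected on the arrival direction, over wavelength."""
    az = math.radians(aoa_azimuth_deg)
    el = math.radians(aoa_elevation_deg)
    # Unit vector pointing from the receiver towards the arriving wave
    # (assumed convention: azimuth in the x-y plane, elevation above it).
    direction = (
        math.cos(el) * math.cos(az),
        math.cos(el) * math.sin(az),
        math.sin(el),
    )
    radial_speed = sum(v * d for v, d in zip(velocity_mps, direction))
    wavelength = C / carrier_hz
    return radial_speed / wavelength

carrier = 140e9              # carrier frequency of the channel model, Hz
velocity = (1.0, 0.0, 0.0)   # hypothetical receiver velocity: 1 m/s along x

for azimuth in (0.0, 90.0, 180.0):
    shift = doppler_shift_hz(velocity, azimuth, 0.0, carrier)
    print(f"AoA azimuth {azimuth:5.1f} deg -> Doppler {shift:8.2f} Hz")
```

    Under this convention, a path arriving from the direction of motion gets the maximum positive shift, a path arriving broadside gets none, and a path arriving from behind gets the maximum negative shift.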

    List of files in the dataset

    MATLAB codes that implement the channel model

    The MATLAB files consist of the following files.

    File and directory names, with descriptions:

    • readme_100223.txt: Readme file; please read it before using the files
    • ChannelGeneratorDD_hexax.m: Main code to run; a code to integrate antenna arrays and double-directional path data to derive MIMO radio channels. No need to see/edit other files.
    • gen_pathDD.m, randl.m, randLoc.m: Sub-routines used in ChannelGeneratorDD_hexax.m; no need of modifications.
    • Hexa-X channel generator DD_presentation.pdf: User manual of ChannelGeneratorDD_hexax.m.

    Measured multipath data

    The directory "data_030123_double_directional_paths" in the package contains the following files.

    Filenames and descriptions:

    • readme_100223.txt: Readme file; please read it before using the files
    • RTdata_[scenario]_[date].mat: Containing double-directional multipath parameters at 140 GHz in the specified scenario, estimated from radio channel sounding and ray-tracing.
    • description_of_data_dd_[scenario].pdf: Explaining data formats, the measurement site and sample results.

    References

    Details of the data set are available in the following two documents:

    The stored channel models

    A. Nimr (ed.), "Hexa-X Deliverable D2.3 Radio models and enabling techniques towards ultra-high data rate links and capacity in 6G," April 2023, available: https://hexa-x.eu/deliverables/

    @misc{Hexa-XD23,
    author = {{A. Nimr (ed.)}},
    title = {{Hexa-X Deliverable D2.3 Radio models and enabling techniques towards ultra-high data rate links and capacity in 6G}},
    year = {2023},
    month = {Apr.},
    howpublished = {https://hexa-x.eu/deliverables/},
    }

    Derivation of the data, i.e., radio channel sounding and measurement-based ray-launching

    M. F. De Guzman and K. Haneda, "Analysis of wave-interacting objects in indoor and outdoor environments at 142 GHz," IEEE Transactions on Antennas and Propagation, vol. 71, no. 12, pp. 9838-9848, Dec. 2023, doi: 10.1109/TAP.2023.3318861

    @ARTICLE{DeGuzman23_TAP,
    author={De Guzman, Mar Francis and Haneda, Katsuyuki},
    journal={IEEE Transactions on Antennas and Propagation},
    title={Analysis of Wave-Interacting Objects in Indoor and Outdoor Environments at 142 {GHz}},
    year={2023},
    volume={71},
    number={12},
    pages={9838-9848},
    }

    Finally, the code “randl.m” is from the following MATLAB Central File Exchange entry.

    Hristo Zhivomirov (2023). Generation of Random Numbers with Laplace Distribution (https://www.mathworks.com/matlabcentral/fileexchange/53397-generation-of-random-numbers-with-laplace-distribution), MATLAB Central File Exchange. Retrieved February 15, 2023.

    Data usage terms

    Any use of the data is subject to acceptance of the following conditions:

    • The file “ChannelGeneratorDD_hexax.m” is owned by OUL. Contact: Dr. Pekka Kyösti, Pekka.Kyosti@oulu.fi.
    • The other files and those in the directories, except for “randl.m”, are owned by AAU. Contact: Mr. Mar Francis de Guzman, francis.deguzman@aalto.fi.
    • When a scientific paper that exploits the data and code is published, please cite this data set; the citation can be downloaded from the Zenodo page of this data set.
  10. Twitter dataset about Information Operations in Honduras and UAE

    • zenodo.org
    Updated Oct 10, 2024
    Cite
    Lorenzo Cima; Lorenzo Mannocci; Marco Avvenuti; MAURIZIO TESCONI; Stefano Cresci (2024). Twitter dataset about Information Operations in Honduras and UAE [Dataset]. http://doi.org/10.5281/zenodo.13912659
    Available download formats: bin, application/x-troff-me
    Dataset updated
    Oct 10, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lorenzo Cima; Lorenzo Mannocci; Marco Avvenuti; MAURIZIO TESCONI; Stefano Cresci
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United Arab Emirates, Honduras
    Description

    Dataset concerning coordinated behaviour in Information Operations in Honduras and United Arab Emirates, consisting of two parts:

    • malicious tweets, provided by the Twitter/X Moderation Research Consortium (TMRC), concerning well-known Information Operations (IOs).
    • genuine enriching tweets, recovered using Twitter/X search APIs with Academic Elevated Access. These tweets were published by "genuine" users (i.e. users not in the malicious dataset) and concern the main topics of the IOs.

    This dataset makes it possible to explore meaningful patterns of coordination that could distinguish conversations with malicious intent from genuine conversations.

    • 1.2M malicious or genuine tweets about the Honduras IO, shared between 11 September 2019 and 8 January 2020
    • 2.8M malicious or genuine tweets about the UAE IO, shared between 27 January 2019 and 26 May 2019
  11. TWITTER UKR-PL DATABASE

    • figshare.com
    xlsx
    Updated Apr 8, 2025
    Cite
    Tomasz Piróg; Rafał Olszowski; Piotr Pięta; Tomasz Masłyk (2025). TWITTER UKR-PL DATABASE [Dataset]. http://doi.org/10.6084/m9.figshare.28751420.v2
    Available download formats: xlsx
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    figshare
    Authors
    Tomasz Piróg; Rafał Olszowski; Piotr Pięta; Tomasz Masłyk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ukraine
    Description

    The dataset contains posts from the X/Twitter platform. Unstructured data were extracted from X/Twitter using R scripts through the Application Programming Interface (API) v2 for Academic Research, which enabled researchers to retrieve posts from the entire X/Twitter archive. At the time the data were collected, access to the Twitter API for Academic Research was still possible; it was restricted after the company changed its policy in February 2023. The post selection criteria were (i) posts published in the Polish language, (ii) posts containing the keywords “Ukraińcy” (“Ukrainians”), “w Polsce” (“in Poland”), and (iii) posts that were published between 22 February 2022 (12:00 a.m. CET) and 31 December 2022 (11:59 p.m. CET). The time frame selected for this study is related to the date when the Russian Federation invaded Ukraine and the closing date of the first calendar year of the conflict. The X/Twitter users included in the data analysis were those who sent posts with the above-mentioned characteristics during the pre-defined period. Unverified users were also included, as one of the objectives of the study was to analyse message dissemination. A total of 55,035 posts (original content), reposts (forwarded content), and replies (discussions among users) were collected. These were then extracted and imported into NodeXL, a professional tool for analysing social media that is used in many research projects.

    Rows are posts. Columns are variables, described as:

    • Vertex1: Author of the post
    • Vertex2: Target of the interaction (the user whose tweet is being retweeted or replied to)
    • Relationship: Type of interaction (Tweet, Retweet, Reply)
    • Relationship Date: Time of post publication
    • Tweet: Content of the post
    • Retweet Count: Number of reposts
    • Favorite Count: Number of likes
    • Reply Count: Number of replies to the post
    • Quote Count: Number of times the post was quoted
    • Hashtags in Tweet: Hashtags included in the post (if any)
    • URLs in Tweet: URLs included in the post (if any)
    • Domains in Tweet: Referenced domains (if any)
    • Mentions in Tweet: Mentions in the post (if any)
    • Media in Tweet: Referenced media (if any)
    • Media Type: Type of referenced media (if applicable)
    • Twitter Page for Tweet: Link to the webpage with the source tweet
    • Tweet Date (UTC): Date of original tweet publication
    • Tweet Image File: Avatar of the post's author
    • Imported ID: ID of the post after import
    • Conversation ID: Thread ID to which the post is assigned
    • In Reply To Tweet ID: ID of the post to which this post is a reply (if applicable)
    • Quoted Status ID: ID of the post that was quoted (if applicable)
    • Retweet ID: ID of the repost
    • Author ID: ID of the post's author
    • Vertex1 Group: User group to which the post's author was assigned through clustering
    • Vertex2 Group: User group to which the recipient of the post was assigned through clustering
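    Because each row is an interaction between Vertex1 and Vertex2, the table can be read as a directed edge list. The sketch below shows one way to summarise such rows. The column names come from the description; the rows themselves are hypothetical, and treating a plain Tweet as a self-loop (Vertex1 equal to Vertex2) is an assumption about how the file is laid out.

```python
from collections import Counter

# Hypothetical rows using the column names listed in the description.
rows = [
    {"Vertex1": "user_a", "Vertex2": "user_a", "Relationship": "Tweet"},
    {"Vertex1": "user_b", "Vertex2": "user_a", "Relationship": "Retweet"},
    {"Vertex1": "user_c", "Vertex2": "user_a", "Relationship": "Reply"},
    {"Vertex1": "user_b", "Vertex2": "user_a", "Relationship": "Retweet"},
]

# Number of rows per interaction type.
relationship_counts = Counter(row["Relationship"] for row in rows)

# Weighted directed edges between distinct users (self-loops are skipped).
edge_weights = Counter(
    (row["Vertex1"], row["Vertex2"])
    for row in rows
    if row["Vertex1"] != row["Vertex2"]
)

# Weighted in-degree: how often each user was retweeted or replied to.
in_degree = Counter()
for (_, target), weight in edge_weights.items():
    in_degree[target] += weight

print(dict(relationship_counts))
print(dict(edge_weights))
print(dict(in_degree))
```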

  12. ECMWF ERA-15: regular 2.5 x 2.5 degree gridded data

    • catalogue.ceda.ac.uk
    • data-search.nerc.ac.uk
    Updated Sep 11, 2024
    + more versions
    Cite
    European Centre for Medium-Range Weather Forecasts (ECMWF) (2024). ECMWF ERA-15: regular 2.5 x 2.5 degree gridded data [Dataset]. https://catalogue.ceda.ac.uk/uuid/248c4cc51507422d81df105a4770245c
    Dataset updated
    Sep 11, 2024
    Dataset provided by
    NCAS British Atmospheric Data Centre (NCAS BADC)
    Authors
    European Centre for Medium-Range Weather Forecasts (ECMWF)
    License

    https://artefacts.ceda.ac.uk/licences/specific_licences/ecmwf-era-products.pdf

    Time period covered
    Jan 1, 1979 - Feb 28, 1994
    Area covered
    Earth
    Variables measured
    Ice, Land, Sea Mask, U-Stress, V-Stress, Frequency, Heat Flux, Divergence, Snow Depth, Wind Shear, and 88 more
    Description

    The European Centre for Medium-Range Weather Forecasts (ECMWF) has provided global atmospheric analyses from its archive for many years. The ERA-15 Re-analysis project was devised in response to wishes expressed by many users for a data set generated by a modern, consistent, and invariant data assimilation system. The ERA-15 project produced a long time-series (January 1979 - February 1994) of consistent meteorological analyses using a single version of the ECMWF model.

    This dataset contains regular 2.5 degree x 2.5 degree gridded data on standard pressure levels and at the surface.

  13. X-ray data for 56 protoplanetary disk sources - Dataset - B2FIND

    • b2find.eudat.eu
    Updated May 18, 2021
    Cite
    (2021). X-ray data for 56 protoplanetary disk sources - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/b57e046a-ec73-5b77-a259-5f9cc34e9ba9
    Explore at:
    Dataset updated
    May 18, 2021
    Description

    Consistent modeling of protoplanetary disks requires the simultaneous solution of both continuum and line radiative transfer, heating and cooling balance between dust and gas and, of course, chemistry. Such models depend on panchromatic observations that can provide a complete description of the physical and chemical properties and energy balance of protoplanetary systems. Along these lines we present a homogeneous, panchromatic collection of data on a sample of 85 T Tauri and Herbig Ae objects for which data cover a range from X-rays to centimeter wavelengths. Datasets consist of photometric measurements and spectra, along with results from the data analysis such as line fluxes from atomic and molecular transitions. Additional properties resulting from modeling of the sources, such as disc mass and shape parameters, dust size and PAH properties, are also provided for completeness. The purpose of this data collection is to provide a solid base that can enable consistent modeling of the properties of protoplanetary disks. To this end, we performed an unbiased collection of publicly available data that were combined into homogeneous datasets adopting consistent criteria. Targets were selected based both on their properties and on the availability of data. Data from more than 50 different telescopes and facilities were retrieved and combined into homogeneous datasets, either directly from public data archives or after being extracted from more than 100 published articles. X-ray data for a subset of 56 sources represent an exception, as they were reduced from scratch and are presented here for the first time. Compiled datasets, along with a subset of continuum and emission-line models, are stored in a dedicated database and distributed through a publicly accessible online system. All datasets contain metadata descriptors that allow them to be traced back to their original resources. The graphical user interface of the online system allows the user to visually inspect individual objects and also to compare datasets and models. It also lets the user download any of the stored data and metadata for further processing.

  14. A study on real graphs of fake news spreading on Twitter

    • data.niaid.nih.gov
    Updated Aug 20, 2021
    Cite
    Amirhosein Bodaghi (2021). A study on real graphs of fake news spreading on Twitter [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3711599
    Explore at:
    Dataset updated
    Aug 20, 2021
    Dataset authored and provided by
    Amirhosein Bodaghi
    Description

    *** Fake News on Twitter ***

    These 5 datasets are the results of an empirical study on the spreading process of newly emerged fake news on Twitter. In particular, we focused on fake news stories that gave rise to a simultaneous spread of the truth countering them. The story of each fake news item is as follows:

    1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.

    2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."

    3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.

    4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.

    5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.

    The data collection was done in two stages, each of which provided a new dataset: 1- obtaining the Dataset of Diffusion (DD), which includes information on fake news/truth tweets and retweets; 2- querying the neighbors of the spreaders of tweets, which provides the Dataset of Graph (DG).

    DD

    DD for each fake news story is an Excel file named FNx_DD, where x is the number of the fake news story.

    Each row belongs to one captured tweet/retweet related to the rumor, and each column presents a specific piece of information about the tweet/retweet. From left to right, the columns present the following information:

    User ID (user who has posted the current tweet/retweet)

    The description sentence in the profile of the user who has published the tweet/retweet

    The number of tweets/retweets published by the user at the time of posting the current tweet/retweet

    Date and time of creation of the account by which the current tweet/retweet has been posted

    Language of the tweet/retweet

    Number of followers

    Number of followings (friends)

    Date and time of posting the current tweet/retweet

    Number of likes (favorites) the current tweet had acquired before it was crawled

    Number of times the current tweet had been retweeted before it was crawled

    Whether there is any other tweet inside the current tweet/retweet (for example, this happens when the current tweet is a quote, reply, or retweet)

    The source (OS) of the device from which the current tweet/retweet was posted

    Tweet/Retweet ID

    Retweet ID (if the post is a retweet then this feature gives the ID of the tweet that is retweeted by the current post)

    Quote ID (if the post is a quote then this feature gives the ID of the tweet that is quoted by the current post)

    Reply ID (if the post is a reply then this feature gives the ID of the tweet that is replied to by the current post)

    Frequency of tweet occurrences which means the number of times the current tweet is repeated in the dataset (for example the number of times that a tweet exists in the dataset in the form of retweet posted by others)

    State of the tweet which can be one of the following forms (achieved by an agreement between the annotators):

    r : The tweet/retweet is a fake news post

    a : The tweet/retweet is a truth post

    q : The tweet/retweet is a question about the fake news that neither confirms nor denies it

    n : The tweet/retweet is not related to the fake news (even though it contains the queries related to the rumor, it does not refer to the given fake news)
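    The state codes above lend themselves to a quick tally per file. The following is a minimal sketch, assuming the Excel sheet has been exported to CSV and that the state code sits in the last column (as the left-to-right column order suggests); the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Meanings of the one-letter state codes, as listed in the description.
STATE_MEANINGS = {
    "r": "fake news post",
    "a": "truth post",
    "q": "question about the fake news",
    "n": "not related to the fake news",
}

def count_states(rows, state_index=-1):
    """Count state codes per row; state_index=-1 assumes the last column."""
    counts = Counter()
    for row in rows:
        if not row:
            continue  # skip empty lines
        code = row[state_index].strip().lower()
        # Anything outside the four documented codes is grouped as "unknown".
        counts[code if code in STATE_MEANINGS else "unknown"] += 1
    return counts

# Hypothetical three-column rows standing in for an exported FNx_DD sheet.
sample = "101,en,r\n102,en,a\n103,en,r\n104,en,q\n"
counts = count_states(csv.reader(io.StringIO(sample)))
for code, n in sorted(counts.items()):
    print(code, STATE_MEANINGS.get(code, "unknown"), n)
```

    If the exported file starts with a header row, that row would be counted as "unknown" and should be skipped before calling the function.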

    DG

    DG for each fake news contains two files:

    A file in graph format (.graph), which includes the information of the graph, such as who is linked to whom. (This file is named FNx_DG.graph, where x is the number of the fake news story.)

    A file in JSONL format (.jsonl), which includes the real user IDs of the nodes in the graph file. (This file is named FNx_Labels.jsonl, where x is the number of the fake news story.)

    In the graph file, the label of each node is the order in which it entered the graph. For example, if the node with user ID 12345637 is the first node entered into the graph file, then its label in the graph is 0 and its real ID (12345637) is at row number 1 of the jsonl file (because row number 0 holds the column labels); the other node IDs follow in the next rows (each row corresponds to one user ID). Therefore, if we want to know, for example, the user ID of node 200 (labeled 200 in the graph), we should look at row number 202 in the jsonl file.

    The user IDs of spreaders in DG (those who have a post in DD) are available in DD, where extra information about them and their tweets/retweets can be found. The other user IDs in DG are the neighbors of these spreaders and might not exist in DD.
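    The label-to-row lookup described for the DG files can be sketched as follows. This is only an illustration: the function name and the `header_rows` parameter are hypothetical, and each row is assumed to hold one JSON value. The offset is left configurable because the description places label 0 at row 1 when counting rows from 0, while the row quoted for node 200 is consistent with counting rows from 1, so the offset should be checked against the actual file.

```python
import json

def user_id_for_label(lines, label, header_rows=1):
    """Return the JSON value stored for a graph node label.

    lines: all rows of the .jsonl file, header row(s) included.
    header_rows: number of rows before the first user ID (assumed to be 1).
    """
    if label < 0:
        raise ValueError("node labels start at 0")
    row = label + header_rows  # 0-based row index into the file
    return json.loads(lines[row])

# Hypothetical file content: one header row, then one user ID per row.
lines = ['"user_id"', "12345637", "555", "777"]
print(user_id_for_label(lines, 0))  # 12345637
```

    With a real file, `lines` could be obtained with `open(path).read().splitlines()`, where `path` points to an FNx_Labels.jsonl file.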

  15. Twitter users worldwide 2019-2028

    • statista.com
    • ai-chatbox.pro
    Updated Dec 10, 2024
    Cite
    Statista Research Department (2024). Twitter users worldwide 2019-2028 [Dataset]. https://www.statista.com/topics/2297/twitter-marketing/
    Explore at:
    Dataset updated
    Dec 10, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    The global number of Twitter users was forecast to increase continuously between 2024 and 2028 by a total of 74.3 million users (+17.32 percent). After the ninth consecutive year of increase, the Twitter user base is estimated to reach 503.42 million users and therefore a new peak in 2028. Notably, the number of Twitter users increased continuously over the past years.

    User figures, shown here regarding the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.

    The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

    Find more key insights for the number of Twitter users in regions like South America and the Americas.
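    The forecast figures can be cross-checked with a little arithmetic. The sketch below derives the 2024 baseline implied by the quoted increase and the 2028 value, then recomputes the percentage growth; the function name is hypothetical and only the three quoted numbers come from the description.

```python
def implied_growth(increase, final_value):
    """Return (baseline, percent growth) implied by an increase and a final value."""
    baseline = final_value - increase
    return baseline, increase / baseline * 100.0

# Figures quoted in the description, in millions of users.
baseline_2024, growth_pct = implied_growth(increase=74.3, final_value=503.42)
print(round(baseline_2024, 2), round(growth_pct, 2))
```

    The recomputed growth lands close to the quoted +17.32 percent; a small gap is to be expected because the published figures are rounded.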

  16. Diatom algae colorized Dataset

    • kaggle.com
    Updated May 27, 2025
    Cite
    Medi Hunter - 4004 (2025). Diatom algae colorized Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/11970637
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 27, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Medi Hunter - 4004
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Raw data, source, and more information: https://www.kaggle.com/datasets/huseyingunduz/diatom-dataset?select=images

    Citation: @article{gunduz2022, title={Segmentation of diatoms using edge detection and deep learning}, volume={30}, DOI={10.55730/1300-0632.3938}, number={6}, journal={Turkish Journal of Electrical Engineering & Computer Sciences}, author={Gunduz, Huseyin and Solak, Cuneyt Nadir and Gunal, Serkan}, year={2022}, pages={2268–2285}}

    Diatoms are a group of algae found in oceans, freshwater, moist soils, and surfaces. They are one of the most common phytoplankton species found in nature. There are more than 200 genera of diatoms, as well as about 200,000 species. They produce approximately 20-25% of the oxygen on the planet.

    Accurate detection, segmentation and classification of diatoms is very important, especially in terms of determining water quality and ecological change.

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F25409507%2F0b140a77bdc1e8b3955453f9eb60a294%2F1049_10.jpg?generation=1748347264264824&alt=media
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F25409507%2F5dfebe496555e0d44f88a323020b5c29%2F1435_11.jpg?generation=1748347286426827&alt=media
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F25409507%2F52b9bac2244c46778d4e7a5680d5db9b%2F1057_8.jpg?generation=1748347333663011&alt=media

    Colorized Data Processing Techniques for Medical Imaging

    Medical images like CT scans and X-rays are typically grayscale, making subtle anatomical or pathological differences harder to distinguish. The following image processing and enhancement techniques are used to colorize and improve visual interpretation for diagnostics, training, or AI preprocessing.

    🔷 1. 3D_Rendering Renders medical image volumes into three-dimensional visualizations. Though often grayscale, color can be applied to different tissue types or densities to enhance spatial understanding. Useful in surgical planning or tumor visualization.

    🔷 2. 3D_Volume_Rendering An advanced visualization technique that projects 3D image volumes with transparency and color blending, simulating how light passes through tissue. Color helps distinguish internal structures like organs, vessels, or tumors.

    🔷 3. Adaptive Histogram Equalization (AHE) Enhances contrast locally within the image, especially in low-contrast regions. When colorized, different intensities are mapped to distinct hues, improving visibility of fine-grained details like soft tissues or lesions.

    🔷 4. Alpha Blending A layering technique that combines multiple images (e.g., CT + annotation masks) with transparency. Colors represent different modalities or regions of interest, providing composite visual cues for diagnosis.

    🔷 5. Basic Color Map Applies a standard color palette (like Jet or Viridis) to grayscale data. Different intensities are mapped to different colors, enhancing the visual discrimination of anatomical or pathological regions in the image.

    🔷 6. Contrast Stretching Expands the grayscale range to improve brightness and contrast. When combined with color mapping, tissues with similar intensities become visually distinct, aiding in tasks like bone vs. soft tissue separation.

    🔷 7. Edge Detection Extracts and overlays object boundaries (e.g., organ or lesion outlines) on the original scan. Edge maps are typically colorized (e.g., green or red) to highlight anatomical structures or abnormalities clearly.

    🔷 8. Gamma Correction Adjusts image brightness non-linearly. Color can be used to highlight underexposed or overexposed regions, often revealing soft tissue structures otherwise hidden in raw grayscale CT/X-ray images.

    🔷 9. Gaussian Blur Smooths image noise and details. When visualized with color overlays (e.g., before vs. after), it helps assess denoising effectiveness. It is also used in segmentation preprocessing to reduce edge artifacts.

    🔷 10. Heatmap Visualization Encodes intensity or prediction confidence into a heatmap overlay (e.g., red for high activity). Common in AI-assisted diagnosis to localize tumors, fractures, or infections, layered over the original grayscale image.

    🔷 11. Interactive Segmentation A semi-automated method to extract regions of interest with user input. Segmented areas are color-coded (e.g., tumor = red, background = blue) for immediate visual confirmation and further analysis.

    🔷 12. LUT (Lookup Table) Color Map Maps grayscale values to custom color palettes using a lookup table. This enhances contrast and emphasizes certain intensity ranges (e.g., blood vessels vs. bone), improving interpretability for radiologists.

    🔷 13. Random Color Palette Applies random but consistent colors to segmented regions or labels. Common in datasets with multiple classes (e.g., liver, spleen, kidneys), it helps in v...

  17. Data from: CANDID-PTX

    • figshare.com
    png
    Updated Jun 27, 2025
    Cite
    Sijing Feng; Damian Azzollini; Ji Soo Kim; Cheng Kai Jin; Eve Kim; Simon Gordon; Jason Yeoh; Min A Han; Andrew Lee; Aakash Patel; Amy Fong; Cameron Simmers; Gregory Tarr; Stuart Barnard; Ben Wilson (2025). CANDID-PTX [Dataset]. http://doi.org/10.17608/k6.auckland.14173982.v2
    Explore at:
    Available download formats: png
    Dataset updated
    Jun 27, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sijing Feng; Damian Azzollini; Ji Soo Kim; Cheng Kai Jin; Eve Kim; Simon Gordon; Jason Yeoh; Min A Han; Andrew Lee; Aakash Patel; Amy Fong; Cameron Simmers; Gregory Tarr; Stuart Barnard; Ben Wilson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    19,237 anonymized adult chest x-ray datasets in 1024 x 1024 pixel DICOM format with corresponding anonymized free-text reports from Dunedin Hospital, New Zealand, acquired between 2010 and 2020. Images were manually annotated by RANZCR radiology trainee and radiologists with respect to pneumothorax, acute rib fracture, and chest tubes. Segmentation annotations were converted to run-length-encoded (RLE) format in csv files. In the provided metadata, image filenames contain a patient index (enabling analyses that require grouping images by patient), as well as anonymized date-of-acquisition information in which the temporal relationship between images is preserved.

    Since Feb 2024, the New Zealand government has been changing the data governance for datasets used for AI development, and this affects how the CANDID PTX dataset can be accessed by external users. Therefore, the CANDID PTX dataset is not available for access by users outside Health New Zealand. Further notice will be posted here should access by external users be reopened.

  18. SpaceNet 7 Change Detection Chips and Masks

    • kaggle.com
    zip
    Updated Dec 24, 2020
    Cite
    A Merii (2020). SpaceNet 7 Change Detection Chips and Masks [Dataset]. https://www.kaggle.com/datasets/amerii/spacenet-7-change-detection-chips-and-masks
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Dec 24, 2020
    Authors
    A Merii
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    This dataset is based on the original SpaceNet 7 dataset, with a few modifications.

    Content

    The original dataset consisted of Planet satellite imagery mosaics, which include 24 images (one per month) covering ~100 unique geographies. The original dataset comprised over 40,000 square kilometers of imagery and exhaustive polygon labels of building footprints in the imagery, totaling over 10 million individual annotations.

    This dataset builds upon the original dataset: each image is segmented into 64 x 64 chips, to make it easier to build a model on.

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4101651%2F66851650dbfb7017f1c5717af16cea3c%2Fchips.png?generation=1607947381793575&alt=media

    The images also capture the changes between the images of each month, such that an image taken in month 1 is compared with the images taken in months 2, 3, ... 24. This is done by taking the cartesian product of the differences between each image. For more information on how this is done check out the following notebook.
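    The month-to-month comparison can be illustrated by enumerating the image pairs. The sketch below assumes each unordered pair of months is compared once, with the earlier month first; the referenced notebook may pair the images differently, and the function name is hypothetical.

```python
from itertools import combinations

def month_pairs(n_months=24):
    """Return (earlier, later) month index pairs, each unordered pair once."""
    return list(combinations(range(1, n_months + 1), 2))

pairs = month_pairs()
print(len(pairs))  # 276
print(pairs[:3])   # [(1, 2), (1, 3), (1, 4)]
```

    Under this assumption, 24 monthly images per location yield 276 comparisons, which is why the chip count grows much faster than the number of source images.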

    The differences between the images are captured in the output mask, and the 2 images being compared are stacked. This means that our input images have dimensions of 64 x 64 x 6, and our output mask has dimensions 64 x 64 x 1. The input images have 6 channels because, as mentioned earlier, they are 2 images stacked together. See image below for more details:

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4101651%2F9cdcf8481d8d81b6d3fed072cea89586%2Fdifference.png?generation=1607947852597860&alt=media

    The image above shows the masks for each of the original satellite images and what the difference between the 2 looks like. For more information on how the original data was explored check out this notebook.
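    The stacking of two chips into one 6-channel input can be sketched in plain Python. This assumes each chip is a 64 x 64 grid of 3-channel pixels (which follows from two images giving 6 channels) and uses nested lists in place of an array library; the helper names and pixel values are hypothetical.

```python
def stack_channels(img_a, img_b):
    """Concatenate the per-pixel channel lists of two equally sized images."""
    if len(img_a) != len(img_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same height and width")
    return [
        [px_a + px_b for px_a, px_b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]

def shape(img):
    """Return (height, width, channels) of a nested-list image."""
    return (len(img), len(img[0]), len(img[0][0]))

def constant_image(height, width, channels, value):
    """Build a placeholder image filled with a single value."""
    return [[[value] * channels for _ in range(width)] for _ in range(height)]

earlier = constant_image(64, 64, 3, 0)  # stand-in for the earlier month's chip
later = constant_image(64, 64, 3, 1)    # stand-in for the later month's chip
stacked = stack_channels(earlier, later)
print(shape(stacked))  # (64, 64, 6)
```

    Channels 0 to 2 of each stacked pixel come from the earlier chip and channels 3 to 5 from the later one, so a model sees both dates at every pixel.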

    Data Structure

    The data is structured as follows:
    chip_dataset
    └── change_detection
        └── fname
            ├── chips
            │   └── year1_month1_year2_month2
            │       └── global_monthly_year1_month1_year2_month2_chip_x###_y###_fname.tif
            └── masks
                └── year1_month1_year2_month2
                    └── global_monthly_year1_month1_year2_month2_chip_x###_y###_fname_blank.tif

    The _blank suffix in the mask chip filenames indicates whether the mask is a blank mask or not.
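    The naming scheme above can be parsed with a regular expression. The following is a sketch under assumptions: years are taken to be four digits, months one or two digits, chip offsets plain integers, and a trailing _blank is read as marking a blank mask. The example filename is hypothetical.

```python
import re

# Pattern mirroring global_monthly_year1_month1_year2_month2_chip_x###_y###_fname[_blank].tif
CHIP_NAME = re.compile(
    r"^global_monthly_(?P<year1>\d{4})_(?P<month1>\d{1,2})"
    r"_(?P<year2>\d{4})_(?P<month2>\d{1,2})"
    r"_chip_x(?P<x>\d+)_y(?P<y>\d+)_(?P<fname>.+?)(?P<blank>_blank)?\.tif$"
)

def parse_chip_name(filename):
    """Split a chip or mask filename into its components, or return None."""
    match = CHIP_NAME.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    return {
        "first": (int(parts["year1"]), int(parts["month1"])),
        "second": (int(parts["year2"]), int(parts["month2"])),
        "x": int(parts["x"]),
        "y": int(parts["y"]),
        "fname": parts["fname"],
        "blank": parts["blank"] is not None,
    }

# Hypothetical mask filename following the documented scheme.
print(parse_chip_name("global_monthly_2018_01_2018_02_chip_x000_y064_tileA_blank.tif"))
```

    Filtering on the `blank` field would make it possible to skip empty masks without opening the image files, if the suffix is indeed used that way.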

    For more information on how the data was structured and augmented check out the following notebook.

    Acknowledgements

    All credit goes to the team at SpaceNet for collecting and annotating and formatting the original dataset.

  19. Twitter users in Brazil 2019-2028

    • statista.com
    Updated Jul 9, 2025
    Cite
    Statista (2025). Twitter users in Brazil 2019-2028 [Dataset]. https://www.statista.com/forecasts/1146589/twitter-users-in-brazil
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Brazil
    Description

    The number of Twitter users in Brazil was forecast to increase continuously between 2024 and 2028 by a total of *** million users (+***** percent). After the ninth consecutive year of increase, the Twitter user base is estimated to reach ***** million users and therefore a new peak in 2028. Notably, the number of Twitter users increased continuously over the past years.

    User figures, shown here regarding the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.

    The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

  20. CSU Synthetic Attribution Benchmark Dataset

    • cmr.earthdata.nasa.gov
    Updated Oct 10, 2023
    Cite
    (2023). CSU Synthetic Attribution Benchmark Dataset [Dataset]. http://doi.org/10.34911/rdnt.8snx6c
    Explore at:
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Area covered
    Description

    This is a synthetic dataset for users interested in benchmarking methods of explainable artificial intelligence (XAI) for geoscientific applications. The dataset is specifically inspired by a climate forecasting setting (seasonal timescales) where the task is to predict regional climate variability given global climate information lagged in time. The dataset consists of a synthetic input X (a series of 2D arrays of random fields drawn from a multivariate normal distribution) and a synthetic output Y (a scalar series) generated by using a nonlinear function F: R^d -> R.

    The synthetic input aims to represent temporally independent realizations of anomalous global fields of sea surface temperature, the synthetic output series represents some type of regional climate variability that is of interest (temperature, precipitation totals, etc.) and the function F is a simplification of the climate system.

    Since the nonlinear function F that is used to generate the output given the input is known, we also derive and provide the attribution of each output value to the corresponding input features. Using this synthetic dataset users can train any AI model to predict Y given X and then implement XAI methods to interpret it. Based on the “ground truth” of attribution of F the user can assess the faithfulness of any XAI method.
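    The idea of a known function with a derivable ground-truth attribution can be illustrated with a toy example. The sketch below assumes an additively separable F, where each input feature contributes through its own nonlinear function; this form, the tanh shape, and the independent standard-normal inputs are illustrative assumptions and are not stated in the description.

```python
import math
import random

def make_local_functions(d, rng):
    """Draw one hypothetical nonlinear function per input feature."""
    coeffs = [(rng.uniform(-1.0, 1.0), rng.uniform(0.5, 2.0)) for _ in range(d)]
    # Default arguments bind each (a, b) pair to its own lambda.
    return [lambda x, a=a, b=b: a * math.tanh(b * x) for a, b in coeffs]

def respond(x, funcs):
    """Return (y, attribution), where y is the sum of per-feature contributions."""
    contributions = [f(xi) for f, xi in zip(funcs, x)]
    return sum(contributions), contributions

rng = random.Random(0)
d = 5  # toy dimension; the real fields are far larger
funcs = make_local_functions(d, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # simplified stand-in for one input field
y, attribution = respond(x, funcs)
print(y, attribution)
```

    Because the output is by construction the sum of the per-feature contributions, the attribution of y to each feature is known exactly, and an XAI method applied to a model trained on such (X, Y) pairs can be scored against it.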

    NOTE: the spatial configuration of the observations in the NetCDF database file conforms to the planetocentric coordinate system (89.5N - 89.5S, 0.5E - 359.5E), where longitude is measured positive heading east from the prime meridian.
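    Tools that expect longitudes between -180 and 180 degrees need the east-positive 0.5E - 359.5E values converted. The sketch below shows one common conversion; the function name is hypothetical, and the 1-degree spacing of the grid centers is an assumption inferred from the quoted bounds.

```python
def to_signed_longitude(lon_east):
    """Map an east-positive longitude in [0, 360) to the range [-180, 180)."""
    return ((lon_east + 180.0) % 360.0) - 180.0

# Grid-cell centers from 0.5E to 359.5E, assuming 1-degree spacing.
centers = [0.5 + i for i in range(360)]
signed = [to_signed_longitude(lon) for lon in centers]
print(signed[0], signed[180], signed[-1])  # 0.5 -179.5 -0.5
```

    After conversion the values are no longer monotonically increasing, so data arrays usually need to be reordered along the longitude axis to match.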
