Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Many studies of political phenomena in social media require operationalizing the notion of the political stance of the users and content involved. Relevant examples include the study of segregation and polarization online, the study of political diversity in social media content diets, and AI explainability. While many research designs rely on operationalizations best suited to the US setting, few accommodate more general designs in which users and content may take stances on multiple ideology and issue dimensions, going beyond the traditional Liberal-Conservative or Left-Right scales. To advance the study of more general online ecosystems, we present a dataset of the population of X/Twitter users in the French political Twittersphere, together with web domains, embedded in a political space spanned by dimensions measuring attitudes towards immigration, the EU, liberal values, elites and institutions, nationalism, and the environment. We provide several benchmarks validating the positions of these entities (based on both LLM and human annotations) and discuss several applications of this dataset.
https://brightdata.com/license
Utilize our Twitter dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset provides a comprehensive understanding of social media trends, empowering organizations to refine their communication and marketing strategies. Access the entire dataset or customize a subset to fit your needs. Popular use cases include market research to identify trending topics and hashtags, AI training by reviewing factors such as tweet content, retweets, and user interactions for predictive analytics, and trend forecasting by examining correlations between specific themes and user engagement to uncover emerging social media preferences.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is a version 2 dataset of paired OpenAlex author IDs (https://docs.openalex.org/about-the-data/author) and Twitter (now X) user IDs.
Major update in this version
Following the significant update to OpenAlex's author identification system, the scholars on Twitter dataset, which previously linked Twitter IDs to OpenAlex author IDs, immediately became outdated. This called for a new approach to re-establish these links, as the absence of new Twitter data made it impossible to replicate the original method of matching Twitter profiles with scholarly authors. To navigate this challenge, a bridge was constructed between the June 2022 snapshot of the OpenAlex database—used in the original matching process—and the most recent snapshot from February 2024. This bridge utilized OpenAlex works IDs and DOIs to match authors in both datasets by their shared publications and identical primary names. When a connection was established between two authors with the same name, the new OpenAlex author ID was assigned to the corresponding Twitter ID. When direct matches based on primary names were not found, an attempt was made to establish connections by matching the names from June 2022 with any corresponding alternative names found in the 2024 dataset. This method ensured continuity of identity through the system update, adapting the strategy to link profiles across the temporal divide created by the database's overhaul.
Our efficient method for re-establishing links between author IDs and Twitter profiles has been notably successful, managing to rematch 432,417 (88%) OpenAlex author IDs. This effort successfully restored connections for 388,968 unique Twitter users, which represents 92% of the original dataset. Of these, 375,316 were matched using their primary names, and 57,101 through alternative names. The simplicity and quick execution of this approach led to exceptionally favourable results, with a minimal loss of only 8% of the original Twitter-linked scholarly accounts.
The dataset includes 432,417 unique author_ids and 388,968 unique tweeter_ids forming 462,427 unique author-tweeter pairs.
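For illustration, the bridging procedure described above could be reproduced along the following lines. This is a hedged sketch only: the file names and column names are assumptions for the example and are not part of the release.

```python
# Hedged sketch (hypothetical file and column names): re-link old author IDs to
# new ones via shared works and matching primary names, as described above.
import pandas as pd

old = pd.read_csv("works_authors_2022.csv")        # columns: work_id, author_id_2022, name_2022
new = pd.read_csv("works_authors_2024.csv")        # columns: work_id, author_id_2024, name_2024
twitter = pd.read_csv("author_tweeter_2022.csv")   # columns: author_id_2022, tweeter_id

# Authors sharing at least one work and having identical primary names.
bridge = (
    old.merge(new, on="work_id")
       .query("name_2022 == name_2024")
       [["author_id_2022", "author_id_2024"]]
       .drop_duplicates()
)

# Carry the Twitter link over to the new author IDs.
relinked = twitter.merge(bridge, on="author_id_2022")[["author_id_2024", "tweeter_id"]]
print(len(relinked), "author-tweeter pairs re-linked")
```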
File descriptions
How to cite
When using the dataset, please cite the following article providing details about the matching process:
The number of Twitter users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 4.3 million users (+5.32 percent). After a ninth consecutive year of growth, the Twitter user base is estimated to reach 85.08 million users, a new peak, in 2028. Notably, the number of Twitter users has been increasing continuously over the past years. User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Twitter users in countries like Canada and Mexico.
The XRAY database table contains selected parameters from almost all HEASARC X-ray catalogs that have source positions located to better than a few arcminutes. The XRAY database table was created by copying all of the entries and common parameters from the tables listed in the Component Tables section. The XRAY database table has many entries but relatively few parameters; it provides users with general information about X-ray sources, obtained from a variety of catalogs. XRAY is especially suitable for cone searches and cross-correlations with other databases. Each entry in XRAY has a parameter called 'database_table' which indicates from which original database the entry was copied; users can browse that original table should they wish to examine all of the parameter fields for a particular entry. For some entries in XRAY, some of the parameter fields may be blank (or have zero values); this indicates that the original database table did not contain that particular parameter or that it had this same value there. The HEASARC in certain instances has included X-ray sources for which the quoted value for the specified band is an upper limit rather than a detection. The HEASARC recommends that the user should always check the original tables to get the complete information about the properties of the sources listed in the XRAY master source list. This master catalog is updated periodically whenever one of the component database tables is modified or a new component database table is added. This is a service provided by the NASA HEASARC.
The Annual Respondents Database X (ARDx) has been created to allow users of the Annual Respondents Database (ARD) (held at the UK Data Archive under SN 6644) to continue analysis even though the Annual Business Inquiry (ABI), which was used to create the ARD, ceased in 2008. ARDx contains harmonised variables from 1997 to 2020.
ARDx is created from two ONS surveys, the Annual Business Inquiry (ABI; 1998-2008, held at the UK Data Archive under SN 6644) and the Annual Business Survey (ABS; 2009 onwards, held at the UK Data Archive under SN 7451). The ABI has an employment survey (ABI1) and a second survey for financial information (ABI2). ABS only collects financial data, and so is supplemented with employment data from the Business Register and Employment Survey (BRES; 2009 onwards, held at the UK Data Archive under SN 7463).
ARDx consists of six types of files: 'respondent files', which hold reported and derived information from survey questionnaire responses, and 'universe files', which contain limited information on all businesses that are within scope of the ABI/ABS. These files are provided at both the Reporting Unit and Local Unit levels. There are also 'register panel' and 'capital stock' files.
Linking to other business studies
These data contain Inter-Departmental Business Register (IDBR) reference numbers. These are anonymous but unique reference numbers assigned to business organisations. Their inclusion allows researchers to combine different business survey sources together. Researchers may consider applying for other business data to assist their research.
For the fifth edition (December 2023), ARDx Version 4.0 for 1997-2020 has been provided, replacing Version 3. Coverage has thus been expanded to include 1997 and 2015-2020.
Note to users
Due to the limited nature of the documentation available for ARDx, users are advised to consult the documentation for the Annual Business Survey (UK Data Archive SN 7451) for detailed information about the data.
For Secure Lab projects applying for access to this study as well as to SN 6697 Business Structure Database and/or SN 7683 Business Structure Database Longitudinal, only postcode-free versions of the data will be made available.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset of 53,054 anonymized adult chest X-rays in 1024 x 1024 pixel DICOM format with corresponding anonymized free-text reports from Dunedin Hospital, New Zealand, between 2010 and 2020. The corresponding radiology reports, generated by FRANZCR radiologists, were manually annotated for 46 common radiological findings mapped to the Unified Medical Language System (UMLS) and the RadLex ontology. Each of the multi-classification annotations contains four types of labels: positive, uncertain, negative, and not mentioned. In the provided dataset, image filenames contain a patient index (enabling analyses that require grouping images by patient), as well as anonymized date-of-acquisition information in which the temporal relationship between images is preserved. This dataset can be used for training and testing deep learning algorithms for adult chest X-rays.

Unfortunately, since February 2024, the New Zealand government has been changing the data governance of datasets used for AI development, and this affects how the CANDID II dataset can be accessed by external users. Therefore, the CANDID II dataset is not available for access by users outside Health New Zealand. Further notice of access will be posted here should access by external users be reopened.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. Introduction
The file “gen_dd_channel.zip” is a package of a wideband multiple-input multiple-output (MIMO) stored radio channel model at 140 GHz in indoor hall, outdoor suburban, residential, and urban scenarios. The package consists of 1) measured wideband double-directional multipath data sets estimated from radio channel sounding and processed through measurement-based ray-launching, and 2) MATLAB code sets that allow users to generate wideband MIMO radio channels with various antenna array types, e.g., uniform planar and circular arrays at the link ends.
2. What does this package do?
Outputs of the channel model
The MATLAB file “ChannelGeneratorDD_hexax.m” gives the following variables, among others. The .m file also gives optional figures illustrating antennas and radio channel responses.
| Variables | Descriptions |
|---|---|
| CIR | MIMO channel impulse responses |
| CFR | MIMO channel frequency responses |
Inputs to the channel model
In order for the MATLAB file “ChannelGeneratorDD_hexax.m” to run properly, the following inputs are required.
| Directory | Descriptions |
|---|---|
| data_030123_double_directional_paths | Double-directional multipath data, measured and complemented by ray-launching tool, for various cellular sites. |
User’s parameters
When using “ChannelGeneratorDD_hexax.m”, the following choices are available.
| Features | Choices |
|---|---|
| Channel model types for transfer function generation | |
| Antenna / beam shapes | |
List of files in the dataset
MATLAB codes that implement the channel model
The MATLAB files consist of the following files.
| File and directory names | Descriptions |
|---|---|
| readme_100223.txt | Readme file; please read it before using the files |
| ChannelGeneratorDD_hexax.m | Main code to run; a code to integrate antenna arrays and double-directional path data to derive MIMO radio channels. No need to see/edit other files. |
| gen_pathDD.m, randl.m, randLoc.m | Sub-routines used in ChannelGeneratorDD_hexax.m; no need of modifications. |
| Hexa-X channel generator DD_presentation.pdf | User manual of ChannelGeneratorDD_hexax.m. |
Measured multipath data
The directory "data_030123_double_directional_paths" in the package contains the following files.
| Filenames | Descriptions |
|---|---|
| readme_100223.txt | Readme file; please read it before using the files |
| RTdata_[scenario]_[date].mat | Double-directional multipath parameters at 140 GHz in the specified scenario, estimated from radio channel sounding and ray-tracing. |
| description_of_data_dd_[scenario].pdf | Data formats, the measurement site, and sample results. |
References
Details of the data set are available in the following two documents:
The stored channel models
A. Nimr (ed.), "Hexa-X Deliverable D2.3 Radio models and enabling techniques towards ultra-high data rate links and capacity in 6G," April 2023, available: https://hexa-x.eu/deliverables/
@misc{Hexa-XD23,
author = {{A. Nimr (ed.)}},
title = {{Hexa-X Deliverable D2.3 Radio models and enabling techniques towards ultra-high data rate links and capacity in 6G}},
year = {2023},
month = {Apr.},
howpublished = {https://hexa-x.eu/deliverables/},
}
Derivation of the data, i.e., radio channel sounding and measurement-based ray-launching
M. F. De Guzman and K. Haneda, "Analysis of wave-interacting objects in indoor and outdoor environments at 142 GHz," IEEE Transactions on Antennas and Propagation, vol. 71, no. 12, pp. 9838-9848, Dec. 2023, doi: 10.1109/TAP.2023.3318861
@ARTICLE{DeGuzman23_TAP,
author={De Guzman, Mar Francis and Haneda, Katsuyuki},
journal={IEEE Transactions on Antennas and Propagation},
title={Analysis of Wave-Interacting Objects in Indoor and Outdoor Environments at 142 {GHz}},
year={2023},
volume={71},
number={12},
pages={9838-9848},
}
Finally, the code “randl.m” is from the following MATLAB Central File Exchange entry.
Hristo Zhivomirov (2023). Generation of Random Numbers with Laplace Distribution (https://www.mathworks.com/matlabcentral/fileexchange/53397-generation-of-random-numbers-with-laplace-distribution), MATLAB Central File Exchange. Retrieved February 15, 2023.
Data usage terms
Any usage of the data requires consent to the following conditions:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset concerning coordinated behaviour in information operations in Honduras and the United Arab Emirates, consisting of two parts:
This dataset makes it possible to explore meaningful patterns of coordination that could distinguish conversations with malicious intent from genuine conversations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains posts from the X/Twitter platform. Extraction of unstructured data from X/Twitter was performed using R scripts through the Application Programming Interface (API) v2 for Academic Research, which enabled researchers to retrieve posts from the entire X/Twitter archive. At the time the data was collected, access to the Twitter API for Academic Research was still possible, but it was restricted after the company changed its policy in February 2023. The post selection criteria were (i) posts published in the Polish language, (ii) posts containing the keywords “Ukraińcy” (“Ukrainians”) and “w Polsce” (“in Poland”), and (iii) posts published between 22 February 2022 (12:00 a.m. CET) and 31 December 2022 (11:59 p.m. CET). The time frame selected for this study spans from the date when the Russian Federation invaded Ukraine to the end of the first calendar year of the conflict. The X/Twitter users included in the data analysis were those who sent posts with the above-mentioned characteristics during the pre-defined period. Unverified users were also included, as one of the objectives of the study was to analyse message dissemination. A total of 55,035 posts (original content), reposts (forwarded content), and replies (discussions among users) were collected. These were then extracted and imported into NodeXL software, a professional social media analysis tool used in many research projects.

Rows are posts. Columns are variables, described as follows:
Vertex1: Author of the post
Vertex2: Target of the interaction (the user whose tweet is being retweeted or replied to)
Relationship: Type of interaction (Tweet, Retweet, Reply)
Relationship Date: Time of post publication
Tweet: Content of the post
Retweet Count: Number of reposts
Favorite Count: Number of likes
Reply Count: Number of replies to the post
Quote Count: Number of times the post was quoted
Hashtags in Tweet: Hashtags included in the post (if any)
URLs in Tweet: URLs included in the post (if any)
Domains in Tweet: Referenced domains (if any)
Mentions in Tweet: Mentions in the post (if any)
Media in Tweet: Referenced media (if any)
Media Type: Type of referenced media (if applicable)
Twitter Page for Tweet: Link to the webpage with the source tweet
Tweet Date (UTC): Date of original tweet publication
Tweet Image File: Avatar of the post's author
Imported ID: ID of the post after import
Conversation ID: Thread ID to which the post is assigned
In Reply To Tweet ID: ID of the post to which this post is a reply (if applicable)
Quoted Status ID: ID of the post that was quoted (if applicable)
Retweet ID: ID of the repost
Author ID: ID of the post's author
Vertex1 Group: User group to which the post's author was assigned through clustering
Vertex2 Group: User group to which the recipient of the post was assigned through clustering
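As an illustration of how the edge list could be analysed outside NodeXL, the following is a hedged sketch assuming a hypothetical Excel export with the columns listed above; the actual file name and export format may differ.

```python
# Hedged sketch (not from the dataset authors): build an interaction graph
# from the NodeXL-style edge list described above. The file name is a
# placeholder; adjust to the actual release.
import pandas as pd
import networkx as nx

edges = pd.read_excel("ukraincy_w_polsce_nodexl.xlsx")  # hypothetical filename

# Each row is one interaction: Vertex1 -> Vertex2 with its type and date.
g = nx.from_pandas_edgelist(
    edges,
    source="Vertex1",
    target="Vertex2",
    edge_attr=["Relationship", "Relationship Date", "Retweet Count"],
    create_using=nx.MultiDiGraph,  # users can interact repeatedly
)

# Example: which users are retweeted/replied to most often (in-degree).
top_targets = sorted(g.in_degree(), key=lambda kv: kv[1], reverse=True)[:10]
print(top_targets)
```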
https://artefacts.ceda.ac.uk/licences/specific_licences/ecmwf-era-products.pdf
The European Centre for Medium-Range Weather Forecasts (ECMWF) has provided global atmospheric analyses from its archive for many years. The ERA-15 Re-analysis project was devised in response to wishes expressed by many users for a data set generated by a modern, consistent, and invariant data assimilation system. The ERA-15 project produced a long time-series (January 1979 - February 1994) of consistent meteorological analyses using a single version of the ECMWF model.
This dataset contains regular 2.5 degree x 2.5 degree gridded data on standard pressure levels and at the surface.
Consistent modeling of protoplanetary disks requires the simultaneous solution of both continuum and line radiative transfer, heating and cooling balance between dust and gas, and, of course, chemistry. Such models depend on panchromatic observations that can provide a complete description of the physical and chemical properties and energy balance of protoplanetary systems. Along these lines, we present a homogeneous, panchromatic collection of data on a sample of 85 T Tauri and Herbig Ae objects for which the data cover a range from X-rays to centimeter wavelengths. Datasets consist of photometric measurements and spectra, along with results from the data analysis such as line fluxes from atomic and molecular transitions. Additional properties resulting from modeling of the sources, such as disc mass and shape parameters, dust size, and PAH properties, are also provided for completeness. The purpose of this data collection is to provide a solid base that can enable consistent modeling of the properties of protoplanetary disks. To this end, we performed an unbiased collection of publicly available data that were combined into homogeneous datasets adopting consistent criteria. Targets were selected based both on their properties and on the availability of data. Data from more than 50 different telescopes and facilities were retrieved and combined into homogeneous datasets, directly from public data archives or after being extracted from more than 100 published articles. X-ray data for a subset of 56 sources are an exception, as they were reduced from scratch and are presented here for the first time. Compiled datasets, along with a subset of continuum and emission-line models, are stored in a dedicated database and distributed through a publicly accessible online system. All datasets contain metadata descriptors that allow them to be traced back to their original resources. The graphical user interface of the online system allows the user to visually inspect individual objects and to compare between datasets and models. It also offers the user the possibility of downloading any of the stored data and metadata for further processing.
*** Fake News on Twitter ***
These five datasets are the results of an empirical study on the spreading process of newly emerged fake news on Twitter. In particular, we have focused on those fake news stories that gave rise to a truth campaign spreading simultaneously against them. The story behind each fake news item is as follows:
1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.
2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."
3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.
4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.
5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.
The data collection was done in two stages, each providing a new dataset: 1) obtaining the Dataset of Diffusion (DD), which includes information on fake news/truth tweets and retweets, and 2) querying the neighbors of tweet spreaders, which provides the Dataset of Graph (DG).
DD
DD for each fake news story is an Excel file named FNx_DD, where x is the number of the fake news story, with the following structure:
Each row belongs to one captured tweet/retweet related to the rumor, and each column presents specific information about that tweet/retweet. From left to right, the columns contain the following:
User ID (user who has posted the current tweet/retweet)
The description sentence in the profile of the user who has published the tweet/retweet
The number of tweets/retweets published by the user at the time of posting the current tweet/retweet
Date and time of creation of the account by which the current tweet/retweet has been posted
Language of the tweet/retweet
Number of followers
Number of followings (friends)
Date and time of posting the current tweet/retweet
Number of likes (favorites) the current tweet had acquired before it was crawled
Number of times the current tweet had been retweeted before it was crawled
Whether there is another tweet inside the current tweet/retweet (for example, this happens when the current tweet is a quote, reply, or retweet)
The source (OS) of the device from which the current tweet/retweet was posted
Tweet/Retweet ID
Retweet ID (if the post is a retweet then this feature gives the ID of the tweet that is retweeted by the current post)
Quote ID (if the post is a quote then this feature gives the ID of the tweet that is quoted by the current post)
Reply ID (if the post is a reply then this feature gives the ID of the tweet that is replied by the current post)
Frequency of tweet occurrences, i.e., the number of times the current tweet is repeated in the dataset (for example, the number of times a tweet appears in the dataset as a retweet posted by others)
State of the tweet, which can be one of the following (decided by agreement between the annotators):
r : The tweet/retweet is a fake news post
a : The tweet/retweet is a truth post
q : The tweet/retweet questions the fake news but neither confirms nor denies it
n : The tweet/retweet is not related to the fake news (it contains queries related to the rumor but does not refer to the given fake news)
DG
DG for each fake news story contains two files:
A file in graph format (.graph), which includes the graph information, such as who is linked to whom. (This file is named FNx_DG.graph, where x is the number of the fake news story.)
A file in JSONL format (.jsonl), which includes the real user IDs of the nodes in the graph file. (This file is named FNx_Labels.jsonl, where x is the number of the fake news story.)
In the graph file, the label of each node is the order in which that node entered the graph. For example, if the node with user ID 12345637 is the first node entered into the graph file, then its label in the graph is 0, and its real ID (12345637) is at row number 1 of the JSONL file (because row number 0 holds the column labels); the remaining node IDs follow in the subsequent rows (each row corresponds to one user ID). Therefore, to find, for example, the user ID of node 200 (labeled 200 in the graph), look at row number 201 of the JSONL file.
The user IDs of spreaders in DG (those who have a post in DD) are also available in DD, so extra information about them and their tweets/retweets can be retrieved there. The other user IDs in DG are the neighbors of these spreaders and might not exist in DD.
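For illustration, the following is a hedged sketch of the row lookup described above; the per-line JSON structure of the .jsonl file is an assumption and should be checked against the actual files.

```python
# Hedged sketch (assumptions noted): recover the real user ID of a node label
# from an FNx_Labels.jsonl companion file. Each line is assumed to hold one
# user ID, with row 0 reserved for the column label.
import json

def user_id_for_label(labels_path: str, node_label: int) -> str:
    with open(labels_path, encoding="utf-8") as fh:
        rows = [line.strip() for line in fh if line.strip()]
    # Row 0 holds the column label, so node label n lives at row n + 1.
    raw = rows[node_label + 1]
    try:
        return str(json.loads(raw))  # e.g. a bare ID on each line
    except json.JSONDecodeError:
        return raw

# Example: real user ID of the node labeled 200 in FN1_DG.graph
print(user_id_for_label("FN1_Labels.jsonl", 200))
```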
The global number of Twitter users was forecast to increase continuously between 2024 and 2028 by a total of 74.3 million users (+17.32 percent). After a ninth consecutive year of growth, the Twitter user base is estimated to reach 503.42 million users, a new peak, in 2028. Notably, the number of Twitter users has been increasing continuously over the past years. User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Twitter users in regions like South America and the Americas.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Raw data, source, and more information: https://www.kaggle.com/datasets/huseyingunduz/diatom-dataset?select=images

Citation:
@article{gunduz2022,
  title={Segmentation of diatoms using edge detection and deep learning},
  author={Gunduz, Huseyin and Solak, Cuneyt Nadir and Gunal, Serkan},
  journal={Turkish Journal of Electrical Engineering & Computer Sciences},
  volume={30},
  number={6},
  pages={2268--2285},
  year={2022},
  DOI={10.55730/1300-0632.3938}
}

Diatoms are a group of algae found in oceans, freshwater, moist soils, and on surfaces. They are one of the most common phytoplankton species found in nature. There are more than 200 genera of diatoms and about 200,000 species. They produce approximately 20-25% of the oxygen on the planet.
Accurate detection, segmentation, and classification of diatoms are very important, especially for determining water quality and ecological change.
Colorized Data Processing Techniques for Medical Imaging
Medical images like CT scans and X-rays are typically grayscale, making subtle anatomical or pathological differences harder to distinguish. The following image processing and enhancement techniques are used to colorize and improve visual interpretation for diagnostics, training, or AI preprocessing.
🔷 1. 3D_Rendering Renders medical image volumes into three-dimensional visualizations. Though often grayscale, color can be applied to different tissue types or densities to enhance spatial understanding. Useful in surgical planning or tumor visualization.
🔷 2. 3D_Volume_Rendering An advanced visualization technique that projects 3D image volumes with transparency and color blending, simulating how light passes through tissue. Color helps distinguish internal structures like organs, vessels, or tumors.
🔷 3. Adaptive Histogram Equalization (AHE) Enhances contrast locally within the image, especially in low-contrast regions. When colorized, different intensities are mapped to distinct hues, improving visibility of fine-grained details like soft tissues or lesions.
🔷 4. Alpha Blending A layering technique that combines multiple images (e.g., CT + annotation masks) with transparency. Colors represent different modalities or regions of interest, providing composite visual cues for diagnosis.
🔷 5. Basic Color Map Applies a standard color palette (like Jet or Viridis) to grayscale data. Different intensities are mapped to different colors, enhancing the visual discrimination of anatomical or pathological regions in the image.
🔷 6. Contrast Stretching Expands the grayscale range to improve brightness and contrast. When combined with color mapping, tissues with similar intensities become visually distinct, aiding in tasks like bone vs. soft tissue separation.
🔷 7. Edge Detection Extracts and overlays object boundaries (e.g., organ or lesion outlines) on the original scan. Edge maps are typically colorized (e.g., green or red) to highlight anatomical structures or abnormalities clearly.
🔷 8. Gamma Correction Adjusts image brightness non-linearly. Color can be used to highlight underexposed or overexposed regions, often revealing soft tissue structures otherwise hidden in raw grayscale CT/X-ray images.
🔷 9. Gaussian Blur Smooths image noise and details. When visualized with color overlays (e.g., before vs. after), it helps assess denoising effectiveness. It is also used in segmentation preprocessing to reduce edge artifacts.
🔷 10. Heatmap Visualization Encodes intensity or prediction confidence into a heatmap overlay (e.g., red for high activity). Common in AI-assisted diagnosis to localize tumors, fractures, or infections, layered over the original grayscale image.
🔷 11. Interactive Segmentation A semi-automated method to extract regions of interest with user input. Segmented areas are color-coded (e.g., tumor = red, background = blue) for immediate visual confirmation and further analysis.
🔷 12. LUT (Lookup Table) Color Map Maps grayscale values to custom color palettes using a lookup table. This enhances contrast and emphasizes certain intensity ranges (e.g., blood vessels vs. bone), improving interpretability for radiologists (a brief code sketch of this kind of color mapping follows this list).
🔷 13. Random Color Palette Applies random but consistent colors to segmented regions or labels. Common in datasets with multiple classes (e.g., liver, spleen, kidneys), it helps in v...
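The following is a minimal, hedged sketch (not part of the dataset) showing how contrast stretching (item 6) and a basic color map / LUT (items 5 and 12) could be applied to a grayscale image with OpenCV; the input filename is a placeholder, and real CT/X-ray DICOMs would first need windowing and conversion to 8-bit.

```python
# Hedged sketch: colorize a grayscale medical image with a Jet color map.
import cv2
import numpy as np

gray = cv2.imread("chest_xray.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
if gray is None:
    raise FileNotFoundError("chest_xray.png not found")

# Contrast stretching (item 6): rescale intensities to the full 0-255 range.
stretched = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

# Basic color map (item 5): map intensities to a Jet palette.
colorized = cv2.applyColorMap(stretched, cv2.COLORMAP_JET)

cv2.imwrite("chest_xray_jet.png", colorized)
```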
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset of 19,237 anonymized adult chest X-rays in 1024 x 1024 pixel DICOM format with corresponding anonymized free-text reports from Dunedin Hospital, New Zealand, between 2010 and 2020. Images were manually annotated by RANZCR radiology trainees and radiologists with respect to pneumothorax, acute rib fracture, and chest tubes. Segmentation annotations were converted to run-length-encoded (RLE) format in CSV files. In the provided metadata, image filenames contain a patient index (enabling analyses that require grouping images by patient), as well as anonymized date-of-acquisition information in which the temporal relationship between images is preserved.

Unfortunately, since February 2024, the New Zealand government has been changing the data governance of datasets used for AI development, and this affects how the CANDID PTX dataset can be accessed by external users. Therefore, the CANDID PTX dataset is not available for access by users outside Health New Zealand. Further notice of access will be posted here should access by external users be reopened.
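Since the segmentation annotations are distributed as RLE strings in CSV files, a decoding step is needed before the masks can be used. The following is a hedged sketch; the exact RLE convention (1-based start indices, column-major pixel order, "start length" pairs) is an assumption borrowed from similar chest X-ray releases and should be checked against the dataset documentation.

```python
# Hedged sketch: decode one run-length-encoded (RLE) mask into a binary array.
import numpy as np

def rle_to_mask(rle: str, height: int = 1024, width: int = 1024) -> np.ndarray:
    """Convert an RLE string such as '345 12 1020 8' into a (height, width) mask."""
    mask = np.zeros(height * width, dtype=np.uint8)
    if rle and rle.strip() and rle.strip() != "-1":
        values = list(map(int, rle.split()))
        starts, lengths = values[0::2], values[1::2]
        for start, length in zip(starts, lengths):
            mask[start - 1 : start - 1 + length] = 1
    return mask.reshape((height, width), order="F")  # column-major, per assumption

# Example: 12 + 8 = 20 foreground pixels
print(rle_to_mask("345 12 1020 8").sum())
```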
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset is based on the original SpaceNet 7 dataset, with a few modifications.
The original dataset consisted of Planet satellite imagery mosaics, which include 24 images (one per month) covering ~100 unique geographies. The original dataset comprised over 40,000 square kilometers of imagery and exhaustive polygon labels of building footprints in the imagery, totaling over 10 million individual annotations.
This dataset builds upon the original dataset: each image is segmented into 64 x 64 chips, making it easier to build a model.
[Image: example 64 x 64 chips from the segmented dataset]
The images also capture the changes between each monthly image, such that an image taken in month 1 is compared with the images taken in months 2, 3, ..., 24. This is done by taking the Cartesian product of the differences between each pair of images. For more information on how this is done, check out the following notebook.
The differences between the images are captured in the output mask, and the two images being compared are stacked, which means that our input images have dimensions of 64 x 64 x 6 and our output mask has dimensions of 64 x 64 x 1. The input images have 6 channels because, as mentioned earlier, they are two images stacked together. See the image below for more details:
[Image: building masks for the two compared dates and their difference]
The image above shows the masks for each of the original satellite images and what the difference between the two looks like. For more information on how the original data was explored, check out this notebook.
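To make the pairing concrete, the following is a minimal sketch (not the dataset authors' code) of how one such training pair could be assembled from two monthly chips and their building masks; the filenames are placeholders, and the released chips may already store both dates in a single file.

```python
# Hedged sketch: build one 64 x 64 x 6 input and its 64 x 64 x 1 change mask.
import numpy as np
import tifffile  # pip install tifffile

# Hypothetical filenames for the two compared months.
img_m1 = tifffile.imread("month01_chip.tif")   # 64 x 64 x 3 RGB chip, month 1
img_m2 = tifffile.imread("month02_chip.tif")   # 64 x 64 x 3 RGB chip, month 2
bld_m1 = tifffile.imread("month01_mask.tif")   # building footprint mask, month 1
bld_m2 = tifffile.imread("month02_mask.tif")   # building footprint mask, month 2

# Input: the two RGB chips stacked along the channel axis -> 64 x 64 x 6.
x = np.concatenate([img_m1, img_m2], axis=-1)

# Output: pixels whose building label changed between the two months -> 64 x 64 x 1.
y = (bld_m1 != bld_m2).astype(np.uint8)[..., np.newaxis]

print(x.shape, y.shape)  # (64, 64, 6) (64, 64, 1)
```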
The data is structured as follows:
chip_dataset
└── change_detection
└── fname
├── chips
│ └── year1_month1_year2_month2
│ └── global_monthly_year1_month1_year2_month2_chip_x###_y###_fname.tif
└── masks
└── year1_month1_year2_month2
└── global_monthly_year1_month1_year2_month2_chip_x###_y###_fname_blank.tif
The _blank part of the mask chip filename indicates whether the mask is a blank mask or not.
For more information on how the data was structured and augmented check out the following notebook.
All credit goes to the team at SpaceNet for collecting, annotating, and formatting the original dataset.
The number of Twitter users in Brazil was forecast to increase continuously between 2024 and 2028 by a total of *** million users (+***** percent). After a ninth consecutive year of growth, the Twitter user base is estimated to reach ***** million users, a new peak, in 2028. Notably, the number of Twitter users has been increasing continuously over the past years. User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
This is a synthetic dataset that can be used by users who are interested in benchmarking methods of explainable artificial intelligence (XAI) for geoscientific applications. The dataset is specifically inspired by a climate forecasting setting (seasonal timescales), where the task is to predict regional climate variability given global climate information lagged in time. The dataset consists of a synthetic input X (a series of 2D arrays of random fields drawn from a multivariate normal distribution) and a synthetic output Y (a scalar series) generated by using a nonlinear function F: R^d -> R.
The synthetic input aims to represent temporally independent realizations of anomalous global fields of sea surface temperature, while the synthetic output series represents some type of regional climate variability of interest (temperature, precipitation totals, etc.), and the function F is a simplification of the climate system.
Since the nonlinear function F used to generate the output given the input is known, we also derive and provide the attribution of each output value to the corresponding input features. Using this synthetic dataset, users can train any AI model to predict Y given X and then implement XAI methods to interpret it. Based on the "ground truth" attribution of F, the user can assess the faithfulness of any XAI method.
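As an illustration of the intended benchmarking workflow, the following hedged sketch uses a tiny synthetic stand-in (a linear toy F and random data) rather than the dataset itself: train a model, compute a simple attribution, and score it against the known ground-truth attribution. The array shapes, toy F, and scoring choice are assumptions for the example only.

```python
# Hedged sketch (illustrative only): the XAI benchmarking loop the dataset enables.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_samples, n_features = 500, 10 * 10          # e.g. a 10 x 10 grid, flattened
X = rng.standard_normal((n_samples, n_features))
w = rng.standard_normal(n_features)
Y = X @ w                                      # toy linear F; the real F is nonlinear

model = Ridge(alpha=1.0).fit(X, Y)

# Simple attribution: coefficient * input value per feature (local linear explanation).
xai_attr = model.coef_ * X                     # shape (n_samples, n_features)
true_attr = w * X                              # ground-truth attribution for this toy F

# Faithfulness score: mean per-sample correlation between XAI and true attributions.
corr = [np.corrcoef(a, b)[0, 1] for a, b in zip(xai_attr, true_attr)]
print("mean attribution correlation:", float(np.mean(corr)))
```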
NOTE: the spatial configuration of the observations in the NetCDF database file conforms to the planetocentric coordinate system (89.5N - 89.5S, 0.5E - 359.5E), where longitude is measured positively heading east from the prime meridian.