Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
About the Dataset
Media bias is a multifaceted problem, leading to one-sided views and impacting decision-making. A way to address bias in news articles is to automatically detect and indicate it through machine-learning methods. However, such detection is limited due to the difficulty of obtaining reliable training data. To facilitate the data-gathering process, we introduce NewsUnravel, a news-reading web application leveraging an initially tested feedback mechanism to collect reader feedback on machine-generated bias highlights within news articles. Our approach augments dataset quality by significantly increasing inter-annotator agreement by 26.31% and improving classifier performance by 2.49%. As the first human-in-the-loop application for media bias, NewsUnravel shows that a user-centric approach to media bias data collection can return reliable data while being scalable and evaluated as easy to use. NewsUnravel demonstrates that feedback mechanisms are a promising strategy to reduce data collection expenses, fluidly adapt to changes in language, and enhance evaluators' diversity.
Description of the data files
This repository contains the datasets for the anonymous NewsUnravel submission. The tables contain the following data:
NUDAdataset.csv: the NUDA dataset with 310 new sentences with bias labels
Statistics.png: contains all Umami statistics for NewsUnravel's usage data
Feedback.csv: holds the participantID of a single feedback with the sentence ID (contentId), the bias rating, and provided reasons
Content.csv: holds the participant ID of a rating with the sentence ID (contentId) of the rated sentence, the bias rating, and the reason, if given
Article.csv: holds the article ID, title, source, article metadata, article topic, and bias amount in %
Participant.csv: holds the participant IDs and data processing consent
The Data Science Journal is a renowned publication in the field of data analysis and computer science. Headquartered in New York, the organization has been a leading authority on data-driven insights for over two decades. Founded by a team of esteemed researchers and academics, it aims to provide a platform for data scientists and researchers to share their findings and collaborate with peers.
The Data Science Journal's mission is reflected in their extensive repository of data, which includes datasets on various subjects such as machine learning, natural language processing, and data visualization. The data is carefully curated and organized to facilitate seamless access and integration into various projects. With a strong focus on excellence and innovation, the company continues to expand its offerings, staying at the forefront of the rapidly evolving data science landscape.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Researcher(s): Alexandros Mokas, Eleni Kamateri
Supervisor: Ioannis Tsampoulatidis
This repository contains 3 social media datasets:
2 Post-processing datasets: These datasets contain post-processing data extracted from the analysis of social media posts collected for two different use cases during the first two years of the DeepCube project. More specifically, these include:
The UC2 dataset, containing the post-processing analysis of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. This dataset contains a total of 5,695,253 social media posts collected from the Twitter platform, based on the initial version of the search criteria relevant to UC2 defined by Universitat De Valencia. The collection focused on Ethiopia and Somalia and ran from 26 June 2021 to March 2023.
The UC5 dataset, containing the post-processing analysis of the Twitter and Instagram data collected for the DeepCube use case (UC5) related to sustainable and environmentally friendly tourism. This dataset contains a total of 58,143 social media posts (12,881 collected from Twitter and 45,262 collected from Instagram), based on the initial version of the search criteria relevant to UC5 defined by MURMURATION SAS. The collection focused on Brazil and ran from 26 June 2021 to March 2023.
1 Annotated dataset: An additional annotated dataset was created that contains post-processing data along with annotations of Twitter posts collected for UC2 for the years 2010-2022. More specifically, it includes:
The UC2 dataset, containing the post-processing analysis of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. This dataset contains a total of 1,721 annotated social media posts (412 relevant and 1,309 irrelevant) collected from the Twitter platform. The collection focused on Somalia and covers 1 January 2010 to 31 December 2022.
For every social media post retrieved from Twitter and Instagram, a preprocessing step was performed. This involved a three-step analysis of each post using the appropriate web service. First, the location of the post was automatically extracted from the text using a location extraction service. Second, the images included in the post were analyzed using a concept extraction service, which identified and provided the top ten concepts that best described the image. These concepts included items such as "person," "building," "drought," "sun," and so on. Finally, the sentiment expressed in the post's text was determined by using a sentiment analysis service. The sentiment was classified as either positive, negative, or neutral.
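The three-step analysis described above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the source does not name the web services or their interfaces, so `Post`, `ProcessedPost`, `preprocess`, and the three service callables are hypothetical, and the stand-in lambdas return placeholder values rather than real service output.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass
class Post:
    """A collected social media post (hypothetical structure)."""
    text: str
    image_urls: List[str] = field(default_factory=list)


@dataclass
class ProcessedPost:
    """Result of the three preprocessing steps (hypothetical structure)."""
    location: Optional[str]
    concepts: List[List[str]]  # one list of concepts per image
    sentiment: str


def preprocess(
    post: Post,
    extract_location: Callable[[str], Optional[str]],
    extract_concepts: Callable[[str], List[str]],
    classify_sentiment: Callable[[str], str],
    top_k: int = 10,
) -> ProcessedPost:
    # Step 1: extract the location from the post text.
    location = extract_location(post.text)
    # Step 2: keep at most the top_k concepts for each attached image.
    concepts = [extract_concepts(url)[:top_k] for url in post.image_urls]
    # Step 3: classify the sentiment of the post text.
    sentiment = classify_sentiment(post.text)
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment label: {sentiment!r}")
    return ProcessedPost(location, concepts, sentiment)


# Stand-in services for demonstration only; real services would be remote calls.
demo = preprocess(
    Post("Drought near Mogadishu", ["img-1"]),
    extract_location=lambda text: "Mogadishu" if "Mogadishu" in text else None,
    extract_concepts=lambda url: ["drought", "sun", "person"],
    classify_sentiment=lambda text: "negative",
)
```

Passing the services in as callables keeps the sketch independent of any particular provider, since the source only describes what each step produces.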
After the social media posts were preprocessed, they were visualized using the Social Media Web Application. This intuitive, user-friendly online application was designed for both expert and non-expert users and offers a web-based user interface for filtering and visualizing the collected social media data. The application provides various filtering options, an interactive map, a timeline, and a collection of graphs to help users analyze the data. Moreover, this application provides users with the option to download aggregated data for specific periods by applying filters and clicking the "Download Posts" button. This feature allows users to easily extract and analyze social media data outside of the web application, providing greater flexibility and control over data analysis.
The dataset is provided by INFALIA. INFALIA, a spin-off of the CERTH institute and a partner in an EU research project, releases this dataset containing Tweet IDs and post pre-processing data for the sole purpose of enabling the validation of the research conducted within the DeepCube project. Moreover, Twitter Content provided in this dataset to third parties remains subject to the Twitter Policy, and those third parties must agree to the Twitter Terms of Service, Privacy Policy, Developer Agreement, and Developer Policy (https://developer.twitter.com/en/developer-terms) before receiving this download.
Unlock the power of ready-to-use data sourced from developer communities and repositories with Developer Community and Code Datasets.
Data Sources:
GitHub: Access comprehensive data about GitHub repositories, developer profiles, contributions, issues, social interactions, and more.
StackShare: Receive information about companies, their technology stacks, reviews, tools, services, trends, and more.
DockerHub: Dive into data from container images, repositories, developer profiles, contributions, usage statistics, and more.
Developer Community and Code Datasets are a treasure trove of public data points gathered from tech communities and code repositories across the web.
With our datasets, you'll receive:
Choose from various output formats, storage options, and delivery frequencies:
Why choose our Datasets?
Fresh and accurate data: Access complete, clean, and structured data from scraping professionals, ensuring the highest quality.
Time and resource savings: Let us handle data extraction and processing cost-effectively, freeing your resources for strategic tasks.
Customized solutions: Share your unique data needs, and we'll tailor our data harvesting approach to fit your requirements perfectly.
Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is trusted by Fortune 500 companies and adheres to GDPR and CCPA standards.
Pricing Options:
Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Empower your data-driven decisions with Oxylabs Developer Community and Code Datasets!
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset was originally collected for a data science and machine learning project that aimed at investigating the potential correlation between the amount of time an individual spends on social media and the impact it has on their mental health.
The project involves conducting a survey to collect data, organizing the data, and using machine learning techniques to create a predictive model that can determine whether a person should seek professional help based on their answers to the survey questions.
This project was completed as part of a Statistics course at a university, and the team is currently in the process of writing a report and completing a paper that summarizes and discusses the findings in relation to other research on the topic.
The following is the Google Colab link to the project, which was done in a Jupyter Notebook:
https://colab.research.google.com/drive/1p7P6lL1QUw1TtyUD1odNR4M6TVJK7IYN
The following is the GitHub repository of the project:
https://github.com/daerkns/social-media-and-mental-health
Libraries used for the project:
pandas
NumPy
Matplotlib
seaborn
scikit-learn
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A summary of the media analysis conducted on the Tokyo 2020 (2021) online media articles from July 2021 to October 2021.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
About
Recent research shows that visualizing linguistic media bias mitigates its negative effects. However, reliable automatic detection methods to generate such visualizations require costly, knowledge-intensive training data. To facilitate data collection for media bias datasets, we present News Ninja, a game employing data-collecting game mechanics to generate a crowdsourced dataset. Before annotating sentences, players are educated on media bias via a tutorial. Our findings show that datasets gathered with crowdsourced workers trained on News Ninja can reach significantly higher inter-annotator agreements than expert and crowdsourced datasets. As News Ninja encourages continuous play, it allows datasets to adapt to the reception and contextualization of news over time, presenting a promising strategy to reduce data collection expenses, educate players, and promote long-term bias mitigation.
General
This dataset was created through player annotations in the News Ninja Game made by ANON. Its goal is to improve the detection of linguistic media bias. Support came from ANON. None of the funders played any role in the dataset creation process or publication-related decisions.
The dataset includes sentences with binary bias labels (processed, biased or not biased) as well as the annotations of single players used for the majority vote. It includes all game-collected data. All data is completely anonymous. The dataset neither identifies sub-populations nor can be considered sensitive to them, and it is not possible to identify individuals.
Some sentences might be offensive or triggering as they were taken from biased or more extreme news sources. The dataset contains topics such as violence, abortion, and hate against specific races, genders, religions, or sexual orientations.
Description of the Data Files
This repository contains the datasets for the anonymous News Ninja submission. The tables contain the following data:
ExportNewsNinja.csv: Contains 370 BABE sentences and 150 new sentences with their text (sentence), words labeled as biased (words), BABE ground truth (ground_Truth), and the sentence bias label from the player annotations (majority_vote). The first 370 sentences are re-annotated BABE sentences, and the following 150 sentences are new sentences.
AnalysisNewsNinja.xlsx: Contains 370 BABE sentences and 150 new sentences. The first 370 sentences are re-annotated BABE sentences, and the following 150 sentences are new sentences. The table includes the full sentence (Sentence), the sentence bias label from player annotations (isBiased Game), the new expert label (isBiased Expert), whether the game label and expert label match (Game VS Expert), whether differing labels are false positives or false negatives (false negative, false positive), the ground truth label from BABE (isBiasedBABE), whether the expert and BABE labels match (Expert VS BABE), and whether the game label and BABE label match (Game VS BABE). It also includes the analysis of the agreement between the three rater categories (Game, Expert, BABE).
demographics.csv: Contains demographic information of News Ninja players, including gender, age, education, English proficiency, political orientation, news consumption, and consumed outlets.
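The pairwise comparisons described for AnalysisNewsNinja.xlsx (label matches, false positives, false negatives, and rater agreement) can be illustrated with a short sketch. The source does not state which agreement statistic was used; Cohen's kappa is assumed here purely for illustration, and the function name `compare_labels` and the sample labels are hypothetical, not taken from the dataset.

```python
def compare_labels(reference, candidate):
    """Compare two equal-length lists of binary labels (1 = biased, 0 = not biased).

    `reference` plays the role of the trusted rater (e.g. expert labels) and
    `candidate` the rater being evaluated (e.g. game labels).
    """
    if len(reference) != len(candidate) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(reference)
    pairs = list(zip(reference, candidate))
    tp = sum(1 for r, c in pairs if r == 1 and c == 1)
    tn = sum(1 for r, c in pairs if r == 0 and c == 0)
    fp = sum(1 for r, c in pairs if r == 0 and c == 1)
    fn = sum(1 for r, c in pairs if r == 1 and c == 0)

    # Observed agreement: share of sentences where both raters match.
    observed = (tp + tn) / n
    # Chance agreement from each rater's marginal label frequencies.
    p_ref = (tp + fn) / n
    p_cand = (tp + fp) / n
    expected = p_ref * p_cand + (1 - p_ref) * (1 - p_cand)
    # Kappa is undefined when chance agreement is 1; returning 1.0 in that
    # case is a convention chosen for this sketch.
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)

    return {
        "matches": tp + tn,
        "false_positive": fp,
        "false_negative": fn,
        "observed_agreement": observed,
        "cohen_kappa": kappa,
    }


# Made-up labels for eight sentences.
expert = [1, 1, 0, 0, 1, 0, 1, 0]
game = [1, 0, 0, 0, 1, 1, 1, 0]
stats = compare_labels(expert, game)
```

With three rater categories, the same function could be applied to each pair (Game vs Expert, Expert vs BABE, Game vs BABE).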
Collection Process
Data was collected through interactions with the NewsNinja game. All participants went through a tutorial before annotating 2x10 BABE sentences and 2x10 new sentences. For this first test, players were recruited using Prolific. The game was hosted on a custom-built responsive website. The collection period was from 20.02.2023 to 28.02.2023. Before starting the game, players were informed about the goal and the data processing. After consenting, they could proceed to the tutorial.
The dataset will be open source. A link with all details and contact information will be provided upon acceptance. No third parties are involved.
The dataset will not be maintained as it captures the first test of NewsNinja at a specific point in time. However, new datasets will arise from further iterations. Those will be linked in the repository. Please cite the NewsNinja paper if you use the dataset and contact us if you're interested in more information or joining the project.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Executive Summary
Background
In June 2016, the Tri-Council Agencies released a statement regarding Digital Data Management for grant applications. In preparation to support researchers facing new requirements, UBC librarians on both the Vancouver and Okanagan campuses initially surveyed faculty in the Sciences in Fall 2015, to determine both the actual practices of Research Data Management (RDM) employed by these researchers, and areas where the researchers would like help. Acknowledging disciplinary differences, a second survey was administered to all faculty and graduate students in Humanities and Social Sciences in October 2016. The results of these surveys will assist the University in making evidence-based decisions about what expertise will be needed to support and assist faculty in improving their data management practises to meet new requirements from funding bodies.
Findings
Researchers are collecting and working with a wide variety of data ranging from numerical and text data to multimedia files, software, instrument specific data, geospatial data, and many other types of data. Researchers identified four broad areas where they would like additional help and support:
1. Data Storage (including preservation and sharing)
2. Data Management Plans
3. Data Repository access
4. Data Education (workshops and personalized training)
These areas present opportunities for the Library and campus partners to bolster research excellence by supporting strong RDM practices of Faculty, Students and Staff.
Recommendations
1. The Library continues to collaborate with VPR’s Advanced Research Computing (ARC) unit, UBC Ethics, UBC IT Services, and other campus partners to plan and coordinate services for researchers around the management of research data.
2. UBC ensures that a robust infrastructure is available to researchers to store, preserve, and share their research data.
3. UBC implements a campus-wide service to support a Data Management Repository (or suite of repositories) which would include the Abacus Dataverse (currently operated by the Library).
Conclusions
A more detailed statistical analysis is underway, but initial results show that the majority of survey respondents indicated that they need assistance with storage and security of research data, with crafting data management plans, with a centralized research data repository, and with workshops about research data best practices for faculty and especially for graduate students. Further, understandings of the particular needs or habits within specific research disciplines will provide insights into how these researchers think about, and work with data and can also identify areas for future research and investigation. Finally, this survey has provided a fuller understanding of the RDM needs and perceived barriers and benefits which can now enable more targeted and nuanced conversations between librarians, researchers, and IT research support personnel. These results will assist the Library and other campus partners with the development of specific programs and infrastructure to bolster a strategic direction for RDM support.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This repository contains Python code and data used in the Museums in the Pandemic (MIP) project, including aggregated social media datasets and analysis results. The input data cannot be disseminated for copyright reasons.
Project description: Museums have an important role in our economy, education and cultural life. They add to the texture and richness of villages, towns and cities, and can help build and maintain communities. During the pandemic, their continuing existence has been under threat, and while many museums have benefitted from emergency funding or government schemes, their position remains precarious. In order to better support the UK museum sector, the museum services need to identify which types of museums are at risk of closure, which remain resilient, and which close on a permanent basis. Doing so presents a considerable challenge. Data collection is selective and tends not to cover unaccredited museums, it is dispersed across multiple platforms, there are no mechanisms for documenting closure, and establishing risk of closure entirely relies on individual organisations self-reporting. The Museums in the Pandemic project investigates how ‘big data techniques’ can inform research into the UK museum sector. It combines qualitative and quantitative research, and has three inter-related strands:
Developing new ways to collect data on museums. We will use web analytics, natural language processing, and sentiment analysis to digitally track trends as they emerge. The data will be analysed with respect to museum characteristics – such as governance, location and size – to provide a nuanced understanding of the sector at a given moment.
Manually checking and validating the information generated by big data collection.
Using interview-based research to better understand what constitutes risk during a pandemic, the triggers for permanent closure, and how museums have and continue to remain resilient.
URL: https://www.bbk.ac.uk/research/projects/museums-in-the-pandemic
PI: Fiona Candlin (Birkbeck, UoL)
Co-I: Andrea Ballatore (King's College London)
Co-I: Alex Poulovassilis (Birkbeck, UoL)
Co-I: Peter Wood (Birkbeck, UoL)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
About the NUDA Dataset
Media bias is a multifaceted problem, leading to one-sided views and impacting decision-making. A way to address bias in news articles is to automatically detect and indicate it through machine-learning methods. However, such detection is limited due to the difficulty of obtaining reliable training data. To facilitate the data-gathering process, we introduce NewsUnravel, a news-reading web application leveraging an initially tested feedback mechanism to collect reader feedback on machine-generated bias highlights within news articles. Our approach augments dataset quality by significantly increasing inter-annotator agreement by 26.31% and improving classifier performance by 2.49%. As the first human-in-the-loop application for media bias, NewsUnravel shows that a user-centric approach to media bias data collection can return reliable data while being scalable and evaluated as easy to use. NewsUnravel demonstrates that feedback mechanisms are a promising strategy to reduce data collection expenses, fluidly adapt to changes in language, and enhance evaluators' diversity.
General
This dataset was created through user feedback on automatically generated bias highlights on news articles on the website NewsUnravel made by ANON. Its goal is to improve the detection of linguistic media bias for analysis and to indicate it to the public. Support came from ANON. None of the funders played any role in the dataset creation process or publication-related decisions.
The dataset consists of text, namely biased sentences with binary bias labels (processed, biased or not biased) as well as metadata about the article. It includes all feedback that was given. The single ratings (unprocessed) used to create the labels with correlating User IDs are included.
For training, this dataset was combined with the BABE dataset. All data is completely anonymous. Some sentences might be offensive or triggering as they were taken from biased or more extreme news sources. The dataset neither identifies sub-populations nor can be considered sensitive to them, and it is not possible to identify individuals.
Description of the Data Files
This repository contains the datasets for the anonymous NewsUnravel submission. The tables contain the following data:
NUDAdataset.csv: the NUDA dataset with 310 new sentences with bias labels
Statistics.png: contains all Umami statistics for NewsUnravel's usage data
Feedback.csv: holds the participantID of a single feedback with the sentence ID (contentId), the bias rating, and provided reasons
Content.csv: holds the participant ID of a rating with the sentence ID (contentId) of a rated sentence and the bias rating, and reason, if given
Article.csv: holds the article ID, title, source, article metadata, article topic, and bias amount in %
Participant.csv: holds the participant IDs and data processing consent
Collection Process
Data was collected through interactions with the Feedback Mechanism on NewsUnravel. A news article was displayed with automatically generated bias highlights. Each highlight could be selected, and readers were able to agree or disagree with the automatic label. Through a majority vote, labels were generated from those feedback interactions. Spammers were excluded through a spam detection approach.
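The majority-vote step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input shape, the function name `majority_labels`, and the decision to leave ties unlabeled are all assumptions, and the spam detection step mentioned in the text is not shown.

```python
from collections import Counter, defaultdict


def majority_labels(feedback):
    """Derive one label per sentence from individual reader ratings.

    `feedback` is an iterable of (sentence_id, is_biased) pairs, one per
    rating. Returns a dict mapping sentence_id to True (biased), False
    (not biased), or None when the votes are tied.
    """
    votes = defaultdict(Counter)
    for sentence_id, is_biased in feedback:
        votes[sentence_id][bool(is_biased)] += 1

    labels = {}
    for sentence_id, counts in votes.items():
        biased, not_biased = counts[True], counts[False]
        if biased == not_biased:
            labels[sentence_id] = None  # tie: left unlabeled in this sketch
        else:
            labels[sentence_id] = biased > not_biased
    return labels


# Made-up ratings for three sentences.
example = [(1, True), (1, True), (1, False), (2, False), (3, True), (3, False)]
result = majority_labels(example)
```

How ties and sentences with very few ratings are handled is a design choice the source does not specify, so a real pipeline may resolve them differently.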
Readers came to our website voluntarily through posts on LinkedIn and social media as well as posts on university boards. The data collection period lasted for one week, from March 4th to March 11th (2023). The landing page informed them about the goal and the data processing. After being informed, they could proceed to the article overview.
So far, the dataset has been used on top of BABE to train a linguistic bias classifier, adopting hyperparameter configurations from BABE with a pre-trained model from Hugging Face.
The dataset will be open source. On acceptance, a link with all details and contact information will be provided. No third parties are involved.
The dataset will not be maintained as it captures the first test of NewsUnravel at a specific point in time. However, new datasets will arise from further iterations. Those will be linked in the repository. Please cite the NewsUnravel paper if you use the dataset and contact us if you're interested in more information or joining the project.
Dramatic increases in large-scale data generated through social media, combined with increased computational power, have enabled the growth of computational approaches to social media research, and social science in general. While many of these approaches require statistical or computational training, they have the great benefit of being inherently transparent, allowing for research that others can reproduce and learn from. To that end, we wrote a book chapter in the Sage Handbook of Social Media in which we obtain a large-scale dataset of metadata about social media research papers, which we analyze using a few commonly used computational methods. This repository provides the code, data, and documentation designed to tell you exactly how we did that and to walk you through how to reproduce our results and our paper by running the code we wrote.
You can find the chapter here: Foote, Jeremy D., Aaron Shaw, and Benjamin Mako Hill. 2017. “A Computational Analysis of Social Media Scholarship.” In The SAGE Handbook of Social Media, edited by Jean Burgess, Alice Marwick, and Thomas Poell, 111–34. London, UK: SAGE.
Documentation on how to download and use these data is provided on the following website: https://communitydata.cc/social-media-chapter/ A copy of our documentation website can be found in the files README.md and README.html included in this repository.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Marketing Data Warehouse market size reached USD 7.7 billion in 2024, reflecting robust adoption across multiple industries. The market is expanding at a CAGR of 13.2% and is forecasted to attain a value of USD 23.1 billion by 2033. This significant growth trajectory is primarily driven by the increasing demand for advanced analytics, seamless data integration, and the surging volume of customer data in digital marketing environments. As organizations continue to prioritize data-driven decision-making, the Marketing Data Warehouse market is witnessing a profound transformation, with innovative solutions enabling businesses to optimize marketing strategies and enhance customer experiences.
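The relationship between the quoted start value, growth rate, and forecast value can be checked with the standard compound annual growth rate formula. The sketch below assumes nine annual compounding periods between 2024 and 2033, which the report does not state explicitly; the function names are illustrative. Under that assumption the projection comes out at roughly USD 23.5 billion and the implied rate at roughly 13%, close to but not exactly the quoted figures, which suggests the report uses a slightly different base year or rounding.

```python
def project_value(start_value, rate, years):
    """Value after compounding `rate` annually for `years` periods."""
    return start_value * (1 + rate) ** years


def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start and end value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text (USD billions); the 9-year span is an assumption.
START_2024 = 7.7
FORECAST_2033 = 23.1
QUOTED_CAGR = 0.132
YEARS = 2033 - 2024

projected = project_value(START_2024, QUOTED_CAGR, YEARS)
implied = implied_cagr(START_2024, FORECAST_2033, YEARS)

print(f"Projected 2033 value at the quoted CAGR: {projected:.2f}")
print(f"CAGR implied by the quoted endpoints: {implied:.2%}")
```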
One of the most compelling growth factors for the Marketing Data Warehouse market is the exponential rise in digital marketing activities, which generates vast amounts of structured and unstructured data. Companies are increasingly recognizing the need to consolidate disparate marketing data silos into a centralized repository to extract actionable insights and drive targeted campaigns. The proliferation of omnichannel marketing strategies, which include social media, email, web, and mobile, has further accelerated the adoption of Marketing Data Warehouses. These platforms empower organizations to analyze customer journeys holistically, optimize campaign effectiveness, and personalize marketing initiatives, thereby boosting return on investment and competitive advantage.
Another substantial driver is the rapid advancement in artificial intelligence (AI) and machine learning (ML) technologies, which are being integrated into Marketing Data Warehouse platforms. These intelligent systems enable predictive analytics, real-time segmentation, and automated decision-making, revolutionizing how marketing data is leveraged. The growing emphasis on customer-centric approaches and hyper-personalization has made it imperative for companies to adopt sophisticated data warehousing solutions that can handle large-scale, complex datasets. Additionally, regulatory compliance requirements, such as GDPR and CCPA, are compelling organizations to implement robust data governance frameworks, further fueling the demand for secure and scalable Marketing Data Warehouses.
The surge in cloud adoption is also playing a pivotal role in shaping the Marketing Data Warehouse market landscape. Cloud-based solutions offer unparalleled scalability, flexibility, and cost-efficiency, allowing businesses of all sizes to deploy and scale data warehouses without significant upfront investments in infrastructure. The shift towards cloud-native architectures is enabling seamless integration with various marketing platforms, real-time data processing, and enhanced collaboration among distributed teams. Furthermore, the increasing availability of managed services and advanced analytics tools on the cloud is democratizing access to enterprise-grade data warehousing capabilities, driving widespread market adoption across industries and geographies.
From a regional perspective, North America continues to dominate the Marketing Data Warehouse market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The region’s leadership can be attributed to the presence of major technology vendors, early adoption of digital marketing practices, and significant investments in big data infrastructure. Meanwhile, Asia Pacific is emerging as the fastest-growing market, driven by rapid digital transformation, expanding e-commerce sectors, and increasing awareness of data-driven marketing strategies among enterprises. Latin America and the Middle East & Africa are also witnessing steady growth, supported by rising internet penetration and the adoption of cloud-based marketing solutions. The global market is expected to witness intensified competition and innovation as regional players strive to enhance their offerings and capture emerging opportunities.
The component segment of the Marketing Data Warehouse market is categorized into Software, Hardware, and Services, each playing a distinct role in the overall ecosystem. Software solutions form the backbone of data warehousing, encompassing platforms for data integration, analytics, reporting, and visualization. The demand for advanced software is being propelled by the need for real-time data processing, seamless integra
According to our latest research, the global Retail Media Data Onboarding market size reached USD 2.18 billion in 2024, reflecting robust adoption across the retail and advertising sectors. The market is projected to expand at a CAGR of 13.6% during the forecast period, reaching a value of USD 6.38 billion by 2033. This impressive growth is driven primarily by the escalating demand for omnichannel marketing strategies, increased focus on personalized customer experiences, and the growing importance of first-party data in a privacy-centric digital landscape.
One of the primary growth factors fueling the Retail Media Data Onboarding market is the rapid digital transformation of the retail industry. As retailers strive to bridge the gap between online and offline consumer touchpoints, data onboarding solutions have become essential for integrating disparate customer data sources. The proliferation of e-commerce platforms and the surge in digital advertising investments are compelling brands and retailers to leverage data onboarding to create unified customer profiles, enabling more precise audience targeting and measurement. Additionally, the shift towards cookieless advertising and stringent data privacy regulations have underscored the value of first-party data, further accelerating the adoption of data onboarding solutions among retailers and their partners.
Another significant driver is the heightened focus on customer personalization and experience optimization. Retailers and brands are increasingly utilizing data onboarding to enrich their understanding of customer behaviors, preferences, and purchase journeys. By connecting offline transaction data with digital identifiers, organizations can deliver highly relevant content, offers, and advertisements across channels. This not only improves marketing ROI but also enhances customer loyalty and engagement. The evolution of advanced analytics and artificial intelligence within onboarding platforms is enabling deeper insights and more granular segmentation, making personalization efforts more impactful and measurable.
The expanding ecosystem of retail media networks, particularly those operated by large retailers, is also contributing to market growth. These networks are leveraging data onboarding to monetize their audience data, offering advertisers the ability to reach shoppers both within and outside their owned properties. As retail media becomes a critical component of the advertising mix, partnerships between retailers, brands, agencies, and technology providers are intensifying. This collaborative approach is fueling innovation in onboarding technologies, driving the development of more scalable, secure, and privacy-compliant solutions tailored to the unique needs of the retail sector.
From a regional perspective, North America continues to dominate the Retail Media Data Onboarding market, accounting for the largest revenue share in 2024. This leadership is attributed to the mature digital advertising landscape, high adoption of advanced marketing technologies, and the presence of major retail and e-commerce players. Europe follows closely, with significant investments in data privacy and regulatory compliance driving the need for sophisticated onboarding solutions. Meanwhile, the Asia Pacific region is emerging as a high-growth market, propelled by rapid digitalization, expanding retail infrastructure, and a burgeoning middle-class consumer base. Latin America and the Middle East & Africa are also witnessing steady growth, albeit at a relatively nascent stage, as retailers in these regions increasingly recognize the benefits of integrated data strategies.
In this dynamic landscape, the role of a Reference Data Management Platform becomes increasingly crucial. As retailers and brands navigate the complexities of data onboarding, these platforms offer a structured approach to manage and integrate diverse data sources. By providing a centralized repository for reference data, these platforms ensure consistency and accuracy across all marketing channels. This capability is particularly valuable in the context of retail media, where the alignment of data from multiple sources is essential for effective audience targeting and personalization. The integration of Reference Data Management Platforms wi
https://creativecommons.org/publicdomain/zero/1.0/
Social media has taken over much of the Internet. People use it to get the latest news, find useful resources, meet life partners, and more. In a world where social media plays a big role in delivering news, news that affects our sentiments tends to spread like wildfire. Based on the title and headline, the publication date, and the social media platforms, you have to predict the resulting sentiment scores, that is, the columns “SentimentTitle” and “SentimentHeadline”.
This is a subset of the dataset of the same name available in the UCI Machine Learning Repository. The collected data relates to a period of 8 months, between November 2015 and July 2016, accounting for about 100,000 news items on four different topics: economy, microsoft, obama and palestine.
The attributes of the dataset are:
- IDLink (numeric): Unique identifier of the news item
- Title (string): Title of the news item according to the official media sources
- Headline (string): Headline of the news item according to the official media sources
- Source (string): Original news outlet that published the news item
- Topic (string): Query topic used to obtain the items in the official media sources
- Publish-Date (timestamp): Date and time of the news item's publication
- Facebook (numeric): Final value of the news item's popularity according to the social media source Facebook
- Google-Plus (numeric): Final value of the news item's popularity according to the social media source Google+
- LinkedIn (numeric): Final value of the news item's popularity according to the social media source LinkedIn
- SentimentTitle: Sentiment score of the title. The higher the score, the more positive the sentiment, and vice versa. (Target Variable 1)
- SentimentHeadline: Sentiment score of the text in the news item's headline. The higher the score, the more positive the sentiment. (Target Variable 2)
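The attribute list above can be read as a record schema. The following is a minimal sketch, not the dataset authors' code, of parsing one row into typed fields with the Python standard library. The column names follow the list above; the sample row, the `NewsItem` class, the `parse_row` helper, and the timestamp format are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class NewsItem:
    """Hypothetical typed view of one row, mirroring the attribute list."""
    id_link: int
    title: str
    headline: str
    source: str
    topic: str
    publish_date: datetime
    facebook: float
    google_plus: float
    linkedin: float
    sentiment_title: float      # Target Variable 1
    sentiment_headline: float   # Target Variable 2


def parse_row(row: dict) -> NewsItem:
    """Convert a csv.DictReader row (all strings) into a NewsItem."""
    return NewsItem(
        id_link=int(float(row["IDLink"])),
        title=row["Title"],
        headline=row["Headline"],
        source=row["Source"],
        topic=row["Topic"],
        # Assumed timestamp layout; adjust to the actual file.
        publish_date=datetime.strptime(row["Publish-Date"], "%Y-%m-%d %H:%M:%S"),
        facebook=float(row["Facebook"]),
        google_plus=float(row["Google-Plus"]),
        linkedin=float(row["LinkedIn"]),
        sentiment_title=float(row["SentimentTitle"]),
        sentiment_headline=float(row["SentimentHeadline"]),
    )


# Made-up sample data, used only to exercise the parser.
SAMPLE_CSV = (
    "IDLink,Title,Headline,Source,Topic,Publish-Date,"
    "Facebook,Google-Plus,LinkedIn,SentimentTitle,SentimentHeadline\n"
    "1,Example title,Example headline text,Example Outlet,economy,"
    "2016-01-15 08:30:00,12,3,1,0.125,-0.05\n"
)

items = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

Keeping the two target columns as separate float fields makes it straightforward to split features from targets later; the header spelling in the real file may differ from the list above and should be checked before use.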
In response to the Coronavirus disease (COVID-19) outbreak and the Transportation Research Board’s (TRB) urgent need for work related to transportation and pandemics, this paper contributes with a sense of urgency and provides a starting point for research on the topic. The main goal of this paper is to support transportation researchers and the TRB community during this COVID-19 pandemic by reviewing the performance of software models used for extracting large-scale data from Twitter streams related to COVID-19. The study extends the previous research efforts in social media data mining by providing a review of contemporary tools, including their computing maturity and their potential usefulness. The paper also includes an open repository for the processed data frames to facilitate the quick development of new transportation research studies. The output of this work is recommended to be used by the TRB community when deciding to further investigate topics related to COVID-19 and social media data mining tools.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This record contains the underlying research data for the publication "The Valuation of User-Generated Content: A Structural, Stylistic and Semantic Analysis of Online Reviews". The full text is available from: https://ink.library.smu.edu.sg/etd_coll/78
The ability and ease for users to create and publish content has produced a vast amount of online product reviews. However, the amount of data is overwhelmingly large and unstructured, making the information difficult to quantify. This creates a challenge in understanding how online reviews affect consumers’ purchase decisions. In my dissertation, I explore the structural, stylistic and semantic content of online reviews. Firstly, I present a measurement that quantifies sentiments with respect to a multi-point scale and conduct a systematic study on the impact of online reviews on product sales. Using the sentiment metrics generated, I estimate the weight that customers place on each segment of the review and examine how these segments affect the sales for a given product. The results empirically verify that sentiments influence sales in ways that ratings alone do not capture. Secondly, I propose a method to detect online review manipulation using writing style analysis and assess how consumers respond to such manipulation. Finally, I find that societal norms influence posting behavior and that significant differences exist across cultures. Users should therefore exercise care in interpreting the information from online reviews. This dissertation advances our understanding of the consumer decision-making process and sheds light on the relevance of online review ratings and sentiments over a sequential decision-making process. Having tapped into the abundant supply of online review data, the results in this work are based on large-scale datasets which extend beyond the scale of traditional word-of-mouth research.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global File Search Software market size reached USD 2.88 billion in 2024, with a robust compound annual growth rate (CAGR) of 13.2% projected from 2025 to 2033. This growth trajectory is expected to drive the market to a forecasted value of USD 8.85 billion by 2033. The market’s expansion is primarily attributed to the escalating volumes of digital data, the increasing adoption of cloud technologies, and the growing necessity for efficient file management solutions across diverse industries. As enterprises and individuals continue to generate and store massive quantities of digital files, the demand for advanced file search software that can deliver rapid, accurate, and secure search functionalities is surging globally.
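The relationship between the base value, the CAGR, and the forecast value quoted above can be sketched with the standard compound-growth formula. This is an illustrative check only, not the report's methodology: the function names are hypothetical, and treating 2024 as the base year with nine compounding periods up to 2033 is an assumption, since the report does not state its convention.

```python
def project_value(base: float, annual_rate: float, years: int) -> float:
    """Compound `base` forward at `annual_rate` for `years` periods."""
    return base * (1.0 + annual_rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that takes `start` to `end` in `years` periods."""
    return (end / start) ** (1.0 / years) - 1.0


# Figures quoted in the paragraph above (USD billion).
BASE_2024 = 2.88
FORECAST_2033 = 8.85
STATED_CAGR = 0.132
PERIODS = 9  # assumption: 2024 -> 2033

projected_2033 = project_value(BASE_2024, STATED_CAGR, PERIODS)
implied = implied_cagr(BASE_2024, FORECAST_2033, PERIODS)

print(f"projected 2033 value: {projected_2033:.2f}")
print(f"implied CAGR: {implied:.2%}")
```

Under the nine-period assumption the projection lands close to, though not exactly on, the reported forecast; a different base-year convention or rounding of the published rate would shift the result.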
A primary growth factor for the file search software market is the exponential increase in unstructured data generated by organizations. With the proliferation of digital transformation initiatives, businesses are accumulating vast repositories of documents, emails, multimedia files, and other data types. Efficiently locating and retrieving this information has become a critical operational challenge. File search software addresses this by offering advanced indexing, semantic search, and AI-powered search capabilities, enabling users to quickly find relevant files and documents. This efficiency not only saves time but also enhances productivity, compliance, and decision-making processes, making such solutions indispensable in modern enterprises.
Another significant driver for the file search software market is the rapid shift toward cloud-based infrastructure. As organizations migrate their data and workflows to cloud environments, the need for scalable and secure file search solutions has intensified. Cloud-based file search software offers seamless integration with various cloud storage platforms, ensuring that users can access and search files across distributed environments. Moreover, the scalability and collaborative features of cloud solutions align well with the needs of remote and hybrid workforces, further fueling market growth. The flexibility to deploy search tools both on-premises and in the cloud provides organizations with the agility to meet evolving business requirements.
The adoption of file search software is also being propelled by stringent regulatory requirements and the growing emphasis on data security and governance. Industries such as healthcare, BFSI, and government are subject to strict data management and privacy regulations. File search software equipped with robust encryption, access controls, and audit trails helps organizations maintain compliance and protect sensitive information. Additionally, the integration of artificial intelligence and machine learning in modern file search solutions enhances their ability to understand context, deliver personalized results, and identify potential security risks. This combination of compliance, security, and intelligence is driving widespread adoption across regulated sectors.
From a regional perspective, North America continues to dominate the file search software market, accounting for the largest revenue share in 2024. This leadership position is underpinned by the early adoption of digital technologies, a mature IT infrastructure, and the presence of leading software vendors in the United States and Canada. Europe follows closely, with rapid adoption across industries such as BFSI, healthcare, and education. The Asia Pacific region is witnessing the fastest growth, driven by digitalization initiatives, expanding enterprise sectors, and increasing investments in IT infrastructure across countries like China, India, and Japan. Latin America and the Middle East & Africa are also emerging as promising markets, supported by growing awareness and gradual digital transformation efforts.
The file search software market by component is primarily segmented into software and services. The software segment encompasses standalone file search applications, integrated enterprise search platforms, and solutions embedded within larger content management systems. This segment is experiencing rapid innovation, with vendors introducing AI-driven features such as natural language processing, contextual search, and predictive analytics. These advancements enable users to conduct more intuitive and accurate searches, even as data volumes con
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These files are used to replicate all analyses in Media in a Time of Crisis: Newspaper Coverage of Covid-19 in East Asia, available at https://ink.library.smu.edu.sg/soss_research/3348/.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This repository includes sample data and related materials for the paper "Embedding Regression for Comparative Analysis: Tracing Far Right Ideology Across Multimedia Ecosystems", to be published in Political Communication.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This is the dataset for Ms ZHENG Yueyuan's MPhil thesis project ‘The Role of Eye Movements in Multimodal Information Processing: The Case of Documentary Comprehension and Emotion Recognition’. Study 1 has been published in the Proceedings of the 41st Annual Conference of the Cognitive Science Society, and Study 2 has been published in the Proceedings of the 42nd Annual Conference of the Cognitive Science Society. Both datasets have been made publicly available on the Open Science Framework.