Private contractor survey results - anonymous. 2017-18
Non-anonymized subset of the databases used in the paper "Traveling the Silk Road: A measurement analysis of a large anonymous online marketplace" (Christin, 2013). In this dataset, textual information (item name, description, or feedback text) and handles have not been anonymized and are thus available. We don't expect any private identifiers or other PII to be present in the data, which was collected from a publicly available website -- the Silk Road anonymous marketplace -- for a few months in 2012.
For less restricted usage terms, please consider the anonymized version, which is also available without any restrictions. This non-anonymized dataset should only be requested if your project MUST rely on full textual descriptions of items and/or feedback.
Christin (2013) Traveling the Silk Road: A measurement analysis of a large anonymous online marketplace. To appear in Proceedings of the 22nd International World Wide Web Conference (WWW'13). Rio de Janeiro, Brazil. May 2013.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data set for "Handling Environmental Uncertainty in Design Time Access Control Analysis".
Anonymous-data-model/Anonymous-under-review dataset hosted on Hugging Face and contributed by the HF Datasets community
DRAKO specializes in delivering Anonymous IP Data, focusing on privacy-first approaches to consumer identity and behavior analysis. Our data allows businesses to track user interactions without compromising individual privacy, ensuring compliance with data protection regulations.
Anonymous IP Data is crucial for effective audience targeting, analyzing traffic sources, and measuring campaign performance. By connecting digital audiences through various IPs, we enable a clearer understanding of user journeys across devices and platforms. Beyond IPs, we’re also able to connect these IDs to broader ID types like Mobile Advertising IDs and CTV IDs.
Key Features:
- IPv4 and IPv6 in hashed format
- Detailed mapping of Anonymous IPs for secure user behavior analysis
- Integration with Mobile IP Data for insights into mobile user interactions
- Comprehensive Identity Data for enhanced audience profiling
- Digital Audience Data to understand demographics and interests
- Identity Linkage Data for connecting user profiles across different channels
Use Cases:
- Audience segmentation and targeting strategies
- Traffic source analysis and optimization
- Digital campaign performance measurement
- User journey mapping across devices
- Compliance-focused marketing solutions
Data Compliance: Our Anonymous IP Data is fully compliant with industry standards for data privacy and security. We prioritize ethical data collection practices, ensuring that user identities remain anonymous while still providing valuable insights.
Data Quality: DRAKO employs rigorous quality assurance protocols to maintain the accuracy and reliability of our Anonymous IP Data. We continuously update our datasets and utilize advanced validation techniques to ensure data integrity.
https://dataintelo.com/privacy-and-policy
The global market size for anonymous social networking software was valued at USD 1.3 billion in 2023 and is projected to reach USD 2.7 billion by 2032, growing at a CAGR of 8.4% during the forecast period. This remarkable growth can be attributed to the increasing emphasis on privacy and data security among internet users, as well as the rising demand for platforms that allow for free expression without the fear of identity exposure.
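The figures quoted above can be cross-checked with the standard compound-growth formula. The sketch below is illustrative only: it assumes 2023 is the base year and that growth compounds annually for nine years to 2032, neither of which the report states explicitly, and the function names are hypothetical.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound start_value at a constant annual growth rate for `years` years."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the constant annual growth rate that links two values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Assumption: base year 2023, target year 2032, so nine compounding periods.
YEARS = 2032 - 2023

# USD 1.3 billion compounded at the stated 8.4% CAGR.
projected_2032 = project_value(1.3, 0.084, YEARS)

# Growth rate implied by the two stated endpoints (1.3 -> 2.7).
rate = implied_cagr(1.3, 2.7, YEARS)

print(f"Projected 2032 value: USD {projected_2032:.2f} billion")
print(f"Implied CAGR: {rate:.2%}")
```

Under these assumptions the projection lands close to the stated USD 2.7 billion, and the implied rate is close to the stated 8.4%, so the quoted numbers are mutually consistent up to rounding.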
One of the primary growth factors for the anonymous social networking software market is the increasing awareness and concern about data privacy. In an age where data breaches and misuse of personal information have become frequent, users are gravitating towards platforms that offer anonymity. This ensures their personal data is not exposed to unwarranted surveillance or commercial exploitation. Furthermore, the rise in internet penetration, particularly in emerging economies, has broadened the user base for these platforms, driving the demand for anonymous social networking solutions globally.
Another significant growth factor is the increasing demand for platforms that facilitate open and honest communication. Traditional social media platforms often subject users to social judgments and biases, which can inhibit open communication. Anonymous social networking software creates a safe space where users can share their thoughts and experiences without revealing their identities. This has proven particularly beneficial for individuals seeking mental health support, discussing sensitive topics, or simply wishing to express opinions freely.
Technological advancements and the advent of artificial intelligence have also bolstered the growth of the anonymous social networking software market. Innovative features such as AI-driven content moderation, real-time language translation, and enhanced user interfaces have improved the overall user experience, making these platforms more appealing. Additionally, the integration of blockchain technology to ensure data security and transparency is expected to further drive market growth.
From a regional perspective, North America dominated the anonymous social networking software market in 2023, accounting for a substantial share of the market. This dominance is due to the high rate of technology adoption, advanced internet infrastructure, and a growing number of tech-savvy users who value privacy. Europe also holds a significant share, driven by stringent data protection regulations such as the GDPR. Meanwhile, the Asia Pacific region is expected to witness the highest growth rate during the forecast period, fueled by increasing internet penetration, growing smartphone adoption, and rising awareness about data privacy among users.
The platform segment of the anonymous social networking software market can be broadly categorized into mobile and web-based platforms. Mobile platforms have gained significant traction over the past few years, largely due to the proliferation of smartphones and mobile internet services. Users prefer mobile applications because they offer the convenience of accessing the platform anytime and anywhere, thereby enhancing user engagement. Moreover, mobile platforms have the advantage of leveraging various smartphone features such as location services, camera, and push notifications to provide a richer user experience.
Web-based platforms, on the other hand, cater to users who prefer accessing social networking services through desktops or laptops. These platforms often provide a more comprehensive user interface and are capable of handling more complex functionalities compared to mobile apps. Web-based platforms are particularly popular among professional users who require robust features to manage large communities or engage in detailed discussions. Additionally, web platforms often offer better data storage and retrieval capabilities, making them suitable for enterprise-level applications.
The choice between mobile and web-based platforms often depends on the specific needs and preferences of the user. For instance, younger users and individuals are more inclined towards mobile platforms due to their ease of use and accessibility. In contrast, enterprises and professional users may prefer web-based platforms for their enhanced functionalities and better data management capabilities. Both platform types are expected to witness substantial growth during the forecast period, driven by continuous technological advancements and increasing user demand for anonymity.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our paper complements previous findings of Ederer, Goldsmith-Pinkham and Jensen (2024) by analyzing EJMR’s evolving interactions with external information sources. We focus on three key aspects: (1) the prevalence and impact of links to external domains; (2) the surge in discussions driven by Twitter posts since 2018; and (3) the categorization of individuals whose tweets and content are discussed on EJMR.
anonymous-ml123/nips2025-data dataset hosted on Hugging Face and contributed by the HF Datasets community
We evaluate an experimental program in which the French public employment service anonymized résumés for firms that were hiring. Firms were free to participate or not; participating firms were then randomly assigned to receive either anonymous résumés or name-bearing ones. We find that participating firms become less likely to interview and hire minority candidates when receiving anonymous résumés. We show how these unexpected results can be explained by the self-selection of firms into the program and by the fact that anonymization prevents the attenuation of negative signals when the candidate belongs to a minority. (JEL J15, J68, J71)
https://creativecommons.org/publicdomain/zero/1.0/
The idea is that others can run simple regression and classification models on this dataset.
This dataset contains anonymous data from a call center, and the metrics obtained regarding customer service agents.
Anonymized database pertaining to the AlphaBay marketplace. This data was used in the papers "Plug and Prey? Measuring the Commoditization of Cybercrime via Online Anonymous Markets" (Van Wegberg et al., 2018), "An Empirical Analysis of Traceability in the Monero Blockchain" (Moeser et al., 2018) and in the joint EMCDDA/EUROPOL report "Drugs and the darknet: Perspectives for enforcement, research and policy" (EMCDDA, 2017). In this dataset, we chose not to make available any textual information (item name, description, or feedback text). We also anonymized all handles (user id, item id). This represents more than two and a half years of parsed data from what was arguably the largest online anonymous marketplace ever.
EMCDDA (2017) Drugs and the darknet: Perspectives for enforcement, research and policy. November 2017.
Van Wegberg et al. Plug and Prey? Measuring the Commoditization of Cybercrime via Online Anonymous Markets. To appear in Proceedings of the 27th USENIX Security Symposium (USENIX Security'18). Baltimore, MD. August 2018.
Moeser et al. An Empirical Analysis of Traceability in the Monero Blockchain. To appear in Proceedings of the Privacy Enhancing Technology Symposium (PETS 2018), volume 3. Barcelona, Spain. July 2018.
Anonymised LPIS data for 2023. The following attributes are available:
1) LPIS Data: Applicant Herd, Herd Number, Parcel Label, Claimed Area, Crop, Digitised Area, Eligible Hectare, Commonage Denominator, Commonage Numerator, Subdivision, Commonage Indicator, Owner/Leased/Rented, Grassland, Tillage, Permanent, Arable, Straw Incorporation Measure Indicator, Basic Income Support for Sustainability, Eco-Schemes, Complementary Redistributive Income Support for Sustainability, Protein Aid, Complementary income support for young farmers, Areas of Natural Constraints, ACRES, Organic, Straw Incorporation Measure, Manual Deduction Area, Fixed Area Deduction Area, Fixed Area Deduction Description, Manual Deduction Description, Date of Extract
2) LPIS Sub Features Data: Parcel Label, Feature Label, Feature Description, Percentage, Gross Area.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Psychometric and genotypic data obtained for AN and BN patients
Anonymized data set for peer review, based on the information requested by the journal. The data set includes 2 files: a Stata data file and a Stata do file.
https://www.datainsightsmarket.com/privacy-policy
The anonymous social networking software market is experiencing robust growth, driven by increasing demand for privacy and safety online. While precise market sizing data is unavailable, considering the substantial presence of companies like Tencent (with its various social media platforms incorporating privacy features) and Momo (known for its dating apps with anonymity options), we can reasonably estimate the 2025 market size at approximately $2 billion USD. The compound annual growth rate (CAGR) for this sector is projected to be around 15% from 2025 to 2033, reflecting the ongoing adoption of these platforms by users seeking alternatives to traditional, publicly identifiable social networks. Key drivers include concerns over data privacy breaches, cyberbullying, and the desire for more authentic online interactions without the pressures of public scrutiny. Emerging trends such as advanced encryption technologies, decentralized platforms leveraging blockchain, and the integration of AI-powered moderation tools are further shaping market dynamics. However, regulatory challenges surrounding user anonymity and the potential for misuse of these platforms (e.g., illegal activities) represent significant restraints on market expansion. Segmentation analysis would likely reveal diverse user demographics and platform specializations (e.g., dating, gaming, professional networking). The projected growth signifies a significant opportunity for innovative companies. Existing players like Tencent, Momo, Tantan, and others are continuously investing in improving user experience and security features to maintain their competitive edge. The market’s future will depend on successfully balancing user privacy demands with regulatory compliance and combating potential risks associated with anonymity. Future growth will likely be concentrated in regions with high smartphone penetration and a young, tech-savvy population. 
The forecast period (2025-2033) suggests a substantial expansion of the market, assuming continued technological advancements and evolving user preferences. Further research into specific regional data is crucial for a more granular understanding of market segmentation and penetration rates.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
About the NUDA Dataset
Media bias is a multifaceted problem, leading to one-sided views and impacting decision-making. A way to address bias in news articles is to automatically detect and indicate it through machine-learning methods. However, such detection is limited due to the difficulty of obtaining reliable training data. To facilitate the data-gathering process, we introduce NewsUnravel, a news-reading web application leveraging an initially tested feedback mechanism to collect reader feedback on machine-generated bias highlights within news articles. Our approach augments dataset quality by significantly increasing inter-annotator agreement by 26.31% and improving classifier performance by 2.49%. As the first human-in-the-loop application for media bias, NewsUnravel shows that a user-centric approach to media bias data collection can return reliable data while being scalable and evaluated as easy to use. NewsUnravel demonstrates that feedback mechanisms are a promising strategy to reduce data collection expenses, fluidly adapt to changes in language, and enhance evaluators' diversity.
General
This dataset was created through user feedback on automatically generated bias highlights on news articles on the website NewsUnravel made by ANON. Its goal is to improve the detection of linguistic media bias for analysis and to indicate it to the public. Support came from ANON. None of the funders played any role in the dataset creation process or publication-related decisions.
The dataset consists of text, namely biased sentences with binary bias labels (processed, biased or not biased) as well as metadata about the article. It includes all feedback that was given. The single ratings (unprocessed) used to create the labels are included with the corresponding User IDs.
For training, this dataset was combined with the BABE dataset. All data is completely anonymous. Some sentences might be offensive or triggering as they were taken from biased or more extreme news sources. The dataset does not identify sub-populations, cannot be considered sensitive to them, and does not make it possible to identify individuals.
Description of the Data Files
This repository contains the datasets for the anonymous NewsUnravel submission. The tables contain the following data:
NUDAdataset.csv: the NUDA dataset with 310 new sentences with bias labels
Statistics.png: contains all Umami statistics for NewsUnravel's usage data
Feedback.csv: holds the participantID of a single feedback with the sentence ID (contentId), the bias rating, and provided reasons
Content.csv: holds the participant ID of a rating with the sentence ID (contentId) of a rated sentence and the bias rating, and reason, if given
Article.csv: holds the article ID, title, source, article metadata, article topic, and bias amount in %
Participant.csv: holds the participant IDs and data processing consent
Collection Process
Data was collected through interactions with the Feedback Mechanism on NewsUnravel. A news article was displayed with automatically generated bias highlights. Each highlight could be selected, and readers were able to agree or disagree with the automatic label. Through a majority vote, labels were generated from those feedback interactions. Spammers were excluded through a spam detection approach.
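The majority-vote step described above can be sketched as follows. This is a minimal illustration, not the NewsUnravel implementation: the tuple layout, the `majority_labels` name, the spammer set, and the choice to leave ties unlabeled are all assumptions, since the description does not specify them.

```python
from collections import Counter, defaultdict


def majority_labels(feedback, spammers=frozenset()):
    """Aggregate per-sentence ratings into one label by majority vote.

    feedback: iterable of (participant_id, content_id, rating) tuples.
    spammers: participant IDs to ignore (assumed to come from a separate
              spam detection step).
    Returns a dict mapping content_id to the winning rating, or None on a tie.
    """
    votes = defaultdict(Counter)
    for participant_id, content_id, rating in feedback:
        if participant_id in spammers:
            continue  # skip ratings from flagged participants
        votes[content_id][rating] += 1

    labels = {}
    for content_id, counts in votes.items():
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            labels[content_id] = None  # assumption: ties stay unlabeled
        else:
            labels[content_id] = ranked[0][0]
    return labels


# Hypothetical ratings for two sentences.
example_feedback = [
    ("p1", "s1", "biased"),
    ("p2", "s1", "biased"),
    ("p3", "s1", "not biased"),
    ("p1", "s2", "biased"),
    ("p4", "s2", "not biased"),
    ("spam", "s2", "biased"),
]
example_labels = majority_labels(example_feedback, spammers={"spam"})
print(example_labels)
```

In this toy input, sentence `s1` has a clear majority, while `s2` ends in a tie once the flagged participant is excluded; a real pipeline would need an explicit tie-breaking rule.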
Readers came to our website voluntarily through posts on LinkedIn and social media as well as posts on university boards. The data collection period lasted for one week, from March 4th to March 11th (2023). The landing page informed them about the goal and the data processing. After being informed, they could proceed to the article overview.
So far, the dataset has been used on top of BABE to train a linguistic bias classifier, adopting hyperparameter configurations from BABE with a pre-trained model from Hugging Face.
The dataset will be open source. On acceptance, a link with all details and contact information will be provided. No third parties are involved.
The dataset will not be maintained as it captures the first test of NewsUnravel at a specific point in time. However, new datasets will arise from further iterations. Those will be linked in the repository. Please cite the NewsUnravel paper if you use the dataset and contact us if you're interested in more information or joining the project.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Big data and its applications have recently grown sharply in fields such as IoT, bioinformatics, eCommerce, and social media. The huge volume of data poses enormous challenges to the architecture, infrastructure, and computing capacity of IT systems. The scientific and industrial communities therefore have a compelling need for large-scale, robust computing systems. Since one of the characteristics of big data is value, data should be published so that analysts can extract useful patterns from it. However, data publishing may lead to the disclosure of individuals’ private information. Among modern parallel computing platforms, Apache Spark is a fast, in-memory computing framework for large-scale data processing that provides high scalability through resilient distributed datasets (RDDs). In terms of performance, thanks to in-memory computation, it can be up to 100 times faster than Hadoop. Apache Spark is therefore one of the essential frameworks for implementing distributed methods for privacy-preserving big data publishing (PPBDP). This paper uses the RDD programming model of Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. This computing model has three phases of in-memory computation to address the runtime, scalability, and performance of large-scale data anonymization. The model supports partition-based data clustering algorithms that preserve the λ-diversity privacy model by using transformations and actions on RDDs. The authors investigate a Spark-based implementation for preserving the λ-diversity privacy model with two purpose-designed distance functions, City block and Pearson. The results of the paper provide a comprehensive guideline that allows researchers to apply Apache Spark in their own research.
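For readers unfamiliar with the two distance functions named in the abstract, the following plain-Python sketch shows their textbook definitions. It does not reproduce the paper's Spark implementation: the function names are hypothetical, the Pearson distance is assumed to be one minus the Pearson correlation coefficient, and the handling of constant vectors is an assumption.

```python
import math


def city_block_distance(a, b):
    """City block (Manhattan) distance: sum of absolute coordinate differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_distance(a, b):
    """Pearson distance, assumed here to be 1 - r (r = Pearson correlation)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Assumption: a constant vector has no defined correlation,
        # so treat the pair as uncorrelated (distance 1).
        return 1.0
    return 1.0 - cov / math.sqrt(var_a * var_b)


# Hypothetical numeric records.
print(city_block_distance([1, 2, 3], [4, 0, 3]))
print(pearson_distance([1, 2, 3], [2, 4, 6]))
print(pearson_distance([1, 2, 3], [3, 2, 1]))
```

The city block distance depends on the magnitude of the differences, whereas the Pearson distance depends only on how the two records vary together, so the two functions can group the same records quite differently during clustering.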
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Trained models and data sets for anonymous submission
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the data obtained from three participants using an eye tracker while they were wayfinding with paper maps, digital maps, or voice-assisted digital maps.
1) LPIS Data: Applicant Herd, Herd Number, Parcel Label, Claimed Area, Crop, Digitised Area, Eligible Hectare, Commonage Denominator, Commonage Numerator, Subdivision, Commonage Indicator, Owner/Leased/Rented, Grassland, Tillage, Permanent, Arable, Straw Incorporation Measure Indicator, Basic Income Support for Sustainability, Eco-Schemes, Complementary Redistributive Income Support for Sustainability, Protein Aid, Complementary income support for young farmers, Areas of Natural Constraints, ACRES, Organic, Straw Incorporation Measure, Manual Deduction Area, Fixed Area Deduction Area, Fixed Area Deduction Description, Manual Deduction Description, Date of Extract
2) LPIS Sub Features Data: Parcel Label, Feature Label, Feature Description, Percentage, Gross Area.
3) Partnership Data: Herd Number, Parcel Label