In situations where data is needed but not readily available, you'll have to build up the data yourself. There are many methods you can use to acquire this data, from web scraping to APIs. But sometimes you'll end up needing to create fake or “dummy” data. Dummy data can be useful when you know the exact features you’ll be using and the data types included, but you just don’t have the data itself.
Features Description
Reference - https://towardsdatascience.com/build-a-your-own-custom-dataset-using-python-9296540a0178
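As a minimal sketch of the idea above, the snippet below builds a small dummy dataset with Python's standard library. The column names (`user_id`, `age`, `country`, `signup_date`), the value ranges, and the `make_dummy_rows` helper are hypothetical and chosen only for illustration; they are not taken from the referenced article.

```python
import random
from datetime import date, timedelta


def make_dummy_rows(n, seed=0):
    """Return n dictionaries of fake user records (all fields are illustrative)."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    countries = ["US", "IN", "BR", "DE", "JP"]  # assumed category values
    start = date(2020, 1, 1)
    rows = []
    for i in range(n):
        rows.append({
            "user_id": i + 1,
            "age": rng.randint(18, 70),
            "country": rng.choice(countries),
            "signup_date": start + timedelta(days=rng.randrange(365)),
        })
    return rows


rows = make_dummy_rows(5)
for row in rows:
    print(row)
```

Seeding the generator keeps the dummy data stable between runs, which helps when the data feeds tests or demos.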
https://creativecommons.org/publicdomain/zero/1.0/
Description:
The "Daily Social Media Active Users" dataset provides a comprehensive and dynamic look into the digital presence and activity of global users across major social media platforms. The data was generated to simulate real-world usage patterns for 14 popular platforms: Facebook, YouTube, WhatsApp, Instagram, WeChat, TikTok, Telegram, Snapchat, X (formerly Twitter), Pinterest, Reddit, Threads, LinkedIn, and Quora. This dataset contains 10,000 rows and includes several key fields that offer insights into user demographics, engagement, and usage habits.
Dataset Breakdown:
Platform: The name of the social media platform where the user activity is tracked. It includes globally recognized platforms, such as Facebook, YouTube, and TikTok, that are known for their large, active user bases.
Owner: The company or entity that owns and operates the platform. Examples include Meta for Facebook, Instagram, and WhatsApp, Google for YouTube, and ByteDance for TikTok.
Primary Usage: This category identifies the primary function of each platform. Social media platforms differ in their primary usage, whether it's for social networking, messaging, multimedia sharing, professional networking, or more.
Country: The geographical region where the user is located. The dataset simulates global coverage, showcasing users from diverse locations and regions. It helps in understanding how user behavior varies across different countries.
Daily Time Spent (min): This field tracks how much time a user spends on a given platform on a daily basis, expressed in minutes. Time spent data is critical for understanding user engagement levels and the popularity of specific platforms.
Verified Account: Indicates whether the user has a verified account. This feature mimics real-world patterns where verified users (often public figures, businesses, or influencers) have enhanced status on social media platforms.
Date Joined: The date when the user registered or started using the platform. This data simulates user account history and can provide insights into user retention trends or platform growth over time.
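To illustrate how the fields above might be used, here is a small sketch that averages daily time spent per platform with the standard `csv` module. The header names mirror the field names listed above, but the exact CSV headers, the `Yes`/`No` and date formats, and the three sample rows are assumptions made up for the example, not values from the dataset.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows using the field names listed above; not real dataset values.
SAMPLE_CSV = """Platform,Owner,Primary Usage,Country,Daily Time Spent (min),Verified Account,Date Joined
Facebook,Meta,Social Networking,India,40,Yes,2019-05-01
Facebook,Meta,Social Networking,Brazil,60,No,2020-02-11
YouTube,Google,Video Sharing,Germany,90,No,2018-09-30
"""


def mean_time_by_platform(csv_file):
    """Return {platform: mean daily minutes} for the rows in csv_file."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        platform = row["Platform"]
        totals[platform] += float(row["Daily Time Spent (min)"])
        counts[platform] += 1
    return {p: totals[p] / counts[p] for p in totals}


averages = mean_time_by_platform(io.StringIO(SAMPLE_CSV))
print(averages)  # {'Facebook': 50.0, 'YouTube': 90.0}
```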
Context and Use Cases:
Researchers, data scientists, and developers can use this dataset to:
Model User Behavior: By analyzing patterns in daily time spent, verified status, and country of origin, users can model and predict social media engagement behavior.
Test Analytics Tools: Social media monitoring and analytics platforms can use this dataset to simulate user activity and optimize their tools for engagement tracking, reporting, and visualization.
Train Machine Learning Algorithms: The dataset can be used to train models for various tasks like user segmentation, recommendation systems, or churn prediction based on engagement metrics.
Create Dashboards: This dataset can serve as the foundation for creating user-friendly dashboards that visualize user trends, platform comparisons, and engagement patterns across the globe.
Conduct Market Research: Business intelligence teams can use the data to understand how various demographics use social media, offering valuable insights into the most engaged regions, platform preferences, and usage behaviors.
Sources of Inspiration: This dataset is inspired by public data from industry reports, such as those from Statista, DataReportal, and other market research platforms. These sources provide insights into the global user base and usage statistics of popular social media platforms. The synthetic nature of this dataset allows for the use of realistic engagement metrics without violating any privacy concerns, making it an ideal tool for educational, analytical, and research purposes.
The structure and design of the dataset are based on real-world usage patterns and aim to represent a variety of users from different backgrounds, countries, and activity levels. This diversity makes it an ideal candidate for testing data-driven solutions and exploring social media trends.
Future Considerations:
As the social media landscape continues to evolve, this dataset can be updated or extended to include new platforms, engagement metrics, or user behaviors. Future iterations may incorporate features like post frequency, follower counts, engagement rates (likes, comments, shares), or even sentiment analysis from user-generated content.
By leveraging this dataset, analysts and data scientists can create better, more effective strategies ...
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Internet use in the UK annual estimates by age, sex, disability, ethnic group, economic activity and geographical location, including confidence intervals.
As of early 2020, around ** percent of surveyed respondents in China were aware that many online shopping and e-commerce mobile apps overused user permissions. Social media and messenger apps ranked second among app categories with low user trust in data security.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the relation between users, instruments, elapsed time, result and status_code.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset consolidates data from multiple sources to provide a comprehensive view of security anomalies, insider threats, system updates, and user management. It includes information such as user behavior patterns, anomaly detection metrics, system update details, and user contact information. Designed for multi-dimensional analysis, the dataset is ideal for tasks like anomaly detection, insider threat assessment, system update tracking, and user data management in cybersecurity applications. Each record is enriched with timestamps and other relevant attributes to enable dynamic analysis and decision-making.
As of January 2025, around 13.7 percent of paid iOS apps admitted collecting data from users engaging with their mobile products. In comparison, approximately 53 percent of free-to-download iOS apps reported that they collect private data from users worldwide, while approximately 86 percent of paid apps have not declared whether they collect users' private data.
Success.ai’s User Profiles Data for Nonprofit and NGO Leaders provides businesses, organizations, and researchers with comprehensive access to global leaders in the nonprofit and NGO sectors. With data sourced from over 700 million verified LinkedIn profiles, this dataset includes actionable insights and contact details for executives, program managers, administrators, and decision-makers. Whether your goal is to partner with nonprofits, support global causes, or conduct research into social impact, Success.ai ensures your outreach is backed by accurate, enriched, and continuously updated data.
Why Choose Success.ai’s User Profiles Data for Nonprofit and NGO Leaders?
Comprehensive Professional Profiles
Access verified LinkedIn profiles of nonprofit leaders, NGO managers, program directors, grant writers, and administrative executives. AI-driven validation ensures 99% accuracy for efficient communication and minimized bounce rates.
Global Coverage Across Nonprofit Sectors
Includes profiles from nonprofits, humanitarian organizations, environmental groups, social enterprises, and advocacy organizations. Covers key markets across North America, Europe, APAC, South America, and Africa for global reach.
Continuously Updated Dataset
Reflects real-time professional updates, organizational changes, and emerging trends in the nonprofit landscape to keep your targeting relevant and effective.
Tailored for Nonprofit Insights
Enriched profiles include work histories, organizational affiliations, areas of expertise, and social impact projects for deeper engagement opportunities.
Data Highlights:
* 700M+ Verified LinkedIn Profiles: Access a vast network of nonprofit and NGO professionals worldwide.
* 100M+ Work Emails: Direct communication with executives, managers, and decision-makers in the nonprofit sector.
* Enriched Organizational Data: Gain insights into leadership structures, mission focuses, and operational scales.
* Industry-Specific Segmentation: Target nonprofits focused on healthcare, education, environmental sustainability, human rights, and more.
Key Features of the Dataset:
Nonprofit and NGO Leader Profiles
Identify and connect with executives, program managers, fundraisers, and policy directors in global nonprofit and NGO sectors. Engage with individuals who drive decision-making and operational strategies for impactful organizations.
Detailed Organizational Insights
Leverage firmographic data, including organizational size, mission, regional activity, and funding sources, to align with specific nonprofit goals.
Advanced Filters for Precision Targeting
Refine searches by region, mission type, role, or organizational focus for tailored outreach. Customize campaigns based on social impact priorities, such as climate action, gender equality, or economic development.
AI-Driven Enrichment
Enhanced datasets provide actionable insights into professional accomplishments, partnerships, and leadership achievements for targeted engagement.
Strategic Use Cases:
Partnership Development and Outreach
Identify nonprofits and NGOs for collaboration on social impact projects, sponsorships, or grant distribution. Build relationships with decision-makers driving advocacy, fundraising, and community initiatives.
Donor Engagement and Fundraising
Target nonprofit leaders responsible for managing fundraising campaigns and donor relationships. Tailor outreach efforts to align with specific causes and funding priorities.
Research and Analysis
Analyze leadership trends, mission focuses, and regional nonprofit activities to inform program design and funding strategies. Use insights to evaluate the effectiveness of social impact initiatives and partnerships.
Recruitment and Talent Acquisition
Target HR professionals and administrators seeking qualified staff, consultants, or volunteers for nonprofits and NGOs. Offer talent solutions for specialized roles in program management, advocacy, and administration.
Why Choose Success.ai?
Best Price Guarantee
Access industry-leading, verified User Profiles Data at unmatched pricing to ensure your campaigns are cost-effective and impactful.
Seamless Integration
Easily integrate verified nonprofit data into your CRM or marketing platforms with APIs or downloadable formats.
AI-Validated Accuracy
Rely on 99% accuracy to minimize wasted outreach efforts and maximize engagement outcomes.
Customizable Solutions
Tailor datasets to focus on specific nonprofit types, geographical regions, or areas of social impact to meet your strategic objectives.
Strategic APIs for Enhanced Campaigns:
Data Enrichment API
Update your internal records with verified nonprofit leader profiles to enhance targeting and engagement.
Lead Generation API
Automate lead generation for a consistent pipeline of nonprofit and NGO professionals, scaling your outreach efforts efficiently.
Success.ai’s User Profiles Data for Nonprofit and NGO Leader...
This page pulls together resources for various types of data.wa.gov users, including developers, publishers and data users.
Data-driven models help mobile app designers understand best practices and trends, and can be used to make predictions about design performance and support the creation of adaptive UIs. This paper presents Rico, the largest repository of mobile app designs to date, created to support five classes of data-driven applications: design search, UI layout generation, UI code generation, user interaction modeling, and user perception prediction. To create Rico, we built a system that combines crowdsourcing and automation to scalably mine design and interaction data from Android apps at runtime. The Rico dataset contains design data from more than 9.3k Android apps spanning 27 categories. It exposes visual, textual, structural, and interactive design properties of more than 66k unique UI screens. To demonstrate the kinds of applications that Rico enables, we present results from training an autoencoder for UI layout similarity, which supports query-by-example search over UIs.
Rico was built by mining Android apps at runtime via human-powered and programmatic exploration. Like its predecessor ERICA, Rico’s app mining infrastructure requires no access to — or modification of — an app’s source code. Apps are downloaded from the Google Play Store and served to crowd workers through a web interface. When crowd workers use an app, the system records a user interaction trace that captures the UIs visited and the interactions performed on them. Then, an automated agent replays the trace to warm up a new copy of the app and continues the exploration programmatically, leveraging a content-agnostic similarity heuristic to efficiently discover new UI states. By combining crowdsourcing and automation, Rico can achieve higher coverage over an app’s UI states than either crawling strategy alone. In total, 13 workers recruited on UpWork spent 2,450 hours using apps on the platform over five months, producing 10,811 user interaction traces. After collecting a user trace for an app, we ran the automated crawler on the app for one hour.
UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN https://interactionmining.org/rico
The Rico dataset is large enough to support deep learning applications. We trained an autoencoder to learn an embedding for UI layouts, and used it to annotate each UI with a 64-dimensional vector representation encoding visual layout. This vector representation can be used to compute structurally — and often semantically — similar UIs, supporting example-based search over the dataset. To create training inputs for the autoencoder that embed layout information, we constructed a new image for each UI capturing the bounding box regions of all leaf elements in its view hierarchy, differentiating between text and non-text elements. Rico’s view hierarchies obviate the need for noisy image processing or OCR techniques to create these inputs.
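The example-based search described above can be sketched as a nearest-neighbor lookup over the per-UI vectors. The excerpt does not state which distance metric is used, so Euclidean distance is an assumption here, as are the `nearest_uis` helper, the UI ids, and the toy 3-dimensional vectors standing in for the 64-dimensional embeddings.

```python
import math


def nearest_uis(query, embeddings, k=2):
    """Return the ids of the k embeddings closest to `query` (Euclidean distance)."""
    ranked = sorted(embeddings, key=lambda ui_id: math.dist(query, embeddings[ui_id]))
    return ranked[:k]


# Toy 3-dimensional vectors standing in for the 64-dimensional layout embeddings.
embeddings = {
    "ui_login": [0.9, 0.1, 0.0],
    "ui_signup": [0.8, 0.2, 0.1],
    "ui_gallery": [0.0, 0.9, 0.7],
}

print(nearest_uis([1.0, 0.0, 0.0], embeddings))  # ['ui_login', 'ui_signup']
```

A linear scan like this is fine for a toy example; for tens of thousands of screens an index structure would likely be preferable.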
As of March 2021, YouTube was the video and streaming app found to collect the largest amount of data from global iOS users. The app collected a total of ** data points across the examined data types. The mobile app of video streaming service Amazon Prime Video followed, with ** data points collected across all the examined data types.
As of 2024, the average data consumption per user per month in India was **** gigabytes. 5G data traffic contributed *** percent of the overall data traffic; 5G was launched in India in October 2022. Increased online education, remote working for professionals, and higher OTT viewership contributed to the data traffic growth.
Johnyquest7/OctoTools-Gradio-Demo-User-Data dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The dataset is an archive of reader comments from the Delfi news site from 2014-2019, containing approximately 12M comments, mostly in the Latvian language, with some in Russian.
Description of the Datasets
There are 6 CSV files:
* lv-comments-2014.csv contains 2 753 655 comments from year 2014
* lv-comments-2015.csv contains 2 221 122 comments from year 2015
* lv-comments-2016.csv contains 1 897 669 comments from year 2016
* lv-comments-2017.csv contains 1 896 083 comments from year 2017
* lv-comments-2018.csv contains 2 222 051 comments from year 2018
* lv-comments-2019.csv contains 1 421 883 comments from year 2019
In sum: 12 412 463 comments
Columns:
* comment_id (string) - the ID of the written comment
* article_id (string) - the ID of the article for which the comment was written
* created_time (string) - the time and date of the comment
* subject (string) - the title of the comment
* reply_to_comment_id (string) - the ID of the parent comment
* content (string) - the comment itself
* is_anonymous (string) -
  * 1 if the comment was published anonymously
  * 0 if the comment was published by a registered user
* is_enabled (string) -
  * 1 if the comment was published (online)
  * 0 if it wasn’t published
  * Questionable field: not all have been manually moderated
  * No additional information from the moderators
* channel_language (string) - the language of the channel
  * 'nat' for Latvian
  * 'rus' for Russian
* create_user_id (string) - the user ID of the commenter
* modereted_by (string) - the ID of the moderator
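As a rough sketch of how files with these columns might be read, the snippet below counts published comments per channel language using the standard `csv` module. The column names come from the list above; the comma delimiter, the timestamp format, and the three sample rows are assumptions made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up rows in the column layout described above; a comma delimiter is assumed.
SAMPLE = """comment_id,article_id,created_time,subject,reply_to_comment_id,content,is_anonymous,is_enabled,channel_language,create_user_id,modereted_by
c1,a1,2014-01-01 10:00:00,,,Example text,1,1,nat,,
c2,a1,2014-01-01 10:05:00,,c1,Another example,0,1,nat,u7,
c3,a2,2014-01-02 08:30:00,,,Third example,1,0,rus,,
"""


def count_published_by_language(csv_file):
    """Count comments with is_enabled == '1' per channel_language."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # All columns are documented as strings, so compare against "1", not 1.
        if row["is_enabled"] == "1":
            counts[row["channel_language"]] += 1
    return counts


counts = count_published_by_language(io.StringIO(SAMPLE))
print(counts)  # Counter({'nat': 2})
```

For the real files, the same function could be passed a file opened with `open(path, newline="", encoding="utf-8")`, assuming the files are UTF-8 encoded.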
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset was originally collected for a data science and machine learning project that aimed at investigating the potential correlation between the amount of time an individual spends on social media and the impact it has on their mental health.
The project involves conducting a survey to collect data, organizing the data, and using machine learning techniques to create a predictive model that can determine whether a person should seek professional help based on their answers to the survey questions.
This project was completed as part of a Statistics course at a university, and the team is currently in the process of writing a report and completing a paper that summarizes and discusses the findings in relation to other research on the topic.
The following is the Google Colab link to the project, done on Jupyter Notebook -
https://colab.research.google.com/drive/1p7P6lL1QUw1TtyUD1odNR4M6TVJK7IYN
The following is the GitHub Repository of the project -
https://github.com/daerkns/social-media-and-mental-health
Libraries used for the Project -
Pandas
NumPy
Matplotlib
Seaborn
scikit-learn
During a December 2022 survey among smartphone users aged 18 years or older in the United States who feel comfortable sharing their data with advertisers, over half of the respondents aged up to ** (** percent) said they were willing to share information about their interests. In the same age group, 35 percent indicated willingness to share their shopping habits.
Analyzing user funnels involves collecting and analyzing data related to user behaviour and actions at each stage of the funnel to understand how users progress through the different stages, and where they give up or exit.
Here’s a dataset we collected from an e-commerce platform based on the flow of users on their platform. Below are all the features in the dataset:
* user_id: represents unique user identifiers
* stage: represents the stage of the user’s journey through the funnel
* conversion: indicates whether the user has converted or not
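A minimal sketch of a funnel analysis over these three features is shown below. The stage names, their order, the sample events, and the `funnel_report` helper are hypothetical; the actual stage values in the dataset are not listed here.

```python
from collections import defaultdict

# Assumed funnel order and made-up events in (user_id, stage, conversion) form.
STAGES = ["homepage", "product_page", "checkout"]
events = [
    ("u1", "homepage", True), ("u2", "homepage", True),
    ("u3", "homepage", True), ("u4", "homepage", False),
    ("u1", "product_page", True), ("u2", "product_page", True),
    ("u3", "product_page", False),
    ("u1", "checkout", True), ("u2", "checkout", False),
]


def funnel_report(events, stage_order):
    """Return (stage, users_reached, users_converted) for each stage, in funnel order."""
    reached = defaultdict(set)
    converted = defaultdict(set)
    for user_id, stage, conversion in events:
        reached[stage].add(user_id)
        if conversion:
            converted[stage].add(user_id)
    return [(s, len(reached[s]), len(converted[s])) for s in stage_order]


report = funnel_report(events, STAGES)
for stage, n_reached, n_converted in report:
    print(f"{stage}: {n_reached} users, {n_converted} converted")
```

Comparing the number of users reached at consecutive stages shows where users give up or exit, which is the drop-off the description refers to.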
Non-traditional data signals from social media and employment platforms for USER stock analysis
Salutary Data is a boutique B2B contact and company data provider that's committed to delivering high-quality data for sales intelligence, lead generation, marketing, recruiting / HR, identity resolution, and ML / AI. Our database currently consists of 148MM+ highly curated B2B contacts (US only), along with 4M+ companies, and is updated regularly to ensure we have the most up-to-date information.
We can enrich your in-house data (CRM Enrichment, Lead Enrichment, etc.) and provide you with a custom dataset (such as a lead list) tailored to your target audience specifications and data use case. We also support large-scale data licensing to software providers and agencies that intend to redistribute our data to their customers and end users.
What makes Salutary unique?
* We offer our clients a truly unique, one-stop aggregation of best-of-breed data sources. Our supplier network consists of numerous established, high-quality suppliers that are rigorously vetted.
* We leverage third-party verification vendors to ensure phone numbers and emails are accurate and connect to the right person. Additionally, we deploy automated and manual verification techniques to ensure we have the latest job information for contacts.
* We're reasonably priced and easy to work with.
Products: API Suite, Web UI, Full and Custom Data Feeds
Services:
* Data Enrichment - We assess the fill rate gaps and profile your customer file for the purpose of appending fields, updating information, and/or rendering net new “look alike” prospects for your campaigns.
* ABM Match & Append - Send us your domain or other company-related files, and we’ll match your Account Based Marketing targets and provide you with B2B contacts to campaign. Optionally throw in your suppression file to avoid any redundant records.
* Verification (“Cleaning/Hygiene”) Services - Address the 2% per month aging issue on contact records! We will identify duplicate records and contacts no longer at the company, remove your email hard bounces, and update/replace titles or phones. This is right up our alley and leverages our existing internal and external processes and systems.